OPEN SOURCE · PINECONE COMPRESSION · MIT LICENSE
VECTOR DATABASE OPTIMIZATION · PYTHON

Compress your bloated vector index.

Gilial quantizes every embedding in your Pinecone index using TurboQuant, reducing each vector to fewer bits per dimension, in place. Every record stays. Each one just takes up less space.

STATUS: operational · LICENSE: MIT / Open Source · BALANCED (4-BIT): ~8× compression

storage_comparison.dat
100k vectors · 1536-d · 0.6 GB baseline

format    profile        size      ratio
float32   original       0.57 GB
6-bit     high quality   0.11 GB   5×
4-bit     balanced       0.07 GB   8×
2-bit     aggressive     0.04 GB   16×

strategy: balanced · bits per dim: 4

Index: 100k vectors · 0.6 GB · 1536-d
Ratio: 8.0× (float32 → 4-bit)
Saved: 0.50 GB (≈ $121/mo on Pinecone)
fig.01: vector magnitude distribution · TurboQuant
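
The sizes in the comparison follow from simple arithmetic, sketched here in Python. The sketch assumes storage is dominated by the raw vector values (no IDs, metadata, or per-vector quantization parameters), and the helper name payload_bytes is illustrative, not part of Gilial. It also shows why the baseline appears as both 0.6 GB and 0.57 GB: 614.4 million bytes is about 0.61 GB in decimal units and about 0.57 GiB in binary units.

```python
# Back-of-envelope storage math; counts raw vector payload only.
GIB = 2 ** 30

def payload_bytes(num_vectors: int, dims: int, bits_per_dim: int) -> float:
    """Bytes needed for the vector values alone at a given bit depth."""
    return num_vectors * dims * bits_per_dim / 8

# float32 stores 32 bits per dimension
baseline = payload_bytes(100_000, 1536, 32)

for bits in (32, 6, 4, 2):
    size = payload_bytes(100_000, 1536, bits)
    print(f"{bits:>2}-bit  {size / GIB:.2f} GiB  {baseline / size:.2f}x")
```

The 6-bit ratio works out to 32 / 6 ≈ 5.33×, which the comparison rounds down to 5×.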
00 / problem

A vector index forgets nothing.

Every deprecated document, every retired embedding model, every test run: they all keep costing you rent. Most teams don't measure it until the bill lands.

01 / how it works

Three calls, no migration.

Gilial sits beside your existing Pinecone client. No new database, no copy step, no downtime. Estimate, quantize, validate.

gilial · phase_01 · ~/index.cmp
Connect. Sample.

Point Gilial at any Pinecone index. estimate_savings() is deterministic: it computes the exact compression ratio from index metadata alone, without fetching any vectors.

>>> from gilial import PineconeCompressionClient

>>> client = PineconeCompressionClient(
...     api_key="pcsk_...",
...     index_name="prod-embeddings",
... )

>>> # returns exact ratio, no vectors fetched
>>> client.estimate_savings(namespace="docs")
  vectors   2,400,000
  dims      1,536
  namespace docs
  ratio     8.0×  (balanced, 4-bit)
  savings   87.5%  of index storage
  estimate complete
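
Why can the estimate be exact without touching a single vector? The ratio depends only on the bit depth, and the counts come from index metadata. Here is a minimal sketch of that reasoning. It assumes a stats dictionary shaped like the response of Pinecone's describe_index_stats() (a dimension field plus a per-namespace vector_count) and a float32 source index. The function estimate_from_stats is hypothetical and is not Gilial's implementation.

```python
def estimate_from_stats(stats: dict, namespace: str, bits_per_dim: int = 4) -> dict:
    """Derive ratio and savings from counts alone; no vector values are read."""
    vectors = stats["namespaces"][namespace]["vector_count"]
    dims = stats["dimension"]
    ratio = 32 / bits_per_dim              # assumes float32 (32-bit) source vectors
    savings_pct = (1 - 1 / ratio) * 100    # share of vector payload removed
    return {
        "vectors": vectors,
        "dims": dims,
        "namespace": namespace,
        "ratio": ratio,
        "savings_pct": savings_pct,
    }

# Hand-written stand-in for an index stats response, using the numbers shown above.
stats = {"dimension": 1536, "namespaces": {"docs": {"vector_count": 2_400_000}}}
report = estimate_from_stats(stats, "docs")
print(report)
```

With 4 bits per dimension this gives a ratio of 8.0 and savings of 87.5%, matching the output above. Vector count and dimension only matter once you convert the ratio into absolute gigabytes.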
02 / capabilities

Built for production indexes.

Not a research toy. Gilial ships with dry-runs, audit logs, and the bounds your platform team is going to ask for anyway.

Pinecone native

Drop-in client wraps the Pinecone SDK. Serverless and pod-based indexes both supported.

F.01

Dry-run by default

Every compress() call previews the math first: compression ratio, GB saved, and recall projection. Apply only when you're ready.

F.02

TurboQuant algorithm

Random rotation + scalar quantization + optional QJL residual. Uniform error distribution means recall holds even at aggressive bit depths.
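
To make the first two stages concrete, here is a toy sketch in plain Python. It is not TurboQuant and not Gilial's code. A randomized Hadamard transform (random sign flips followed by a Walsh-Hadamard transform) stands in for the random rotation, a simple min/max uniform quantizer stands in for the scalar quantizer, and the optional QJL residual stage is left out. All function names are illustrative, and the dimension is a small power of two because this transform requires one.

```python
import math
import random

def fwht(vec):
    """Orthonormal fast Walsh-Hadamard transform; len(vec) must be a power of two."""
    out = list(vec)
    n, h = len(out), 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    scale = 1.0 / math.sqrt(n)
    return [x * scale for x in out]

def rotate(vec, signs):
    """Random sign flips, then Hadamard: an orthogonal, norm-preserving map."""
    return fwht([s * x for s, x in zip(signs, vec)])

def unrotate(vec, signs):
    """Inverse of rotate: the scaled Hadamard matrix is its own inverse."""
    return [s * x for s, x in zip(signs, fwht(vec))]

def quantize(vec, bits):
    """Map each coordinate to an integer code in [0, 2**bits - 1]."""
    levels = 2 ** bits - 1
    lo, hi = min(vec), max(vec)
    step = (hi - lo) / levels or 1.0   # guard against a constant vector
    codes = [min(levels, max(0, round((x - lo) / step))) for x in vec]
    return codes, lo, step

def dequantize(codes, lo, step):
    """Map integer codes back to approximate coordinate values."""
    return [lo + c * step for c in codes]

random.seed(0)
dim, bits = 16, 4
signs = [random.choice((-1.0, 1.0)) for _ in range(dim)]
vec = [random.gauss(0.0, 1.0) for _ in range(dim)]

rotated = rotate(vec, signs)
codes, lo, step = quantize(rotated, bits)
restored = unrotate(dequantize(codes, lo, step), signs)

err = math.sqrt(sum((a - b) ** 2 for a, b in zip(vec, restored)))
print("codes:", codes)
print(f"L2 reconstruction error: {err:.4f}")
```

The rotation spreads a vector's energy more evenly across coordinates, so one quantization grid fits every dimension reasonably well. Because the rotation is orthogonal, the error introduced in the rotated space carries back to the original space with the same L2 norm.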

F.03

Namespace support

Pass namespace= to target a single namespace, or call compress_all_namespaces() to sweep the entire index in one shot.

F.04

Custom bits per dimension

Override the strategy default with bits_per_dim= (2–8) for fine-grained control. Step between presets without creating a new client.

F.05

Progress callbacks

Pass on_progress(vectors_done, vectors_total) to stream live batch updates during compression. Wire it straight to a progress bar or log line.
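
A callback with that two-argument shape might look like the sketch below. Only the (vectors_done, vectors_total) signature comes from the description above. The helper format_progress, the bar width, and the simulated batch loop are illustrative, and how the callback is handed to compress() is not shown here.

```python
import sys

def format_progress(vectors_done: int, vectors_total: int, width: int = 30) -> str:
    """Build a one-line text progress bar."""
    frac = vectors_done / vectors_total if vectors_total else 1.0
    filled = int(width * frac)
    bar = "#" * filled + "-" * (width - filled)
    return f"[{bar}] {vectors_done:,}/{vectors_total:,} ({frac:.0%})"

def on_progress(vectors_done: int, vectors_total: int) -> None:
    """Callback: redraw the bar in place on stderr, newline when finished."""
    sys.stderr.write("\r" + format_progress(vectors_done, vectors_total))
    if vectors_done >= vectors_total:
        sys.stderr.write("\n")

# Simulated batches standing in for a real compression run.
for done in range(0, 2_400_001, 600_000):
    on_progress(done, 2_400_000)
```

Keeping the formatting in its own function makes it easy to send the same line to a logger instead of the terminal.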

F.06

Compression log

Pass log_path="auto" to write a JSON audit trail to ~/.gilial/logs/. Includes namespace, bits_per_dim, duration, and recall.

F.07

Recall validation

Pass validate_recall=True to run a before/after benchmark automatically. Compression is flagged safe when mean recall is ≥ 95%. Bring your own query vectors for a production-accurate check.
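
A before/after recall check of this kind can be sketched as follows. This is a minimal brute-force version, assuming dot-product similarity and recall@k as the metric. Gilial's actual benchmark may differ. The function names are illustrative, the rounding step is only a crude stand-in for real quantization, and in practice queries would be your own production query vectors.

```python
import random

SAFE_THRESHOLD = 0.95  # the "flagged safe" bar described above

def top_k(query, vectors, k):
    """Indices of the k vectors with the largest dot product against query."""
    scores = [sum(q * v for q, v in zip(query, vec)) for vec in vectors]
    return sorted(range(len(vectors)), key=lambda i: scores[i], reverse=True)[:k]

def mean_recall(queries, original, compressed, k=10):
    """Average fraction of true top-k neighbours still returned after compression."""
    total = 0.0
    for q in queries:
        truth = set(top_k(q, original, k))
        found = set(top_k(q, compressed, k))
        total += len(truth & found) / k
    return total / len(queries)

random.seed(1)
dim = 8
original = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(200)]
# Crude stand-in for quantization: round every value to one decimal place.
compressed = [[round(x, 1) for x in vec] for vec in original]
queries = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(20)]

recall = mean_recall(queries, original, compressed, k=10)
print(f"mean recall@10: {recall:.3f}  safe: {recall >= SAFE_THRESHOLD}")
```

Synthetic queries like these only show the mechanics. The threshold is meaningful when the queries reflect real traffic.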

F.08

Self-hostable. MIT.

No phone-home, no telemetry, no vendor lock-in. Full source on GitHub.

F.09