Gilial quantizes every embedding in your Pinecone index using TurboQuant, reducing each vector to fewer bits in place. Every record stays. Each one just takes up less space.
Every deprecated document, every retired embedding model, every test run: you keep paying rent on all of them. Most teams don't measure the cost until the bill lands.
Gilial sits beside your existing Pinecone client. No new database, no copy step, no downtime. Estimate, quantize, validate.
Point Gilial at any Pinecone index. estimate_savings() is deterministic: it computes the exact compression ratio from index metadata without fetching any vectors.
>>> from gilial import PineconeCompressionClient
>>> client = PineconeCompressionClient(
        api_key="pcsk_...",
        index_name="prod-embeddings",
    )
# returns exact ratio, no vectors fetched
>>> client.estimate_savings(namespace="docs")
vectors    2,400,000
dims       1,536
namespace  docs
ratio      8.0× (balanced, 4-bit)
savings    87.5% of index storage
estimate complete ●
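The figures in the estimate above can be reproduced with simple arithmetic, assuming the index stores each dimension as a 32-bit float and ignoring metadata and index overhead. Both are assumptions, since the page does not state the baseline. The helper `estimate_ratio` below is hypothetical and not part of Gilial's API; it is only a back-of-envelope sketch.

```python
def estimate_ratio(vectors: int, dims: int, bits_per_dim: int,
                   baseline_bits: int = 32) -> dict:
    """Back-of-envelope compression estimate (hypothetical helper).

    Assumes every dimension is stored at `baseline_bits` (float32 by
    default) and ignores metadata and index overhead.
    """
    raw_bytes = vectors * dims * baseline_bits // 8
    compressed_bytes = vectors * dims * bits_per_dim // 8
    ratio = baseline_bits / bits_per_dim
    return {
        "raw_gb": raw_bytes / 1e9,
        "compressed_gb": compressed_bytes / 1e9,
        "ratio": ratio,
        "savings_pct": 100 * (1 - 1 / ratio),
    }


# Same inputs as the estimate above: 2.4M vectors, 1,536 dims, 4 bits.
est = estimate_ratio(vectors=2_400_000, dims=1_536, bits_per_dim=4)
print(est)
```

Under these assumptions, 4 bits per dimension against a 32-bit baseline gives the 8.0× ratio and 87.5% savings shown in the estimate.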
Not a research toy. Gilial ships with dry runs, audit logs, and the bounds your platform team is going to ask for anyway.
Drop-in client wraps the Pinecone SDK. Serverless and pod-based indexes both supported.
F.01 Every compress() previews the math first: compression ratio, GB saved, and recall projection. Apply only when you're ready.
F.02 Random rotation + scalar quantization + optional QJL residual. Uniform error distribution means recall holds even at aggressive bit depths.
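To make the rotate-then-quantize idea concrete, here is a minimal standard-library sketch of the general technique. It is not TurboQuant's implementation: the Gram-Schmidt rotation, the per-vector min/max scaling, and all function names are assumptions chosen for illustration, and the optional QJL residual is left out entirely.

```python
import math
import random


def random_rotation(dim: int, rng: random.Random) -> list[list[float]]:
    """Random orthonormal matrix via Gram-Schmidt on Gaussian rows."""
    rows: list[list[float]] = []
    while len(rows) < dim:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        for r in rows:  # remove components along earlier rows
            dot = sum(a * b for a, b in zip(v, r))
            v = [a - dot * b for a, b in zip(v, r)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-9:
            rows.append([a / norm for a in v])
    return rows


def rotate(matrix, vec):
    return [sum(m * x for m, x in zip(row, vec)) for row in matrix]


def rotate_back(matrix, vec):
    # The inverse of an orthonormal matrix is its transpose.
    dim = len(vec)
    return [sum(matrix[i][j] * vec[i] for i in range(dim)) for j in range(dim)]


def quantize(vec, bits):
    """Uniform scalar quantization to integer codes in [0, 2**bits - 1]."""
    levels = (1 << bits) - 1
    lo, hi = min(vec), max(vec)
    scale = (hi - lo) / levels or 1.0  # avoid a zero step for constant vectors
    codes = [round((x - lo) / scale) for x in vec]
    return codes, lo, scale


def dequantize(codes, lo, scale):
    return [lo + c * scale for c in codes]


rng = random.Random(0)
dim, bits = 16, 4
R = random_rotation(dim, rng)
x = [rng.gauss(0.0, 1.0) for _ in range(dim)]

# Rotate, quantize to 4-bit codes, then invert both steps.
codes, lo, scale = quantize(rotate(R, x), bits)
x_hat = rotate_back(R, dequantize(codes, lo, scale))

err = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, x_hat)))
norm = math.sqrt(sum(a * a for a in x))
print(f"relative error at {bits} bits: {err / norm:.3f}")
```

Because the rotation is orthonormal, it preserves vector norms, so the reconstruction error in the original space equals the quantization error in the rotated space.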
F.03 Pass namespace= to target a single namespace, or call compress_all_namespaces() to sweep the entire index in one shot.
F.04 Override the strategy default with bits_per_dim= (2–8) for fine-grained control. Step between presets without creating a new client.
F.05 Pass an on_progress(vectors_done, vectors_total) callback to stream live batch updates during compression. Wire it straight to a progress bar or log line.
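A callback with the (vectors_done, vectors_total) signature described above might look like the sketch below. The factory `make_progress_logger` and the simulated batch loop are hypothetical; how and how often Gilial actually invokes the callback is not documented here, so the loop only stands in for a real compression run.

```python
def make_progress_logger(label: str):
    """Build a callback with the (vectors_done, vectors_total) signature."""
    lines = []

    def on_progress(vectors_done: int, vectors_total: int) -> None:
        pct = 100 * vectors_done / vectors_total if vectors_total else 100.0
        width = 20
        filled = int(width * pct / 100)
        bar = "#" * filled + "-" * (width - filled)
        line = f"{label} [{bar}] {vectors_done:,}/{vectors_total:,} ({pct:.1f}%)"
        lines.append(line)
        print(line)

    on_progress.lines = lines  # kept so callers can inspect what was logged
    return on_progress


# Simulated batch updates (assumption: one call per completed batch).
callback = make_progress_logger("docs")
total = 2_400_000
for done in range(0, total + 1, 600_000):
    callback(done, total)
```

The same function could write to a logger instead of printing; only the two-argument signature comes from the feature description.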
F.06 Pass log_path="auto" to write a JSON audit trail to ~/.gilial/logs/. Includes namespace, bits_per_dim, duration, and recall.
F.07 Pass validate_recall=True to run a before/after benchmark automatically. Compression is flagged safe when mean recall is ≥ 95%. Bring your own query vectors for a production-accurate check.
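For readers who want to see what a before/after recall check involves, here is a small standard-library sketch. The metric (recall@10 with dot-product scoring), the synthetic data, and the noise used as a stand-in for quantization are all assumptions; only the 95% threshold comes from the description above, and this is not Gilial's benchmark code.

```python
import random


def top_k(query, vectors, k):
    """Indices of the k vectors with the highest dot product against query."""
    scores = [sum(q * v for q, v in zip(query, vec)) for vec in vectors]
    ranked = sorted(range(len(vectors)), key=lambda i: scores[i], reverse=True)
    return set(ranked[:k])


def mean_recall(queries, original, compressed, k=10):
    """Average fraction of the original top-k that survives compression."""
    recalls = []
    for q in queries:
        truth = top_k(q, original, k)
        found = top_k(q, compressed, k)
        recalls.append(len(truth & found) / k)
    return sum(recalls) / len(recalls)


rng = random.Random(0)
dim = 32
original = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(200)]
# Stand-in for quantization: add small noise to each coordinate.
compressed = [[x + rng.gauss(0, 0.01) for x in vec] for vec in original]
queries = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(20)]

recall = mean_recall(queries, original, compressed, k=10)
print(f"mean recall@10 = {recall:.3f}, safe = {recall >= 0.95}")
```

Replacing the random `queries` with real production queries is what makes a check like this representative of live traffic.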
F.08 No phone-home, no telemetry, no vendor lock-in. Full source on GitHub.