OPEN SOURCE · PINECONE COMPRESSION · MIT LICENSE · VECTOR DATABASE OPTIMIZATION · PYTHON

Compress your bloated vector index.

Gilial quantizes every embedding in your Pinecone index with TurboQuant, rewriting each vector in place with fewer bits per dimension. Every record stays; each one just takes up less space.

STATUS: operational · LICENSE: MIT / Open Source · BALANCED (4-BIT): ~8× compression


storage_comparison.dat

100k vectors · 1536-d · 0.6 GB baseline

format    preset          size (GiB)   ratio
float32   original        0.57         1×
6-bit     high quality    0.11         5.3×
4-bit     balanced        0.07         8×
2-bit     aggressive      0.04         16×

strategy ─ balanced · bits per dim ─ 4

Index   100k       0.6 GB · 1536-d
Ratio   8.0×       float32 → 4-bit
Saved   0.50 GiB   ≈ $121/mo on Pinecone
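The storage figures above follow from simple arithmetic: float32 costs 32 bits per dimension, and a b-bit quantized vector costs b bits per dimension, so the ratio is 32/b. A minimal sketch that reproduces the table, assuming sizes are reported in GiB (2^30 bytes) and ignoring any per-vector overhead such as stored norms, IDs, or metadata:

```python
GIB = 2 ** 30  # bytes per GiB (assumed unit for the table above)

def index_size_gib(num_vectors: int, dims: int, bits_per_dim: int) -> float:
    """Raw storage for the vector payload only, in GiB (no IDs, norms, or metadata)."""
    return num_vectors * dims * bits_per_dim / 8 / GIB

def compression_ratio(bits_per_dim: int, baseline_bits: int = 32) -> float:
    """float32 baseline width divided by the quantized width."""
    return baseline_bits / bits_per_dim

N, D = 100_000, 1536
for label, bits in [("float32", 32), ("6-bit", 6), ("4-bit", 4), ("2-bit", 2)]:
    size = index_size_gib(N, D, bits)
    print(f"{label:8s} {size:.2f} GiB  {compression_ratio(bits):.1f}x")

# Space freed by the balanced (4-bit) preset.
saved = index_size_gib(N, D, 32) - index_size_gib(N, D, 4)
print(f"saved at 4-bit: {saved:.2f} GiB")
```

The 6-bit ratio works out to 32/6 ≈ 5.3, and the 4-bit preset frees about 0.50 GiB of the 0.57 GiB baseline.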

fig.01: vector magnitude distribution · TurboQuant

00 / problem

A vector index forgets nothing.

Every deprecated document, every retired embedding model, every test run: they all keep paying rent. Most teams don't measure the cost until the bill lands.

01 / how it works

Three calls, no migration.

Gilial sits beside your existing Pinecone client. No new database, no copy step, no downtime. Estimate, quantize, validate.

gilial · phase_01 · ~/index.cmp

Connect. Sample.

Point Gilial at any Pinecone index. estimate_savings() is deterministic: it computes the exact compression ratio from index metadata alone, without fetching any vectors.

>>> from gilial import PineconeCompressionClient

>>> client = PineconeCompressionClient(
...     api_key="pcsk_...",
...     index_name="prod-embeddings",
... )
>>> # returns the exact ratio; no vectors are fetched
>>> client.estimate_savings(namespace="docs")
  vectors   2,400,000
  dims      1,536
  namespace docs
  ratio     8.0×  (balanced, 4-bit)
  savings   87.5%  of index storage
  estimate complete  ●
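Because the ratio depends only on the bit width, an estimate like this needs nothing but the vector count and dimension, which Pinecone reports as index statistics (for example via the Python SDK's describe_index_stats()) without returning any vectors. A sketch of that calculation, assuming a plain dict shaped like describe_index_stats() output; estimate_from_stats is a hypothetical helper, not Gilial's implementation:

```python
from typing import Optional

def estimate_from_stats(stats: dict, bits_per_dim: int = 4,
                        namespace: Optional[str] = None) -> dict:
    """Metadata-only savings estimate (hypothetical helper, not Gilial's API)."""
    dims = stats["dimension"]
    if namespace is None:
        vectors = stats["total_vector_count"]                 # whole index
    else:
        vectors = stats["namespaces"][namespace]["vector_count"]
    ratio = 32 / bits_per_dim                                 # float32 -> b bits per dim
    return {
        "vectors": vectors,
        "dims": dims,
        "ratio": ratio,
        "savings_pct": 100 * (1 - 1 / ratio),
        "bytes_before": vectors * dims * 4,
        # round up to whole bytes when the bit count is not a multiple of 8
        "bytes_after": (vectors * dims * bits_per_dim + 7) // 8,
    }

# Assumed stats shape; the numbers mirror the transcript above.
stats = {
    "dimension": 1536,
    "total_vector_count": 2_400_000,
    "namespaces": {"docs": {"vector_count": 2_400_000}},
}
print(estimate_from_stats(stats, namespace="docs"))
```

Passing no namespace sums over the whole index, which is the metadata-side counterpart of sweeping every namespace.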

02 / capabilities

Built for production indexes.

Not a research toy. Gilial ships with dry runs, audit logs, and the safety bounds your platform team is going to ask for anyway.

◐

Pinecone native

A drop-in client that wraps the Pinecone SDK. Both serverless and pod-based indexes are supported.

F.01

◇

Dry-run by default

Every compress() call previews the math first: compression ratio, GB saved, and projected recall. Apply only when you're ready.

F.02
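The dry-run-by-default pattern can be sketched as a function that always builds a plan and only executes it when explicitly asked. CompressionPlan, the compress signature, the apply flag, and the executor callback below are all hypothetical names illustrating the pattern, not Gilial's actual API; the recall projection is omitted because the source does not say how it is computed.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class CompressionPlan:
    """Preview of a compression run (hypothetical structure)."""
    vectors: int
    dims: int
    bits_per_dim: int

    @property
    def ratio(self) -> float:
        return 32 / self.bits_per_dim

    @property
    def gb_saved(self) -> float:
        before = self.vectors * self.dims * 4
        after = self.vectors * self.dims * self.bits_per_dim / 8
        return (before - after) / 1e9  # decimal GB

def compress(vectors: int, dims: int, bits_per_dim: int = 4, apply: bool = False,
             executor: Optional[Callable[[CompressionPlan], None]] = None) -> CompressionPlan:
    """Hypothetical dry-run-first wrapper: previews by default, writes only if apply=True."""
    plan = CompressionPlan(vectors, dims, bits_per_dim)
    if apply:
        if executor is None:
            raise ValueError("apply=True requires an executor")
        executor(plan)  # the only code path that would touch the index
    return plan

plan = compress(2_400_000, 1536)  # preview only; nothing is written
print(f"{plan.ratio:.1f}x, {plan.gb_saved:.2f} GB saved")
```

Keeping the write behind an explicit flag means the default call is always safe to run against production.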

∿

TurboQuant algorithm

Random rotation, then scalar quantization, plus an optional QJL residual. The rotation spreads quantization error evenly across coordinates, so recall holds even at aggressive bit depths.

F.03
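The first two stages (random rotation, then per-coordinate scalar quantization) can be sketched in NumPy. This is a simplified illustration under stated assumptions, not TurboQuant itself: it uses a uniform quantizer clipped at ±3 standard deviations where TurboQuant uses quantizers tuned to the post-rotation coordinate distribution, it stores one code per byte instead of packing them, and it omits the QJL residual stage.

```python
import numpy as np

def random_rotation(dim: int, seed: int = 0) -> np.ndarray:
    """Random orthogonal matrix from the QR decomposition of a Gaussian matrix."""
    rng = np.random.default_rng(seed)
    q, r = np.linalg.qr(rng.standard_normal((dim, dim)))
    return q * np.sign(np.diag(r))  # sign fix makes the rotation uniformly random

def quantize(x: np.ndarray, rot: np.ndarray, bits: int):
    """Normalize, rotate, and uniformly quantize each coordinate to `bits` bits."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    y = (x / norms) @ rot.T  # each row becomes rot @ unit_vector
    # After a random rotation each coordinate of a unit vector is roughly
    # N(0, 1/d), so clipping at 3 standard deviations loses very little.
    clip = 3.0 / np.sqrt(x.shape[1])
    levels = 2 ** bits
    step = 2 * clip / levels
    codes = np.clip(np.floor((y + clip) / step), 0, levels - 1).astype(np.uint8)
    return codes, norms, step, clip

def dequantize(codes, norms, step, clip, rot):
    """Map codes to bucket centers, undo the rotation, and restore the norms."""
    y_hat = (codes.astype(np.float64) + 0.5) * step - clip
    return (y_hat @ rot) * norms

rng = np.random.default_rng(1)
x = rng.standard_normal((200, 128))
rot = random_rotation(128)
for bits in (2, 4, 6):
    x_hat = dequantize(*quantize(x, rot, bits), rot)
    cos = np.sum(x * x_hat, axis=1) / (
        np.linalg.norm(x, axis=1) * np.linalg.norm(x_hat, axis=1))
    print(bits, "bits: mean cosine", round(float(cos.mean()), 4))
```

The rotation is what makes a single shared quantizer reasonable: no coordinate carries outsized energy, so every dimension sees a similar error budget.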

⌖

Namespace support

Pass namespace= to target a single namespace, or call compress_all_namespaces() to sweep the entire index in one shot.

F.04

⊞

Custom bits per dimension

Override the strategy default with bits_per_dim= (2–8) for fine-grained control, or step between presets without creating a new client.

F.05
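Resolving a preset to a bit width, with an explicit bits_per_dim override taking precedence, might look like the sketch below. The preset widths come from the storage comparison above (high quality = 6, balanced = 4, aggressive = 2) and the 2–8 range from this card; the function name resolve_bits and the preset key strings are hypothetical.

```python
from typing import Optional

# Preset widths from the storage comparison; key names are assumptions.
PRESETS = {"high_quality": 6, "balanced": 4, "aggressive": 2}
MIN_BITS, MAX_BITS = 2, 8

def resolve_bits(strategy: str = "balanced", bits_per_dim: Optional[int] = None) -> int:
    """Hypothetical helper: an explicit bits_per_dim wins over the strategy preset."""
    if bits_per_dim is None:
        try:
            return PRESETS[strategy]
        except KeyError:
            raise ValueError(
                f"unknown strategy {strategy!r}; expected one of {sorted(PRESETS)}") from None
    if not MIN_BITS <= bits_per_dim <= MAX_BITS:
        raise ValueError(f"bits_per_dim must be in [{MIN_BITS}, {MAX_BITS}], got {bits_per_dim}")
    return bits_per_dim

for strategy, override in [("balanced", None), ("balanced", 3), ("aggressive", None)]:
    bits = resolve_bits(strategy, override)
    print(f"{strategy:10s} override={override!s:4s} -> {bits} bits, {32 / bits:.1f}x")
```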

↓

Progress callbacks

Pass an on_progress(vectors_done, vectors_total) callback to receive live batch updates during compression; wire it straight to a progress bar or a log line.

F.06
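A batch loop that reports through an on_progress(vectors_done, vectors_total) callback can be sketched as follows. Only the callback's two arguments come from the description above; compress_in_batches, the batch size, and the process_batch placeholder (standing in for fetch, quantize, upsert) are hypothetical.

```python
from typing import Callable, Optional, Sequence

ProgressFn = Callable[[int, int], None]

def compress_in_batches(ids: Sequence[str], batch_size: int = 100,
                        on_progress: Optional[ProgressFn] = None,
                        process_batch: Callable[[Sequence[str]], None] = lambda batch: None) -> int:
    """Process ids batch by batch, reporting (vectors_done, vectors_total) after each batch.

    process_batch is a hypothetical stand-in for fetch -> quantize -> upsert.
    """
    total = len(ids)
    done = 0
    for start in range(0, total, batch_size):
        batch = ids[start:start + batch_size]
        process_batch(batch)
        done += len(batch)
        if on_progress is not None:
            on_progress(done, total)
    return done

def print_progress(done: int, total: int) -> None:
    """Overwrite one terminal line; end with a newline on the final update."""
    print(f"\r{done:,}/{total:,} vectors ({100 * done / total:.0f}%)",
          end="" if done < total else "\n")

compress_in_batches([f"vec-{i}" for i in range(250)], batch_size=100,
                    on_progress=print_progress)
```

Calling back once per batch, rather than per vector, keeps the overhead negligible on large indexes.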

▤

Compression log

Pass log_path="auto" to write a JSON audit trail to ~/.gilial/logs/. Includes namespace, bits_per_dim, duration, and recall.

F.07
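Writing one JSON record per run with the fields listed above takes only the standard library. The directory default mirrors ~/.gilial/logs/ from the text; the write_compression_log name, the file-naming scheme, and the extra timestamp field are assumptions for illustration. The usage below writes to a temporary directory with example values.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional

DEFAULT_LOG_DIR = Path.home() / ".gilial" / "logs"

def write_compression_log(namespace: str, bits_per_dim: int, duration_s: float,
                          recall: Optional[float], log_dir: Path = DEFAULT_LOG_DIR) -> Path:
    """Write one JSON audit record and return its path (hypothetical helper)."""
    log_dir.mkdir(parents=True, exist_ok=True)
    now = datetime.now(timezone.utc)
    record = {
        "timestamp": now.isoformat(),  # assumed extra field
        "namespace": namespace,
        "bits_per_dim": bits_per_dim,
        "duration_s": round(duration_s, 3),
        "recall": recall,
    }
    # Assumed naming scheme: one file per run, keyed by UTC time.
    path = log_dir / f"compress-{now:%Y%m%dT%H%M%S%f}.json"
    path.write_text(json.dumps(record, indent=2))
    return path

# Example values only, written to a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    p = write_compression_log("docs", 4, 12.3456, recall=0.97, log_dir=Path(tmp))
    print(p.read_text())
```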

◎

Recall validation

Pass validate_recall=True to run a before/after benchmark automatically. Compression is flagged safe when mean recall is ≥ 95%. Bring your own query vectors for a production-accurate check.

F.08
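A before/after recall check can be sketched with brute-force search: take the exact top-k neighbors over the original float32 vectors as ground truth, repeat the search over the compressed vectors, and measure the overlap. The recall@k definition, the inner-product metric, and the small Gaussian perturbation standing in for dequantized vectors are assumptions; only the 95% mean-recall threshold comes from the text above.

```python
import numpy as np

def top_k(queries: np.ndarray, vectors: np.ndarray, k: int) -> np.ndarray:
    """Indices of the k highest inner-product matches for each query (brute force)."""
    scores = queries @ vectors.T
    return np.argsort(-scores, axis=1)[:, :k]

def mean_recall_at_k(queries, original, compressed, k: int = 10) -> float:
    """Mean fraction of the exact top-k (on original vectors) recovered from compressed ones."""
    truth = top_k(queries, original, k)
    approx = top_k(queries, compressed, k)
    hits = [len(set(t) & set(a)) / k for t, a in zip(truth, approx)]
    return float(np.mean(hits))

def is_safe(recall: float, threshold: float = 0.95) -> bool:
    """Flag compression as safe when mean recall meets the threshold."""
    return recall >= threshold

rng = np.random.default_rng(0)
base = rng.standard_normal((1000, 64))
queries = rng.standard_normal((20, 64))  # bring real production queries when you can
noisy = base + 0.001 * rng.standard_normal(base.shape)  # stand-in for dequantized vectors
r = mean_recall_at_k(queries, base, noisy, k=10)
print(f"recall@10 = {r:.3f}, safe = {is_safe(r)}")
```

Random queries only approximate real traffic, which is why a benchmark on your own query vectors gives a more trustworthy verdict.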

Self-hostable. MIT.

No phone-home, no telemetry, no vendor lock-in. Full source on GitHub.

F.09
