Show HN: TurboQuant for vector search – 2-4 bit compression

TurboQuant is a Rust-based vector search tool that compresses high-dimensional vectors to 2-4 bits per coordinate without training, with near-optimal distortion and SIMD-accelerated search on ARM and x86.

Rust implementation of TurboQuant for vector search, with Python bindings via PyO3.

Compresses high-dimensional vectors to 2-4 bits per coordinate with near-optimal distortion. Data-oblivious: there is no training step, so vectors are indexed as soon as they are added.

Unofficial implementation of TurboQuant (Google Research, ICLR 2026).

from turbovec import TurboQuantIndex
index = TurboQuantIndex(dim=1536, bit_width=4)
index.add(vectors)
index.add(more_vectors)
scores, indices = index.search(query, k=10)
index.write("my_index.tq")
loaded = TurboQuantIndex.load("my_index.tq")

use turbovec::TurboQuantIndex;
let mut index = TurboQuantIndex::new(1536, 4);
index.add(&vectors);
let results = index.search(&queries, 10);
index.write("index.tv").unwrap();
let loaded = TurboQuantIndex::load("index.tv").unwrap();

TurboQuant vs FAISS IndexPQFastScan on OpenAI DBpedia d=1536 (100K vectors, 1K queries, k=64). FAISS PQ configurations sized to match TurboQuant compression ratios.

TurboQuant requires zero training. FAISS PQ needs a training step (4-10 seconds). TurboQuant index build is 3-4x faster.

| Config | TQ speed | FAISS speed | Ratio (TQ/FAISS) | TQ recall@1 | FAISS recall@1 |
|---|---|---|---|---|---|
| 2-bit MT | 0.125 ms/q | 0.128 ms/q | 0.97x | 0.870 | 0.882 |
| 2-bit ST | 1.272 ms/q | 1.247 ms/q | 1.02x | 0.870 | 0.882 |
| 4-bit MT | 0.232 ms/q | 0.246 ms/q | 0.94x | 0.955 | 0.930 |
| 4-bit ST | 2.474 ms/q | 2.485 ms/q | 1.00x | 0.955 | 0.930 |

On ARM, TurboQuant matches or beats FAISS on speed while requiring no training step. At 4-bit, TurboQuant recall is higher than FAISS (0.955 vs 0.930).

On x86, TurboQuant is 18-25% slower than FAISS. At 4-bit, TurboQuant recall is higher than FAISS (0.955 vs 0.930). The speed gap comes primarily from TurboQuant's rotation step (~5% of total time) and from differences in AVX2 code generation compared with FAISS's template-instantiated C++ kernels.

| Bit width | Index size (100K x 1536) | Compression vs FP32 |
|---|---|---|
| 2-bit | 37.0 MB | 15.8x |
| 4-bit | 73.6 MB | 8.0x |
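The sizes in this table can be reproduced with a few lines of arithmetic. The sketch below assumes that each vector stores only its packed codes plus one 4-byte float for the norm, that "MB" means 2^20 bytes, and that any file header is negligible. The function name `index_size_bytes` is made up for illustration and is not part of turbovec.

```python
MIB = 1024 * 1024


def index_size_bytes(n_vectors, dim, bit_width, norm_bytes=4):
    """Estimated payload size: packed codes plus one stored norm per vector."""
    code_bytes = dim * bit_width // 8  # assumes dim * bit_width is a multiple of 8
    return n_vectors * (code_bytes + norm_bytes)


n, dim = 100_000, 1536
fp32_bytes = n * dim * 4  # uncompressed float32 baseline

for bits in (2, 4):
    size = index_size_bytes(n, dim, bits)
    print(f"{bits}-bit: {size / MIB:.1f} MiB, {fp32_bytes / size:.1f}x vs FP32")
# 2-bit: 37.0 MiB, 15.8x vs FP32
# 4-bit: 73.6 MiB, 8.0x vs FP32
```

Under these assumptions the per-vector norm explains why the measured ratios sit slightly below the ideal 16x and 8x.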

Each vector is a direction on a high-dimensional hypersphere. TurboQuant compresses these directions using a simple insight: after applying a random rotation, every coordinate follows a known distribution -- regardless of the input data.

1. Normalize. Strip the length (norm) from each vector and store it as a single float. Now every vector is a unit direction on the hypersphere.

2. Random rotation. Multiply all vectors by the same random orthogonal matrix. After rotation, each coordinate of a unit vector follows a Beta-type distribution that converges to Gaussian N(0, 1/d) in high dimensions, and distinct coordinates become nearly independent. This holds for any input data -- the rotation makes the coordinate distribution predictable.

3. Lloyd-Max scalar quantization. Since the distribution is known, we can precompute the optimal way to bucket each coordinate. For 2-bit, that's 4 buckets; for 4-bit, 16 buckets. The Lloyd-Max algorithm finds bucket boundaries and centroids that minimize mean squared error. These are computed once from the math, not from the data.

4. Bit-pack. Each coordinate is now a small integer (0-3 for 2-bit, 0-15 for 4-bit). Pack these tightly into bytes. A 1536-dim vector goes from 6,144 bytes (FP32) to 384 bytes (2-bit). That's 16x compression for the codes alone; adding the 4-byte stored norm gives 388 bytes, or about 15.8x.
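The four steps above can be sketched in plain Python. This is a minimal illustration and not the turbovec implementation. It makes three assumptions: the rotation is built by Gram-Schmidt on Gaussian rows, the codebook uses the Gaussian limit N(0, 1/d) rather than the exact finite-d distribution, and codes are packed low bits first. All function names are hypothetical.

```python
import bisect
import math
import random


def random_rotation(d, rng):
    """Random orthogonal d x d matrix: Gram-Schmidt on Gaussian rows."""
    rows = []
    for _ in range(d):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for u in rows:  # remove the component along each earlier row
            dot = sum(a * b for a, b in zip(v, u))
            v = [a - dot * b for a, b in zip(v, u)]
        norm = math.sqrt(sum(a * a for a in v))
        rows.append([a / norm for a in v])
    return rows


def _pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def _cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def lloyd_max_gaussian(bits, iters=500):
    """Lloyd-Max centroids and boundaries for N(0, 1), derived without data."""
    levels = 1 << bits
    centroids = [-2.0 + 4.0 * (i + 0.5) / levels for i in range(levels)]
    for _ in range(iters):
        # Boundaries sit halfway between neighbouring centroids.
        inner = [(a + b) / 2.0 for a, b in zip(centroids, centroids[1:])]
        edges = [-math.inf] + inner + [math.inf]
        # Each centroid moves to the conditional mean of its bucket.
        centroids = [(_pdf(lo) - _pdf(hi)) / (_cdf(hi) - _cdf(lo))
                     for lo, hi in zip(edges, edges[1:])]
    boundaries = [(a + b) / 2.0 for a, b in zip(centroids, centroids[1:])]
    return centroids, boundaries


def encode(vec, rotation, boundaries):
    """Steps 1-3: split off the norm, rotate, bucket each coordinate."""
    norm = math.sqrt(sum(x * x for x in vec))
    unit = [x / norm for x in vec]
    rotated = [sum(r * x for r, x in zip(row, unit)) for row in rotation]
    codes = [bisect.bisect_right(boundaries, y) for y in rotated]
    return norm, codes


def pack(codes, bits):
    """Step 4: pack 8 // bits codes per byte, low bits first (assumed layout)."""
    per_byte = 8 // bits
    out = bytearray((len(codes) + per_byte - 1) // per_byte)
    for i, code in enumerate(codes):
        out[i // per_byte] |= code << (bits * (i % per_byte))
    return bytes(out)


def unpack(packed, bits, n):
    """Inverse of pack: recover n codes from the packed bytes."""
    per_byte = 8 // bits
    mask = (1 << bits) - 1
    return [(packed[i // per_byte] >> (bits * (i % per_byte))) & mask
            for i in range(n)]


d, bits = 64, 2
rng = random.Random(0)
rotation = random_rotation(d, rng)
unit_centroids, unit_boundaries = lloyd_max_gaussian(bits)
scale = 1.0 / math.sqrt(d)  # rescale the N(0, 1) codebook to N(0, 1/d)
centroids = [c * scale for c in unit_centroids]
boundaries = [b * scale for b in unit_boundaries]

vec = [rng.uniform(-1.0, 1.0) for _ in range(d)]
norm, codes = encode(vec, rotation, boundaries)
packed = pack(codes, bits)
print(len(packed), "bytes for", d, "coordinates")  # 16 bytes for 64 coordinates
```

Nothing in `lloyd_max_gaussian` looks at the input vectors, which is the sense in which the codebook is computed once from the math and not from the data.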

Search. Instead of decompressing every database vector, we rotate the query once into the same domain and score directly against the codebook values. The scoring kernel uses SIMD intrinsics (NEON on ARM, AVX2 on x86) with nibble-split lookup tables for maximum throughput.
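Rotating the query is enough because an orthogonal rotation preserves inner products: scoring the rotated query against the rotated, quantized vector estimates the original inner product. Below is a scalar sketch of table-based scoring, assuming 2-bit codes and the standard 4-level Lloyd-Max centroids for N(0, 1) (rounded, then rescaled by 1/sqrt(d)). The function names and table layout are illustrative only; the real kernel uses SIMD with nibble-split tables, which this sketch does not attempt to reproduce.

```python
import math

# Assumption: rounded 4-level Lloyd-Max centroids for a standard normal.
UNIT_CENTROIDS = [-1.510, -0.4528, 0.4528, 1.510]


def build_luts(rotated_query, centroids):
    """One small table per coordinate: lut[i][code] = q_i * centroid[code]."""
    return [[q * c for c in centroids] for q in rotated_query]


def score_with_luts(luts, codes, norm):
    """Approximate <query, vector> with table lookups; no decompression."""
    return norm * sum(lut[code] for lut, code in zip(luts, codes))


def score_by_dequantizing(rotated_query, codes, centroids, norm):
    """Reference path: rebuild the rotated vector, then take a dot product."""
    approx = [norm * centroids[code] for code in codes]
    return sum(q * x for q, x in zip(rotated_query, approx))


d = 8
centroids = [c / math.sqrt(d) for c in UNIT_CENTROIDS]
rotated_query = [0.5, -0.1, 0.3, 0.0, -0.4, 0.2, 0.6, -0.3]  # already rotated
codes = [3, 1, 2, 0, 1, 2, 3, 0]  # one stored vector, already quantized
norm = 2.0                        # its stored length

luts = build_luts(rotated_query, centroids)  # built once per query
fast = score_with_luts(luts, codes, norm)
slow = score_by_dequantizing(rotated_query, codes, centroids, norm)
print(abs(fast - slow) < 1e-12)  # True: both paths give the same score
```

The tables depend only on the query, so they are built once and reused for every database vector; the per-vector work is a lookup and an add per coordinate.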

Online by design. Because the codebook and rotation are derived from math (not from the data), new vectors can be added at any time without rebuilding the index. Traditional methods like Product Quantization require expensive offline codebook training that must be re-run when data changes.

Source: Hacker News