NOW LET US – AI RAG SaaS Studio TP.HCM

Show HN: TurboQuant for vector search – 2-4 bit compression


TurboQuant is a Rust-based vector search tool that compresses high-dimensional vectors to 2-4 bits per coordinate without training, offering near-optimal distortion and high performance on ARM and x86 architectures.

Rust implementation of TurboQuant for vector search, with Python bindings via PyO3.

Compresses high-dimensional vectors to 2-4 bits per coordinate with near-optimal distortion. Data-oblivious (no training), zero indexing time.

Unofficial implementation of TurboQuant (Google Research, ICLR 2026).

Python:

    from turbovec import TurboQuantIndex

    # 1536-dimensional vectors, 4 bits per coordinate
    index = TurboQuantIndex(dim=1536, bit_width=4)
    index.add(vectors)
    index.add(more_vectors)  # later batches need no retraining

    scores, indices = index.search(query, k=10)

    # Save the index and load it back
    index.write("my_index.tq")
    loaded = TurboQuantIndex.load("my_index.tq")

Rust:

    use turbovec::TurboQuantIndex;

    let mut index = TurboQuantIndex::new(1536, 4);
    index.add(&vectors);
    let results = index.search(&queries, 10);

    // Save the index and load it back
    index.write("index.tv").unwrap();
    let loaded = TurboQuantIndex::load("index.tv").unwrap();

TurboQuant vs FAISS IndexPQFastScan on OpenAI DBpedia d=1536 (100K vectors, 1K queries, k=64). FAISS PQ configurations sized to match TurboQuant compression ratios.

TurboQuant requires zero training. FAISS PQ needs a training step (4-10 seconds). TurboQuant index build is 3-4x faster.

| Config | TQ speed | FAISS speed | Ratio (TQ/FAISS) | TQ recall@1 | FAISS recall@1 |
|---|---|---|---|---|---|
| 2-bit MT | 0.125ms/q | 0.128ms/q | 0.97x | 0.870 | 0.882 |
| 2-bit ST | 1.272ms/q | 1.247ms/q | 1.02x | 0.870 | 0.882 |
| 4-bit MT | 0.232ms/q | 0.246ms/q | 0.94x | 0.955 | 0.930 |
| 4-bit ST | 2.474ms/q | 2.485ms/q | 1.00x | 0.955 | 0.930 |

On ARM, TurboQuant matches or beats FAISS on speed while requiring no training step. At 4-bit, TurboQuant recall is higher than FAISS (0.955 vs 0.930); at 2-bit it is slightly lower (0.870 vs 0.882).

On x86, TurboQuant is 18-25% slower than FAISS. At 4-bit, TurboQuant recall is again higher than FAISS (0.955 vs 0.930). The speed gap comes primarily from TurboQuant's rotation step (~5% of total time) and from differences in AVX2 code generation compared with FAISS's template-instantiated C++ kernels.

| Bit width | Index size (100K x 1536) | Compression vs FP32 |
|---|---|---|
| 2-bit | 37.0 MB | 15.8x |
| 4-bit | 73.6 MB | 8.0x |
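The sizes above can be reproduced with a few lines of arithmetic. This is a sketch under two assumptions that the source does not state explicitly: each vector stores its packed codes plus one 4-byte float32 norm, and "MB" means 2^20 bytes. Any file header overhead is ignored, and the function names are hypothetical.

```python
def bytes_per_vector(dim: int, bit_width: int, norm_bytes: int = 4) -> int:
    # Packed codes plus one stored norm (assumed to be a 4-byte float32).
    return dim * bit_width // 8 + norm_bytes


def index_size_mib(n_vectors: int, dim: int, bit_width: int) -> float:
    # Total size in units of 2**20 bytes (an assumption about the table's "MB").
    return n_vectors * bytes_per_vector(dim, bit_width) / 2**20


def compression_vs_fp32(dim: int, bit_width: int) -> float:
    # FP32 baseline: 4 bytes per coordinate, no norm stored.
    return dim * 4 / bytes_per_vector(dim, bit_width)


for bits in (2, 4):
    size = index_size_mib(100_000, 1536, bits)
    ratio = compression_vs_fp32(1536, bits)
    print(f"{bits}-bit: {size:.1f} MiB, {ratio:.1f}x")
# prints:
# 2-bit: 37.0 MiB, 15.8x
# 4-bit: 73.6 MiB, 8.0x
```

Under these assumptions the per-vector norm explains why the reported ratios fall slightly short of the ideal 16x and 8x.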

Each vector is a direction on a high-dimensional hypersphere. TurboQuant compresses these directions using a simple insight: after applying a random rotation, every coordinate follows a known distribution -- regardless of the input data.
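The "known distribution" can be written down explicitly. The formula below is the standard result for one coordinate of a point drawn uniformly from the unit sphere in d dimensions, which is what a random rotation of any fixed unit vector produces. It is stated here as textbook background, not quoted from the project.

$$
f_d(t) = \frac{\Gamma(d/2)}{\sqrt{\pi}\,\Gamma\big((d-1)/2\big)}\,\big(1 - t^2\big)^{(d-3)/2}, \qquad t \in [-1, 1]
$$

Equivalently, $(t+1)/2$ follows a $\mathrm{Beta}\big(\tfrac{d-1}{2}, \tfrac{d-1}{2}\big)$ distribution. The coordinate has mean $0$ and variance $1/d$, so for $d = 1536$ its standard deviation is $1/\sqrt{1536} \approx 0.0255$, and as $d$ grows $\sqrt{d}\,t$ approaches a standard normal.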

1. Normalize. Strip the length (norm) from each vector and store it as a single float. Now every vector is a unit direction on the hypersphere.

2. Random rotation. Multiply all vectors by the same random orthogonal matrix. After rotation, each coordinate follows a Beta distribution that converges to Gaussian N(0, 1/d) in high dimensions, and distinct coordinates become nearly independent. This holds for any input data: the rotation makes the coordinate distribution predictable.

3. Lloyd-Max scalar quantization. Since the distribution is known, we can precompute the optimal way to bucket each coordinate. For 2-bit, that's 4 buckets; for 4-bit, 16 buckets. The Lloyd-Max algorithm finds bucket boundaries and centroids that minimize mean squared error. These are computed once from the math, not from the data.

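4. Bit-pack. Each coordinate is now a small integer (0-3 for 2-bit, 0-15 for 4-bit). Pack these tightly into bytes. A 1536-dim vector goes from 6,144 bytes (FP32) to 384 bytes (2-bit). That's 16x compression for the codes alone; adding the norm stored in step 1 brings it close to the 15.8x reported in the size table.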
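The four steps can be sketched in plain Python. This is an illustrative toy, not the project's implementation: it uses the Gaussian N(0, 1/d) approximation instead of the exact Beta distribution, builds the rotation with Gram-Schmidt, and works on a small dimension so it runs quickly. All function names (`random_rotation`, `lloyd_max_gaussian`, `encode`, `decode`) are hypothetical.

```python
import bisect
import math
import random


def random_rotation(d, rng):
    """Random orthogonal d x d matrix via Gram-Schmidt on Gaussian rows."""
    rows = []
    for _ in range(d):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for u in rows:  # subtract the components along earlier rows
            dot = sum(a * b for a, b in zip(v, u))
            v = [a - dot * b for a, b in zip(v, u)]
        norm = math.sqrt(sum(a * a for a in v))
        rows.append([a / norm for a in v])
    return rows


def _pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def _cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def lloyd_max_gaussian(bit_width, sigma, iters=200):
    """Bucket boundaries and centroids minimizing MSE for N(0, sigma^2)."""
    k = 1 << bit_width
    cents = [sigma * (-2.0 + 4.0 * (i + 0.5) / k) for i in range(k)]
    for _ in range(iters):
        # Boundaries sit halfway between neighbouring centroids.
        mids = [(a + b) / 2 for a, b in zip(cents, cents[1:])]
        edges = [-math.inf] + mids + [math.inf]
        # Each centroid moves to the Gaussian mean of its bucket.
        cents = [
            sigma * (_pdf(lo / sigma) - _pdf(hi / sigma))
            / (_cdf(hi / sigma) - _cdf(lo / sigma))
            for lo, hi in zip(edges, edges[1:])
        ]
    bounds = [(a + b) / 2 for a, b in zip(cents, cents[1:])]
    return bounds, cents


def encode(vec, rotation, bounds, bit_width):
    """Steps 1-4: normalize, rotate, quantize, bit-pack."""
    norm = math.sqrt(sum(x * x for x in vec))
    unit = [x / norm for x in vec]
    rotated = [sum(r * x for r, x in zip(row, unit)) for row in rotation]
    codes = [bisect.bisect_right(bounds, x) for x in rotated]
    per_byte = 8 // bit_width
    packed = bytearray((len(codes) + per_byte - 1) // per_byte)
    for i, code in enumerate(codes):
        packed[i // per_byte] |= code << (bit_width * (i % per_byte))
    return norm, bytes(packed)


def decode(norm, packed, rotation, cents, bit_width):
    """Approximate reconstruction, used here only to check the round trip."""
    d = len(rotation)
    per_byte = 8 // bit_width
    mask = (1 << bit_width) - 1
    rotated = [
        cents[(packed[i // per_byte] >> (bit_width * (i % per_byte))) & mask]
        for i in range(d)
    ]
    # The inverse of an orthogonal matrix is its transpose.
    return [norm * sum(rotation[j][i] * rotated[j] for j in range(d)) for i in range(d)]


rng = random.Random(0)
d, bits = 64, 2
rotation = random_rotation(d, rng)
bounds, cents = lloyd_max_gaussian(bits, sigma=1.0 / math.sqrt(d))

vec = [rng.expovariate(1.0) for _ in range(d)]  # deliberately skewed input
norm, packed = encode(vec, rotation, bounds, bits)
approx = decode(norm, packed, rotation, cents, bits)

dot = sum(a * b for a, b in zip(vec, approx))
cos = dot / (norm * math.sqrt(sum(a * a for a in approx)))
print(len(packed), "bytes of codes instead of", 4 * d)
print(f"cosine similarity after round trip: {cos:.3f}")
```

Note that the codebook depends only on `bits` and `d`, never on `vec`, which is the property the article calls data-oblivious.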

Search. Instead of decompressing every database vector, we rotate the query once into the same domain and score directly against the codebook values. The scoring kernel uses SIMD intrinsics (NEON on ARM, AVX2 on x86) with nibble-split lookup tables for maximum throughput.
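A scalar stand-in for the lookup-table idea might look like the sketch below. It assumes the query has already been rotated, that codes are packed low bits first, and that the 2-bit codebook is the standard 4-level Lloyd-Max codebook for a unit Gaussian (roughly ±0.4528 and ±1.510) scaled by 1/sqrt(d). The real kernel uses SIMD and nibble-split tables; `score_packed` and `search` are hypothetical names.

```python
import heapq
import math


def score_packed(query_rot, packed, norm, centroids, bit_width=2):
    """Approximate <query, vector> straight from packed codes."""
    per_byte = 8 // bit_width
    mask = (1 << bit_width) - 1
    # One small table per dimension: the contribution of every possible code.
    lut = [[q * c for c in centroids] for q in query_rot]
    total = 0.0
    for j in range(len(query_rot)):
        code = (packed[j // per_byte] >> (bit_width * (j % per_byte))) & mask
        total += lut[j][code]
    return norm * total  # restore the length stripped during normalization


def search(query_rot, database, centroids, k, bit_width=2):
    """Return the k best (score, index) pairs, highest score first."""
    scored = (
        (score_packed(query_rot, packed, norm, centroids, bit_width), idx)
        for idx, (norm, packed) in enumerate(database)
    )
    return heapq.nlargest(k, scored)


d = 4
centroids = [c / math.sqrt(d) for c in (-1.510, -0.4528, 0.4528, 1.510)]

# Three toy vectors as (norm, packed codes); one byte holds four 2-bit codes.
database = [
    (2.0, bytes([0b01100011])),  # codes 3, 0, 2, 1 for dimensions 0..3
    (1.0, bytes([0b00000000])),  # all codes 0
    (1.0, bytes([0b11111111])),  # all codes 3
]

query_rot = [1.0, 0.0, 0.0, 0.0]  # already rotated (assumption)
print(search(query_rot, database, centroids, k=2))
```

The database vectors are never expanded back to floats: each score is a sum of table lookups indexed by the stored codes.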

Online by design. Because the codebook and rotation are derived from math (not from the data), new vectors can be added at any time without rebuilding the index. Traditional methods like Product Quantization require expensive offline codebook training that must be re-run when data changes.

© 2026 Now Let Us. All rights reserved.

Source: Hacker News
