Details
Type: Epic
Status: Open
Priority: Major
Resolution: Unresolved
Pluggable quantization for VECTOR INDEX (TurboQuant)
Description
Background
MariaDB Vector stores MHNSW index entries at int16 precision. There is currently no way to reduce the per-vector memory footprint further through lower-bit quantization. For large vector workloads (millions of embeddings at 768–1536 dimensions), memory consumption is a significant cost and scalability constraint.
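To make the memory concern concrete, the arithmetic can be sketched as below. This is a rough estimate of the raw vector payload only: it ignores graph neighbor lists, per-vector scale factors, and any other index overhead, and the helper names are illustrative rather than anything in MariaDB.

```python
def bytes_per_vector(dims: int, bits_per_dim: int) -> int:
    """Raw payload size in bytes, rounded up to a whole byte."""
    return (dims * bits_per_dim + 7) // 8


def footprint_gib(num_vectors: int, dims: int, bits_per_dim: int) -> float:
    """Total payload in GiB for num_vectors vectors (payload only, no index overhead)."""
    return num_vectors * bytes_per_vector(dims, bits_per_dim) / 2**30


if __name__ == "__main__":
    # Example workload: 10 million embeddings at 1536 dimensions.
    for label, bits in [("float32", 32), ("int16", 16), ("4-bit", 4), ("1-bit", 1)]:
        size = bytes_per_vector(1536, bits)
        total = footprint_gib(10_000_000, 1536, bits)
        print(f"{label:>8}: {size} B/vector, {total:.1f} GiB total")
```

Under these assumptions a 4-bit code is one quarter the size of the current int16 payload and one eighth of a float32 payload.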
TurboQuant (Google Research, 2025) is a data-oblivious vector quantization algorithm. It applies a randomized Hadamard rotation followed by scalar quantization, then corrects inner-product bias with a 1-bit QJL (Quantized Johnson-Lindenstrauss) transform. It requires no training data, no codebook learning, and near-zero preprocessing time, which makes it well suited for online index builds. At 4-bit precision it achieves roughly 8× compression versus float32 (about 4× versus the current int16 storage), with recall degradation typically within 1–3% of uncompressed search.
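The rotate-then-quantize step described above can be illustrated with a minimal pure-Python sketch. It is a simplification under stated assumptions, not the TurboQuant algorithm itself: it uses a uniform quantization grid over a fixed range `[lo, hi]` instead of the codebook from the paper, it omits the QJL correction entirely, and all function names are hypothetical. The vector length must be a power of two (real inputs would be padded).

```python
import math
import random


def fwht(values):
    """Fast Walsh-Hadamard transform with orthonormal scaling.

    len(values) must be a power of two. With this scaling the transform
    is its own inverse and preserves the Euclidean norm.
    """
    out = list(values)
    n = len(out)
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    scale = 1.0 / math.sqrt(n)
    return [v * scale for v in out]


def random_signs(n, seed):
    """Random +/-1 diagonal; the seed is the only state shared by encoder and decoder."""
    rng = random.Random(seed)
    return [1.0 if rng.random() < 0.5 else -1.0 for _ in range(n)]


def rotate(vec, signs):
    """Randomized Hadamard rotation: flip signs, then apply the transform."""
    return fwht([v * s for v, s in zip(vec, signs)])


def unrotate(rotated, signs):
    """Inverse rotation: apply the transform again, then undo the sign flips."""
    return [v * s for v, s in zip(fwht(rotated), signs)]


def quantize(rotated, bits, lo, hi):
    """Uniform scalar quantization of each coordinate to an integer code."""
    levels = (1 << bits) - 1
    step = (hi - lo) / levels
    return [min(levels, max(0, round((v - lo) / step))) for v in rotated]


def dequantize(codes, bits, lo, hi):
    """Map integer codes back to grid points in [lo, hi]."""
    step = (hi - lo) / ((1 << bits) - 1)
    return [lo + c * step for c in codes]


if __name__ == "__main__":
    x = [0.3, -1.2, 0.5, 2.0, -0.7, 0.0, 1.1, -0.4]
    signs = random_signs(len(x), seed=42)
    codes = quantize(rotate(x, signs), bits=4, lo=-3.0, hi=3.0)
    approx = unrotate(dequantize(codes, 4, -3.0, 3.0), signs)
    print(codes)
    print([round(v, 2) for v in approx])
```

The rotation spreads each vector's energy across all coordinates, which is what lets a single fixed grid work without looking at the data. No training pass is needed; the only shared state is the random sign seed.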
Proposed Changes
1. Pluggable quantization API in MHNSW
Add an internal API that allows quantization methods to be registered and selected per index. The API handles vector encoding and decoding, quantized distance computation, and metadata exposure via information_schema and SHOW INDEX. It is designed to be method-agnostic, so that future quantization algorithms can be added without changes to the core MHNSW graph traversal.
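One possible shape for such an API is sketched below in Python for brevity; the real implementation would live in the server's C++ code. Every name here (`Quantizer`, `register`, `create`, `Int16Quantizer`, the `describe` method) is hypothetical, and the int16 layout shown is an illustration rather than MariaDB's actual on-disk format.

```python
import struct
from abc import ABC, abstractmethod


class Quantizer(ABC):
    """Hypothetical interface a quantization method would implement."""

    name = "abstract"

    @abstractmethod
    def encode(self, vec):
        """Turn a list of floats into the stored byte representation."""

    @abstractmethod
    def decode(self, blob):
        """Turn stored bytes back into an approximate list of floats."""

    def distance(self, a, b):
        """Squared Euclidean distance between two encoded vectors.

        This fallback decodes both sides; a real method would override it
        to compute directly on the codes.
        """
        return sum((x - y) ** 2 for x, y in zip(self.decode(a), self.decode(b)))

    def describe(self):
        """Metadata a server could surface in its index listings."""
        return {"method": self.name}


_REGISTRY = {}


def register(cls):
    """Class decorator: make a quantizer selectable by name."""
    _REGISTRY[cls.name] = cls
    return cls


def create(name, **options):
    """Instantiate the quantizer chosen for an index."""
    if name not in _REGISTRY:
        raise ValueError(f"unknown quantization method: {name!r}")
    return _REGISTRY[name](**options)


@register
class Int16Quantizer(Quantizer):
    """Illustrative baseline: fixed-scale int16 codes, little-endian."""

    name = "int16"

    def __init__(self, scale=1.0 / 32767):
        self.scale = scale

    def encode(self, vec):
        codes = [max(-32767, min(32767, round(v / self.scale))) for v in vec]
        return struct.pack(f"<{len(codes)}h", *codes)

    def decode(self, blob):
        count = len(blob) // 2
        return [c * self.scale for c in struct.unpack(f"<{count}h", blob)]

    def describe(self):
        return {"method": self.name, "bits": 16}
```

In this sketch the graph traversal code would only ever call `create`, `encode`, and `distance`, so adding a new method means registering one more class.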
2. TurboQuant implementation
Implement TurboQuant at configurable bit-widths (1-bit, 2-bit, 4-bit) for cosine, Euclidean, and dot-product distance metrics. The implementation includes Hadamard rotation, scalar quantization, QJL bias correction, and length renormalization (per RaBitQ). It provides SIMD kernels for AVX2, AVX512, ARM, and POWER10, with a scalar fallback.
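Supporting 1-, 2-, and 4-bit codes implies packing several codes into each byte. The sketch below shows one straightforward layout (low bits first within each byte). The layout and the function names are assumptions made for illustration; the layout a SIMD kernel would actually prefer may differ.

```python
def pack_codes(codes, bits):
    """Pack integer codes of the given width into bytes, low bits first."""
    if bits not in (1, 2, 4):
        raise ValueError("bits must be 1, 2, or 4")
    per_byte = 8 // bits
    mask = (1 << bits) - 1
    out = bytearray((len(codes) + per_byte - 1) // per_byte)
    for i, code in enumerate(codes):
        if not 0 <= code <= mask:
            raise ValueError(f"code {code} does not fit in {bits} bits")
        out[i // per_byte] |= code << ((i % per_byte) * bits)
    return bytes(out)


def unpack_codes(blob, bits, count):
    """Recover `count` codes from bytes produced by pack_codes."""
    if bits not in (1, 2, 4):
        raise ValueError("bits must be 1, 2, or 4")
    per_byte = 8 // bits
    mask = (1 << bits) - 1
    return [(blob[i // per_byte] >> ((i % per_byte) * bits)) & mask
            for i in range(count)]


if __name__ == "__main__":
    codes = [1, 2, 3, 15]
    packed = pack_codes(codes, bits=4)
    print(packed.hex(), unpack_codes(packed, 4, len(codes)))
```

The element count is passed to `unpack_codes` explicitly because the final byte may be only partly filled when the dimension is not a multiple of the codes-per-byte.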
References
- TurboQuant paper: https://arxiv.org/abs/2504.19874
- Google Research blog: https://research.google/blog/turboquant-redefining-ai-efficiency-with-extreme-compression/
- RaBitQ (renormalization technique): https://arxiv.org/abs/2405.12497