[MDEV-39626] Pluggable quantization framework for VECTOR INDEX with TurboQuant as first method

Details

Type: Epic
Status: Open (View Workflow)
Priority: Major
Resolution: Unresolved
Fix Version/s: ROADMAP
Component/s: Vector search
Labels:
- vector

Epic Name:
Pluggable quantization for VECTOR INDEX (TurboQuant)

Description

Background
MariaDB Vector stores MHNSW index entries at int16 precision. There is currently no way to reduce the per-vector memory footprint further via quantization. For large vector workloads (millions of embeddings at 768–1536 dimensions), memory consumption is a significant cost and scalability constraint.
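As a rough illustration of the scale involved, the following back-of-the-envelope sketch computes the packed payload size per vector at different bit-widths. The helper names are hypothetical, and the numbers cover only the vector payload: graph neighbor lists and any per-vector metadata are ignored.

```python
def bytes_per_vector(dims: int, bits_per_component: int) -> int:
    """Payload bytes for one vector, rounded up to whole bytes.

    Assumption: components are tightly packed and no per-vector
    metadata (norms, scale factors) is counted.
    """
    return (dims * bits_per_component + 7) // 8


def payload_gib(num_vectors: int, dims: int, bits_per_component: int) -> float:
    """Total vector payload in GiB for an index of num_vectors entries."""
    return num_vectors * bytes_per_vector(dims, bits_per_component) / 2**30


if __name__ == "__main__":
    for bits in (32, 16, 4, 2, 1):
        size = payload_gib(1_000_000, 1536, bits)
        print(f"{bits:>2}-bit: {bytes_per_vector(1536, bits):>5} B/vector, {size:.2f} GiB per 1M vectors")
```

Under these assumptions a 1536-dimensional vector takes 3072 bytes at int16 and 768 bytes at 4 bits, so one million such vectors shrink from about 2.9 GiB to about 0.7 GiB of payload.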

TurboQuant (Google Research, 2025) is a data-oblivious vector quantization algorithm: it applies a randomized Hadamard rotation followed by scalar quantization, then corrects inner-product bias with a 1-bit QJL transform. It requires no training data, no codebook learning, and near-zero preprocessing time, which makes it well suited for online index builds. At 4-bit precision it achieves roughly 8× compression versus float32 (about 4× versus the current int16 storage), with recall degradation typically within 1–3% of uncompressed search.
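The first two stages described above (rotation, then scalar quantization) can be sketched as follows. This is a simplified illustration, not the algorithm from the paper: it uses a plain uniform grid with a caller-chosen clipping limit in place of TurboQuant's own scalar quantizer, it omits the QJL bias correction entirely, and it assumes the vector length is a power of two (how dimensions such as 768 would be padded or split is not specified in this ticket). All function names are hypothetical.

```python
import math
import random


def fwht(values):
    """Orthonormal fast Walsh-Hadamard transform; length must be a power of two."""
    out = list(values)
    n = len(out)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    scale = 1.0 / math.sqrt(n)
    return [v * scale for v in out]


def random_signs(n, seed):
    """Random +/-1 diagonal; the seed would be stored once per index (assumption)."""
    rng = random.Random(seed)
    return [rng.choice((-1.0, 1.0)) for _ in range(n)]


def rotate(vec, signs):
    """Randomized Hadamard rotation: flip signs, then apply the transform."""
    return fwht([v * s for v, s in zip(vec, signs)])


def unrotate(rotated, signs):
    """Inverse rotation; the orthonormal Hadamard matrix is its own inverse."""
    return [v * s for v, s in zip(fwht(rotated), signs)]


def quantize(rotated, bits, limit):
    """Uniform scalar quantization of each component onto 2**bits levels in [-limit, limit]."""
    levels = (1 << bits) - 1
    step = 2.0 * limit / levels
    return [round((min(max(v, -limit), limit) + limit) / step) for v in rotated]


def dequantize(codes, bits, limit):
    """Map integer codes back to the centers of the uniform grid."""
    levels = (1 << bits) - 1
    step = 2.0 * limit / levels
    return [c * step - limit for c in codes]
```

Because the rotation is orthonormal, it preserves norms and inner products, so distances can be estimated in the rotated domain; the rotation only serves to spread each vector's energy evenly across components before the per-component quantizer is applied.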

Proposed Changes
1. Pluggable quantization API in MHNSW
Add an internal API that allows quantization methods to be registered and selected per index. The API covers vector encoding/decoding, quantized distance computation, and metadata exposure via information_schema and SHOW INDEX. It is designed to be method-agnostic, so that future quantization algorithms can be added without changes to the core MHNSW graph traversal.
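The shape of such an API can be sketched as a small registry plus an abstract interface. This is only an illustration in Python of the responsibilities listed above; the actual server-side interface is not defined in this ticket, and every name here (`Quantizer`, `register_quantizer`, `create_quantizer`, `int8_demo`) is hypothetical. The demo method is a trivial int8 scaler, not TurboQuant.

```python
from abc import ABC, abstractmethod

_REGISTRY = {}


class Quantizer(ABC):
    """Hypothetical per-index quantization method."""

    name = ""

    @abstractmethod
    def encode(self, vec):
        """Turn a float vector into its stored code."""

    @abstractmethod
    def decode(self, code):
        """Reconstruct an approximate float vector from a code."""

    @abstractmethod
    def distance(self, code_a, code_b):
        """Estimate the distance directly on two codes."""

    def describe(self):
        """Metadata a server could surface in index listings."""
        return {"method": self.name}


def register_quantizer(cls):
    """Class decorator that makes a method selectable by name."""
    if cls.name in _REGISTRY:
        raise ValueError(f"duplicate quantization method: {cls.name}")
    _REGISTRY[cls.name] = cls
    return cls


def create_quantizer(name, **options):
    """Instantiate the method chosen for an index."""
    if name not in _REGISTRY:
        raise ValueError(f"unknown quantization method: {name}")
    return _REGISTRY[name](**options)


@register_quantizer
class Int8Demo(Quantizer):
    """Toy method: scale components in [-1, 1] to signed 8-bit integers."""

    name = "int8_demo"

    def __init__(self, scale=127.0):
        self.scale = scale

    def encode(self, vec):
        return tuple(max(-127, min(127, round(v * self.scale))) for v in vec)

    def decode(self, code):
        return [c / self.scale for c in code]

    def distance(self, code_a, code_b):
        # Squared Euclidean distance, computed on the integer codes.
        return sum((a - b) ** 2 for a, b in zip(code_a, code_b)) / self.scale**2

    def describe(self):
        return {"method": self.name, "bits": 8}
```

The point of the split is that graph traversal would only ever call `distance` on opaque codes, so adding a new method means registering one more class rather than touching the search loop.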

2. TurboQuant implementation
Implement TurboQuant at configurable bit-widths (1-bit, 2-bit, 4-bit) for cosine, Euclidean, and dot-product distance metrics. The implementation includes Hadamard rotation, scalar quantization, QJL bias correction, and length renormalization (per RaBitQ). SIMD kernels are provided for AVX2, AVX512, ARM, and POWER10, with a scalar fallback.
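One practical consequence of the configurable bit-widths is that codes have to be packed several to a byte. A minimal sketch of that packing step is shown below; the little-endian-within-byte layout and the function names are assumptions for illustration, and the on-disk layout of the real index is not specified in this ticket.

```python
def pack_codes(codes, bits):
    """Pack integer codes of width 1, 2, or 4 bits into bytes.

    Assumed layout: the first code occupies the lowest bits of the first byte.
    """
    if bits not in (1, 2, 4):
        raise ValueError("bits must be 1, 2, or 4")
    per_byte = 8 // bits
    mask = (1 << bits) - 1
    out = bytearray((len(codes) + per_byte - 1) // per_byte)
    for i, code in enumerate(codes):
        if not 0 <= code <= mask:
            raise ValueError(f"code {code} does not fit in {bits} bits")
        out[i // per_byte] |= code << ((i % per_byte) * bits)
    return bytes(out)


def unpack_codes(packed, bits, count):
    """Recover the first `count` codes from a packed byte string."""
    if bits not in (1, 2, 4):
        raise ValueError("bits must be 1, 2, or 4")
    per_byte = 8 // bits
    mask = (1 << bits) - 1
    return [(packed[i // per_byte] >> ((i % per_byte) * bits)) & mask for i in range(count)]
```

With this layout a 768-dimensional vector occupies 384 bytes at 4 bits, 192 bytes at 2 bits, and 96 bytes at 1 bit, before any per-vector correction terms are added.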

References

TurboQuant paper: https://arxiv.org/abs/2504.19874
Google Research blog: https://research.google/blog/turboquant-redefining-ai-efficiency-with-extreme-compression/
RaBitQ (renormalization technique): https://arxiv.org/abs/2405.12497

People

Assignee: Sergei Golubchik

Reporter: Adam Luciano

Votes: 0

Watchers: 2

Dates

Created: 2026-05-15 15:19

Updated: 2026-06-03 16:08
