Details
-
Bug
-
Status: Open (View Workflow)
-
Major
-
Resolution: Unresolved
-
11.8
-
None
Description
Description
For a vector index created with a large M (specifically M ≥ 128), a ORDER BY VEC_DISTANCE(...) LIMIT k search returns wrong rows — in the extreme case the farthest rows instead of the nearest. The same data and query return correct results with the default M=6 (or any M ≤ 127).
How to repeat
CREATE TABLE t (
id INT PRIMARY KEY,
v VECTOR(1) NOT NULL,
VECTOR INDEX (v) DISTANCE=euclidean M=200
);
– 1-D vectors [1], [2], ... fill the graph densely
DELIMITER //
CREATE PROCEDURE fill_t()
BEGIN
DECLARE i INT DEFAULT 1;
WHILE i <= 10000 DO
INSERT INTO t VALUES (i, VEC_FromText(CONCAT('[', i, ']')));
SET i = i + 1;
END WHILE;
END //
DELIMITER ;
CALL fill_t();
DROP PROCEDURE fill_t;
– the 10 nearest to [1] should be ids 1..10
SELECT id FROM t FORCE INDEX(v)
ORDER BY VEC_DISTANCE_EUCLIDEAN(v, VEC_FromText('[1]'))
LIMIT 10;
– actual: a cluster of far-away ids (e.g. 770..778); expected: 1..10
Re-running with M=6 (or M=127) returns the correct 1..10.
Root cause
The HNSW graph serializes each node's per-layer neighbor list into the neighbors blob as <count><gref><gref>...<count>..., where count is written and read as a single byte:
- write — FVectorNode::save
*ptr++= (uchar)(neighbors[i].num); - read — FVectorNode::load_from_record
size_t grefs= *ptr++; - blob size
total_size+= 1 + gref_len() * neighbors[i].num;
But layer 0 allows up to max_neighbors(0) = 2*M neighbors, and the per-index M option is bound to the default_m sysvar whose maximum is 200. So a node can legitimately accumulate up to 400 neighbors — well beyond the 255 a byte can hold.
When neighbors[i].num ≥ 256 the count byte wraps:
- 256 → 0 ⇒ the node is persisted with zero neighbors (isolated in the graph);
- 400 → 144 ⇒ the neighbor list is truncated, and for multi-layer nodes the following <count><gref>... segments are then parsed from the wrong offset.
This breaks graph connectivity, so the greedy descent can no longer leave the entry-point region and returns near-entry-point (i.e. far-from-target) candidates. The threshold is 2*M > 255, i.e. M ≥ 128.