XML

Word

Printable

Details

Type: Bug
Status: Open (View Workflow)
Priority: Major
Resolution: Unresolved
Affects Version/s: 11.8
Fix Version/s: 11.8, 12.3
Component/s: Vector search
Labels:
None

Description

For a vector index created with a large M (specifically M ≥ 128), a ORDER BY VEC_DISTANCE(...) LIMIT k search returns wrong rows — in the extreme case the farthest rows instead of the nearest. The same data and query return correct results with the default M=6 (or any M ≤ 127).

How to repeat

CREATE TABLE t (
id INT PRIMARY KEY,
v VECTOR(1) NOT NULL,
VECTOR INDEX (v) DISTANCE=euclidean M=200
);

– 1-D vectors [1], [2], ... fill the graph densely
DELIMITER //
CREATE PROCEDURE fill_t()
BEGIN
DECLARE i INT DEFAULT 1;
WHILE i <= 10000 DO
INSERT INTO t VALUES (i, VEC_FromText(CONCAT('[', i, ']')));
SET i = i + 1;
END WHILE;
END //
DELIMITER ;
CALL fill_t();
DROP PROCEDURE fill_t;

– the 10 nearest to [1] should be ids 1..10
SELECT id FROM t FORCE INDEX(v)
ORDER BY VEC_DISTANCE_EUCLIDEAN(v, VEC_FromText('[1]'))
LIMIT 10;
– actual: a cluster of far-away ids (e.g. 770..778); expected: 1..10

Re-running with M=6 (or M=127) returns the correct 1..10.

Root cause

The HNSW graph serializes each node's per-layer neighbor list into the neighbors blob as <count><gref><gref>...<count>..., where count is written and read as a single byte:

write — FVectorNode::save
*ptr++= (uchar)(neighbors[i].num);
read — FVectorNode::load_from_record
size_t grefs= *ptr++;
blob size
total_size+= 1 + gref_len() * neighbors[i].num;

But layer 0 allows up to max_neighbors(0) = 2*M neighbors, and the per-index M option is bound to the default_m sysvar whose maximum is 200. So a node can legitimately accumulate up to 400 neighbors — well beyond the 255 a byte can hold.

When neighbors[i].num ≥ 256 the count byte wraps:

256 → 0 ⇒ the node is persisted with zero neighbors (isolated in the graph);
400 → 144 ⇒ the neighbor list is truncated, and for multi-layer nodes the following <count><gref>... segments are then parsed from the wrong offset.

This breaks graph connectivity, so the greedy descent can no longer leave the entry-point region and returns near-entry-point (i.e. far-from-target) candidates. The threshold is 2*M > 255, i.e. M ≥ 128.

Attachments

- Sort By Name
- Sort By Date
- Ascending
- Descending
- Thumbnails
- List
- Download All

0001-MDEV-39822-vector-index-returns-wrong-results-for-la.patch
2026-06-02 11:08
8 kB
Wang Zihao

Activity

People

Assignee:: Sergei Golubchik

Reporter:: Wang Zihao

Votes:: 0 Vote for this issue

Watchers:: 2 Start watching this issue

Dates

Created:: 2026-06-02 10:56

Updated:: 2026-06-02 11:28

Git Integration

Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.