Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-39822

Vector (MHNSW) index returns wrong results for M ≥ 128: per-layer neighbor count is stored in a single byte and overflows

    XMLWordPrintable

Details

    Description

      Description

      For a vector index created with a large M (specifically M ≥ 128), a ORDER BY VEC_DISTANCE(...) LIMIT k search returns wrong rows — in the extreme case the farthest rows instead of the nearest. The same data and query return correct results with the default M=6 (or any M ≤ 127).

      How to repeat

      CREATE TABLE t (
      id INT PRIMARY KEY,
      v VECTOR(1) NOT NULL,
      VECTOR INDEX (v) DISTANCE=euclidean M=200
      );

      – 1-D vectors [1], [2], ... fill the graph densely
      DELIMITER //
      CREATE PROCEDURE fill_t()
      BEGIN
      DECLARE i INT DEFAULT 1;
      WHILE i <= 10000 DO
      INSERT INTO t VALUES (i, VEC_FromText(CONCAT('[', i, ']')));
      SET i = i + 1;
      END WHILE;
      END //
      DELIMITER ;
      CALL fill_t();
      DROP PROCEDURE fill_t;

      – the 10 nearest to [1] should be ids 1..10
      SELECT id FROM t FORCE INDEX(v)
      ORDER BY VEC_DISTANCE_EUCLIDEAN(v, VEC_FromText('[1]'))
      LIMIT 10;
      – actual: a cluster of far-away ids (e.g. 770..778); expected: 1..10

      Re-running with M=6 (or M=127) returns the correct 1..10.

      Root cause

      The HNSW graph serializes each node's per-layer neighbor list into the neighbors blob as <count><gref><gref>...<count>..., where count is written and read as a single byte:

      • write — FVectorNode::save
        *ptr++= (uchar)(neighbors[i].num);
      • read — FVectorNode::load_from_record
        size_t grefs= *ptr++;
      • blob size
        total_size+= 1 + gref_len() * neighbors[i].num;

      But layer 0 allows up to max_neighbors(0) = 2*M neighbors, and the per-index M option is bound to the default_m sysvar whose maximum is 200. So a node can legitimately accumulate up to 400 neighbors — well beyond the 255 a byte can hold.

      When neighbors[i].num ≥ 256 the count byte wraps:

      • 256 → 0 ⇒ the node is persisted with zero neighbors (isolated in the graph);
      • 400 → 144 ⇒ the neighbor list is truncated, and for multi-layer nodes the following <count><gref>... segments are then parsed from the wrong offset.

      This breaks graph connectivity, so the greedy descent can no longer leave the entry-point region and returns near-entry-point (i.e. far-from-target) candidates. The threshold is 2*M > 255, i.e. M ≥ 128.

      Attachments

        Activity

          People

            serg Sergei Golubchik
            ZihaoW Wang Zihao
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.