  1. MariaDB Server
  2. MDEV-39858

Reloading COSINE metric index from disk degrades search recall due to abs2 quantization noise

Details

    Description

      When a vector is created in memory by FVector::create() during normal inserts, its squared magnitude (abs2) under the COSINE metric is not derived from the stored coordinates. The vector is normalized and abs2 is hardcoded to the target constant 0.5f.

      However, when the index is reloaded from disk (after a server restart, FLUSH TABLES, or ALTER TABLE), the index uses FVectorNode::load_from_record(). This method reads the stored scale and quantized int16 coordinates from the database record, and runs postprocess(). Inside postprocess(), abs2 is recomputed using floating-point math: abs2 = subabs2 + scale * scale * dot_product(d, d, vec_len) / 2;

      Because the coordinates stored on disk are quantized int16 values, this recalculation introduces minor rounding noise, resulting in reloaded vectors having abs2 values like 0.49987 or 0.50013 instead of the exact 0.5f constant.
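
      The effect can be reproduced outside the server with a small standalone sketch. It is not the MariaDB code: QuantizedVec, quantize_unit() and reloaded_abs2() are hypothetical names, the scale is assumed to be chosen so that the largest coordinate maps to 32767, and the subabs2 term of the formula above is left out. It only illustrates that recomputing abs2 from int16 coordinates lands near 0.5f rather than exactly on it.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a stored record: one float scale plus int16 coordinates.
struct QuantizedVec {
  float scale;
  std::vector<int16_t> dims;
};

// Normalize a non-zero vector to unit length, then quantize it to int16.
// Assumption: the scale maps the largest absolute coordinate to 32767.
QuantizedVec quantize_unit(const std::vector<float>& v) {
  double norm2 = 0.0;
  for (float x : v) norm2 += double(x) * x;
  const double norm = std::sqrt(norm2);

  double max_abs = 0.0;
  for (float x : v) max_abs = std::max(max_abs, std::fabs(x / norm));

  QuantizedVec q;
  q.scale = static_cast<float>(max_abs / 32767.0);
  q.dims.reserve(v.size());
  for (float x : v)
    q.dims.push_back(static_cast<int16_t>(std::lround(x / norm / q.scale)));
  return q;
}

// Recompute abs2 from the quantized data, mirroring the shape of
// scale * scale * dot_product(d, d, vec_len) / 2 (without subabs2).
float reloaded_abs2(const QuantizedVec& q) {
  int64_t dot = 0;
  for (int16_t d : q.dims) dot += int64_t(d) * d;
  return q.scale * q.scale * static_cast<float>(dot) / 2;
}
```

      Calling reloaded_abs2(quantize_unit(v)) for an arbitrary non-zero v returns a value close to 0.5f, and how far it drifts depends on the rounding of each coordinate. The in-memory path described above never performs this recomputation, which is why the two paths can disagree.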

      The recall degradation affects high-dimensional datasets, and it grows as M increases.

          People

            gkodinov Georgi Kodinov
            Ahmad_sh Ahmad Shaban