MariaDB Server / MDEV-36205

faster distance calculations via extrapolation


Details

    • An optimization that makes vector search 30-50% faster (depending on the data) at the same recall. It is enabled automatically for applicable vectors, that is, vectors that can be gradually truncated to trade some recall for speed. For example, Matryoshka embeddings, such as those produced by OpenAI, are applicable.
    • Q2/2025 Development
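The issue does not describe the estimator itself. As a rough illustration only, the Python sketch below shows one plausible reading of "extrapolation" for squared Euclidean distance. It computes the distance over a prefix of the dimensions, scales it up to estimate the full distance, and skips the remaining dimensions when the estimate already exceeds the current search threshold. The function names, the default scale of `len(a) / prefix` and the `margin` parameter are hypothetical, not MariaDB's implementation.

```python
# Hypothetical sketch: estimate a squared-L2 distance from a prefix of the
# dimensions and stop early when the estimate already rules the candidate out.


def squared_l2(a, b, start=0, end=None):
    """Sum of squared differences over dimensions [start, end)."""
    if end is None:
        end = len(a)
    return sum((a[i] - b[i]) ** 2 for i in range(start, end))


def extrapolated_distance(a, b, prefix, scale=None):
    """Estimate the full squared distance from the first `prefix` dimensions.

    `scale` is the assumed full/partial ratio. By default it is len(a) / prefix,
    i.e. every dimension is assumed to contribute equally on average.
    """
    if scale is None:
        scale = len(a) / prefix
    return squared_l2(a, b, 0, prefix) * scale


def distance_or_reject(a, b, threshold, prefix, scale=None, margin=1.5):
    """Return the exact squared distance, or None when the extrapolated
    estimate exceeds threshold * margin (the remaining dimensions are skipped).

    A larger margin rejects fewer true neighbours (higher recall, less speed-up).
    """
    if scale is None:
        scale = len(a) / prefix
    partial = squared_l2(a, b, 0, prefix)
    if partial * scale > threshold * margin:
        return None  # candidate rejected without reading the tail dimensions
    return partial + squared_l2(a, b, prefix)


if __name__ == "__main__":
    a, b = [1.0, 2.0, 3.0, 4.0], [0.0, 0.0, 0.0, 0.0]
    print(extrapolated_distance(a, b, 2))      # 10.0 (true distance is 30.0)
    print(distance_or_reject(a, b, 100.0, 2))  # 30.0
    print(distance_or_reject(a, b, 5.0, 2))    # None
```

This also shows where the recall trade-off comes from. A true neighbour whose prefix happens to look far away is dropped, so the speed-up is only safe when the prefix distance predicts the full distance well.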

    Description

      Continue the work done with Eigen, but:

      • remove the randomization (so Eigen won't be needed)
      • only use this optimization when the dimensions are already prepared for it
        • e.g. randomized in the client (Matryoshka?)
      • automatically detect whether this is the case:
        • on inserts and on disk reads, calculate both the truncated and the full distance and check whether the prediction works
        • maybe only for the first X% of rows (when reading into the cache)
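The detection step above ("calculate the truncated and full distance, see if prediction works") could look roughly like the following Python sketch. For a sample of vector pairs, for example rows seen on insert or when read into the cache, it compares the prefix distance with the full distance. It treats the data as applicable only if the full/prefix ratio is stable across the sample. The function name, the `max_rel_spread` threshold and the use of mean and relative standard deviation as the stability test are assumptions, not taken from the issue.

```python
import statistics


def _sq_l2(a, b, end):
    """Squared L2 distance over the first `end` dimensions."""
    return sum((x - y) ** 2 for x, y in zip(a[:end], b[:end]))


def check_extrapolation(pairs, prefix, max_rel_spread=0.1):
    """Hypothetical applicability check.

    Returns (usable, scale): `scale` is the mean full/prefix distance ratio
    over the sampled pairs, and `usable` is True when the ratios vary by at
    most `max_rel_spread` (standard deviation relative to the mean).
    """
    ratios = []
    for a, b in pairs:
        partial = _sq_l2(a, b, prefix)
        full = _sq_l2(a, b, len(a))
        if partial > 0:  # a zero prefix distance gives no usable ratio
            ratios.append(full / partial)
    if len(ratios) < 2:
        return False, None  # not enough evidence to decide
    scale = statistics.mean(ratios)
    spread = statistics.stdev(ratios) / scale
    return spread <= max_rel_spread, scale


if __name__ == "__main__":
    zero = [0.0] * 4
    stable = [([1.0, 1.0, 1.0, 1.0], zero), ([2.0, 2.0, 2.0, 2.0], zero)]
    print(check_extrapolation(stable, 2))    # (True, 2.0)
    unstable = [([1.0, 1.0, 0.0, 0.0], zero), ([1.0, 1.0, 3.0, 3.0], zero)]
    print(check_extrapolation(unstable, 2))  # (False, 5.5)
```

The returned `scale` could then be used in place of a fixed `len(a) / prefix` factor. This matters for Matryoshka-style embeddings, where the leading dimensions carry more of the distance than the trailing ones.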

      Attachments

        1. stats.ods
          40 kB
        2. vec1.test
          7.33 MB
        3. vec2.test
          278 kB


          People

            serg Sergei Golubchik
            Votes: 0
            Watchers: 5

