Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-34774

Vector queries crash in subquery or wrong results with WHERE

    XMLWordPrintable

Details

    • Bug
    • Status: In Testing (View Workflow)
    • Critical
    • Resolution: Unresolved
    • 11.6
    • 11.7
    • None
    • Container image "mariadb-devel:11.6-vector-preview"

    Description

      I've been playing with the newly introduced vector search functionality using the provided container image ( https://quay.io/mariadb-foundation/mariadb-devel:11.6-vector-preview ).
      Mostly it works great, however I've encountered an issue where MariaDB reproducibly returns incorrect results when there are specific WHERE, ORDER BY and LIMIT clauses in a query.

      Here is my data initialisation:

      CREATE TABLE IF NOT EXISTS vector_test (
          id SERIAL PRIMARY KEY,
          text TEXT,
          embedding BLOB NOT NULL,
          VECTOR INDEX (embedding)
      );
       
      INSERT INTO vector_test (text, embedding) VALUES
          ('lorem ipsum', vec_fromtext('[1.000000,0.000000,0.000000]')),
          ('dolor sit amet', vec_fromtext('[0.000000,1.000000,0.000000]')),
          ('consectetur adipiscing elit', vec_fromtext('[0.000000,0.000000,1.000000]'));
      

      For this data I run the following query:

      SELECT text, vec_distance(embedding, vec_fromtext('[1.0, 0.0, 0.0]')) AS distance
      FROM vector_test
      WHERE text = 'dolor sit amet'
      ORDER BY distance
      LIMIT 1;
      

      Given the data above, obviously it should return 1 row, however it returns "Empty set"!
      Interestingly, if I remove the LIMIT, the query returns the expected 1 row:

      +----------------+--------------------+
      | text           | distance           |
      +----------------+--------------------+
      | dolor sit amet | 1.4142135623730951 |
      +----------------+--------------------+
      1 row in set (0.001 sec)
      

      Also, if I try a WHERE clause including the top result by distance, the query works correctly even with the limit:

      MariaDB [test]> SELECT text, vec_distance(embedding, vec_fromtext('[1.0, 0.0, 0.0]')) AS distance
          -> FROM vector_test
          -> WHERE text = 'lorem ipsum'
          -> ORDER BY distance
          -> LIMIT 1;
      +-------------+----------+
      | text        | distance |
      +-------------+----------+
      | lorem ipsum |        0 |
      +-------------+----------+
      1 row in set (0.002 sec)
      

      It seems as though MariaDB runs the index search with ORDER BY and LIMIT first, before the WHERE condition.

      To work around this issue, I decided to do the WHERE filtering in a subquery, and then in an outer query do ORDER BY and LIMIT:

      SELECT * FROM (
          SELECT text, vec_distance(embedding, vec_fromtext('[0.0, 1.0, 0.0]')) AS distance
          FROM vector_test
          WHERE text = 'dolor sit amet'
      ) AS unordered
      ORDER BY distance
      LIMIT 1;
      

      Unfortunately this has even worse results, causing the server to crash! You can check the attached file with the full crash output.

      Luckily, this too can be worked around if I add any LIMIT to the subquery, so the following works correctly:

      MariaDB [test]> SELECT * FROM (
          ->     SELECT text, vec_distance(embedding, vec_fromtext('[1.0, 0.0, 0.0]')) AS distance
          ->     FROM vector_test
          ->     WHERE text = 'dolor sit amet'
          ->     LIMIT 1000000
          -> ) AS unordered
          -> ORDER BY distance
          -> LIMIT 1;
      +----------------+--------------------+
      | text           | distance           |
      +----------------+--------------------+
      | dolor sit amet | 1.4142135623730951 |
      +----------------+--------------------+
      1 row in set (0.001 sec)
      

      Attachments

        Issue Links

          Activity

            People

              serg Sergei Golubchik
              kalin Kalin Arsov
              Votes:
              1 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.