Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-35246

Vector search skips a row in the table

    XMLWordPrintable

Details

    • Bug
    • Status: Closed (View Workflow)
    • Critical
    • Resolution: Not a Bug
    • N/A
    • N/A
    • Vector search
    • None

    Description

      The issue was observed in manual testing.

      By the beginning of events, the server wasn't completely fresh, but the table in question was created, dropped and re-created several times without problems, and to my knowledge there were no visible old artifacts related to the table.

      An InnoDB table with VECTOR(1) field and vector key was created and populated with 100 rows, random values in the vector column. The table also had table-level with system versioning and column-level without system versioning for the vector column, I'm not sure whether it's important.

      Then some operations were performed, on this table as well as on other ones; importantly 80 out of 100 rows were deleted from the table, and for the rest the vector column was updated to a single constant value.

      After that, the table scan continued returning 20 rows as it should (or 100 rows with system_time all), but vector search with limit 30 started returning only 19 rows.

      Once it started happening, it became persistent – after restarting the server, or using the same datadir on another server, the result set is still wrong.

      I have the unabridged record of console operations and output (attached as console.txt). It is quite noisy and contains a lot of irrelevant information and user errors usual for manual operations, but as far as I can tell nothing unexpected unless the wrong result set started happening. The described events start from the line 840. The wrong result occurs for the first time at line 1380.

      However, I couldn't reproduce the issue by repeating all the same operations in the same order.

      I don't have rand seeds for inserting data, but the inserted data was printed in the output, so while trying to reproduce, I converted it into INSERT .. VALUES instead of using RAND(). It didn't help.

      All in all, it seems that something random may have happened internally, maybe during updating the index. I am attaching the datadir in case it can be analyzed based on already existing data. It was created with 7fce19bd215ac0671855044520092aa4210049d1 Just unpack and run the server on it, all default options are sufficient.

      Here is the erroneous result:

      MariaDB [test]> select a, vec_totext(v) from t  order by vec_distance_euclidean(v,0x31313131) limit 30;
      +------+---------------+
      | a    | vec_totext(v) |
      +------+---------------+
      |    1 | [2.57849e-9]  |
      |   20 | [2.57849e-9]  |
      |   11 | [2.57849e-9]  |
      |    6 | [2.57849e-9]  |
      |    7 | [2.57849e-9]  |
      |    3 | [2.57849e-9]  |
      |    5 | [2.57849e-9]  |
      |    8 | [2.57849e-9]  |
      |    9 | [2.57849e-9]  |
      |    2 | [2.57849e-9]  |
      |    4 | [2.57849e-9]  |
      |   12 | [2.57849e-9]  |
      |   19 | [2.57849e-9]  |
      |   18 | [2.57849e-9]  |
      |   17 | [2.57849e-9]  |
      |   15 | [2.57849e-9]  |
      |   14 | [2.57849e-9]  |
      |   13 | [2.57849e-9]  |
      |   10 | [2.57849e-9]  |
      +------+---------------+
      19 rows in set (0.002 sec)
      

      comparing to

      MariaDB [test]> select a, vec_totext(v) from t  order by vec_distance_euclidean(v,0x31313131) ;
      +------+---------------+
      | a    | vec_totext(v) |
      +------+---------------+
      |    1 | [2.57849e-9]  |
      |   12 | [2.57849e-9]  |
      |   13 | [2.57849e-9]  |
      |   14 | [2.57849e-9]  |
      |   15 | [2.57849e-9]  |
      |   16 | [2.57849e-9]  |
      |   17 | [2.57849e-9]  |
      |   18 | [2.57849e-9]  |
      |   19 | [2.57849e-9]  |
      |   11 | [2.57849e-9]  |
      |   10 | [2.57849e-9]  |
      |    2 | [2.57849e-9]  |
      |    3 | [2.57849e-9]  |
      |    4 | [2.57849e-9]  |
      |    5 | [2.57849e-9]  |
      |    6 | [2.57849e-9]  |
      |    7 | [2.57849e-9]  |
      |    8 | [2.57849e-9]  |
      |    9 | [2.57849e-9]  |
      |   20 | [2.57849e-9]  |
      +------+---------------+
      20 rows in set (0.003 sec)
      

      If it can't be analyzed this way, please feel free to close.

      Attachments

        1. console.txt
          74 kB
        2. data.tar.gz
          3.22 MB

        Issue Links

          Activity

            People

              serg Sergei Golubchik
              elenst Elena Stepanova
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.