Details
-
Bug
-
Status: Closed (View Workflow)
-
Critical
-
Resolution: Fixed
-
N/A
-
None
Description
The issue was observed in manual testing.
By the beginning of events, the server wasn't completely fresh, but the table in question was created, dropped and re-created several times without problems, and to my knowledge there were no visible old artifacts related to the table.
An InnoDB table with VECTOR(1) field and vector key was created and populated with 100 rows, random values in the vector column. The table also had table-level with system versioning and column-level without system versioning for the vector column, I'm not sure whether it's important.
Then some operations were performed, on this table as well as on other ones; importantly 80 out of 100 rows were deleted from the table, and for the rest the vector column was updated to a single constant value.
After that, the table scan continued returning 20 rows as it should (or 100 rows with system_time all), but vector search with limit 30 started returning only 19 rows.
Once it started happening, it became persistent – after restarting the server, or using the same datadir on another server, the result set is still wrong.
I have the unabridged record of console operations and output (attached as console.txt). It is quite noisy and contains a lot of irrelevant information and user errors usual for manual operations, but as far as I can tell nothing unexpected unless the wrong result set started happening. The described events start from the line 840. The wrong result occurs for the first time at line 1380.
However, I couldn't reproduce the issue by repeating all the same operations in the same order.
I don't have rand seeds for inserting data, but the inserted data was printed in the output, so while trying to reproduce, I converted it into INSERT .. VALUES instead of using RAND(). It didn't help.
All in all, it seems that something random may have happened internally, maybe during updating the index. I am attaching the datadir in case it can be analyzed based on already existing data. It was created with 7fce19bd215ac0671855044520092aa4210049d1 Just unpack and run the server on it, all default options are sufficient.
Here is the erroneous result:
MariaDB [test]> select a, vec_totext(v) from t order by vec_distance_euclidean(v,0x31313131) limit 30; |
+------+---------------+ |
| a | vec_totext(v) |
|
+------+---------------+ |
| 1 | [2.57849e-9] |
|
| 20 | [2.57849e-9] |
|
| 11 | [2.57849e-9] |
|
| 6 | [2.57849e-9] |
|
| 7 | [2.57849e-9] |
|
| 3 | [2.57849e-9] |
|
| 5 | [2.57849e-9] |
|
| 8 | [2.57849e-9] |
|
| 9 | [2.57849e-9] |
|
| 2 | [2.57849e-9] |
|
| 4 | [2.57849e-9] |
|
| 12 | [2.57849e-9] |
|
| 19 | [2.57849e-9] |
|
| 18 | [2.57849e-9] |
|
| 17 | [2.57849e-9] |
|
| 15 | [2.57849e-9] |
|
| 14 | [2.57849e-9] |
|
| 13 | [2.57849e-9] |
|
| 10 | [2.57849e-9] |
|
+------+---------------+ |
19 rows in set (0.002 sec) |
comparing to
MariaDB [test]> select a, vec_totext(v) from t order by vec_distance_euclidean(v,0x31313131) ; |
+------+---------------+ |
| a | vec_totext(v) |
|
+------+---------------+ |
| 1 | [2.57849e-9] |
|
| 12 | [2.57849e-9] |
|
| 13 | [2.57849e-9] |
|
| 14 | [2.57849e-9] |
|
| 15 | [2.57849e-9] |
|
| 16 | [2.57849e-9] |
|
| 17 | [2.57849e-9] |
|
| 18 | [2.57849e-9] |
|
| 19 | [2.57849e-9] |
|
| 11 | [2.57849e-9] |
|
| 10 | [2.57849e-9] |
|
| 2 | [2.57849e-9] |
|
| 3 | [2.57849e-9] |
|
| 4 | [2.57849e-9] |
|
| 5 | [2.57849e-9] |
|
| 6 | [2.57849e-9] |
|
| 7 | [2.57849e-9] |
|
| 8 | [2.57849e-9] |
|
| 9 | [2.57849e-9] |
|
| 20 | [2.57849e-9] |
|
+------+---------------+ |
20 rows in set (0.003 sec) |
If it can't be analyzed this way, please feel free to close.
Attachments
Issue Links
- is caused by
-
MDEV-34939 vector search in 11.7
- Closed
- relates to
-
MDEV-8306 Complete cost-based optimization for ORDER BY with LIMIT
- Stalled