Details
-
Bug
-
Status: Closed (View Workflow)
-
Critical
-
Resolution: Fixed
-
11.6
-
Container image "mariadb-devel:11.6-vector-preview"
Description
I've been playing with the newly introduced vector search functionality using the provided container image ( https://quay.io/mariadb-foundation/mariadb-devel:11.6-vector-preview ).
Mostly it works great, however I've encountered an issue where MariaDB reproducibly returns incorrect results when there are specific WHERE, ORDER BY and LIMIT clauses in a query.
Here is my data initialisation:
CREATE TABLE IF NOT EXISTS vector_test (
|
id SERIAL PRIMARY KEY,
|
text TEXT,
|
embedding BLOB NOT NULL,
|
VECTOR INDEX (embedding)
|
);
|
 |
INSERT INTO vector_test (text, embedding) VALUES
|
('lorem ipsum', vec_fromtext('[1.000000,0.000000,0.000000]')),
|
('dolor sit amet', vec_fromtext('[0.000000,1.000000,0.000000]')),
|
('consectetur adipiscing elit', vec_fromtext('[0.000000,0.000000,1.000000]'));
|
For this data I run the following query:
SELECT text, vec_distance(embedding, vec_fromtext('[1.0, 0.0, 0.0]')) AS distance
|
FROM vector_test
|
WHERE text = 'dolor sit amet'
|
ORDER BY distance
|
LIMIT 1;
|
Given the data above, obviously it should return 1 row, however it returns "Empty set"!
Interestingly, if I remove the LIMIT, the query returns the expected 1 row:
+----------------+--------------------+
|
| text | distance |
|
+----------------+--------------------+
|
| dolor sit amet | 1.4142135623730951 |
|
+----------------+--------------------+
|
1 row in set (0.001 sec)
|
Also, if I try a WHERE clause including the top result by distance, the query works correctly even with the limit:
MariaDB [test]> SELECT text, vec_distance(embedding, vec_fromtext('[1.0, 0.0, 0.0]')) AS distance
|
-> FROM vector_test
|
-> WHERE text = 'lorem ipsum'
|
-> ORDER BY distance
|
-> LIMIT 1;
|
+-------------+----------+
|
| text | distance |
|
+-------------+----------+
|
| lorem ipsum | 0 |
|
+-------------+----------+
|
1 row in set (0.002 sec)
|
It seems as though MariaDB runs the index search with ORDER BY and LIMIT first, before the WHERE condition.
To work around this issue, I decided to do the WHERE filtering in a subquery, and then in an outer query do ORDER BY and LIMIT:
SELECT * FROM (
|
SELECT text, vec_distance(embedding, vec_fromtext('[0.0, 1.0, 0.0]')) AS distance
|
FROM vector_test
|
WHERE text = 'dolor sit amet'
|
) AS unordered
|
ORDER BY distance
|
LIMIT 1;
|
Unfortunately this has even worse results, causing the server to crash! You can check the attached file with the full crash output.
Luckily, this too can be worked around if I add any LIMIT to the subquery, so the following works correctly:
MariaDB [test]> SELECT * FROM (
|
-> SELECT text, vec_distance(embedding, vec_fromtext('[1.0, 0.0, 0.0]')) AS distance
|
-> FROM vector_test
|
-> WHERE text = 'dolor sit amet'
|
-> LIMIT 1000000
|
-> ) AS unordered
|
-> ORDER BY distance
|
-> LIMIT 1;
|
+----------------+--------------------+
|
| text | distance |
|
+----------------+--------------------+
|
| dolor sit amet | 1.4142135623730951 |
|
+----------------+--------------------+
|
1 row in set (0.001 sec)
|
Attachments
Issue Links
- relates to
-
MDEV-33409 Index Condition Pushdown for k-ANN graph searches
- Open
-
MDEV-35032 streaming mode for mhnsw search
- Closed