[MDEV-35305] Vector search queries are written into slow log as "not using index" - Jira

Details

Type: Bug
Status: Open (View Workflow)
Priority: Major
Resolution: Unresolved
Affects Version/s: N/A
Fix Version/s: 11.8
Component/s: Vector search
Labels:
None

Description

Queries which use vector indexes are written into the slow log under filter not_using_index, and the record claims they are using full scan.
Oddly, they are nevertheless not written under full_scan filter (it is not in the test case, but can be seen by changing the filter).

The test case is for reproducing purposes only, please don't put it into the regression suite as is, since it prints a raw chunk from the slow log which is bound to be non-deterministic.

--source include/have_sequence.inc

--let $logfile= `select @@slow_query_log_file`

create table t (v vector(1) not null, vector(v));

insert into t select vec_fromtext(concat('[',seq,']')) FROM seq_1_to_100;

set log_slow_verbosity=full, log_slow_filter='not_using_index';

select vec_totext(v) from t order by vec_distance_euclidean(v,vec_fromtext('[0]')) limit 3;

--echo ### slow log ###

--exec tail -n 14 $logfile

--echo ################

# Cleanup

drop table t;

7aab5e35d714fc3b6de68f5158072d5ecdf62faa
### slow log ###
# Thread_id: 4 Schema: test QC_hit: No
# Query_time: 0.000351 Lock_time: 0.000098 Rows_sent: 3 Rows_examined: 3
# Rows_affected: 0 Bytes_sent: 88
# Pages_accessed: 0 Pages_read: 0 Pages_prefetched: 0 Pages_updated: 0 Old_rows_read: 0
# Pages_read_time: 0.0000 Engine_time: 0.0071
# Full_scan: Yes Full_join: No Tmp_table: No Tmp_table_on_disk: No
# Filesort: No Filesort_on_disk: No Merge_passes: 0 Priority_queue: No
#
# explain: id select_type table type possible_keys key key_len ref rows r_rows filtered r_filtered Extra
# explain: 1 SIMPLE t index NULL v 6 NULL 100 3.00 100.00 100.00
#
use `test`;
SET timestamp=1730395492;
select vec_totext(v) from t order by vec_distance_euclidean(v,vec_fromtext('[0]')) limit 3;
################

Attachments

Issue Links

is caused by

MDEV-34939 vector search in 11.7

Closed

Activity

Ascending order - Click to sort in descending order

Sergei Golubchik added a comment - 2025-02-09 17:44

This is consistent with the behavior for other indexes:

--source include/have_sequence.inc

--let $logfile= `select @@slow_query_log_file`

create table t (a int, b int, index (b));

insert t select seq*93911 % 100, seq from seq_1_to_100;

set log_slow_verbosity=full, log_slow_filter='not_using_index';

select * from t order by b limit 5;

--echo ### slow log ###

--exec tail -n 14 $logfile

--echo ################

# Cleanup

drop table t;

Perhaps all ORDER BY index LIMIT cases should not be treated as "full scan, not using index" ?

Sergei Golubchik added a comment - 2025-02-09 17:44 This is consistent with the behavior for other indexes: --source include/have_sequence.inc --let $logfile= `select @@slow_query_log_file` create table t (a int , b int , index (b)); insert t select seq*93911 % 100, seq from seq_1_to_100; set log_slow_verbosity= full , log_slow_filter= 'not_using_index' ; select * from t order by b limit 5; --echo ### slow log ### --exec tail -n 14 $logfile --echo ################ # Cleanup drop table t; Perhaps all ORDER BY index LIMIT cases should not be treated as "full scan, not using index" ?

Elena Stepanova added a comment - 2025-02-09 18:04

The KB says about this option

Logs queries that don't use an index, or that perform a full index scan where the index doesn't limit the number of rows

I'm not sure how to apply it to ORDER BY.
As for the vector case, I just thought that from a practical perspective, it may be tempting for users to monitor vector search through this logging, to catch situations when an application relies on the key but it's in fact isn't used on whatever reason, e.g. due to a distance type mismatch, or whatever; but it didn't work.

Elena Stepanova added a comment - 2025-02-09 18:04 The KB says about this option Logs queries that don't use an index, or that perform a full index scan where the index doesn't limit the number of rows I'm not sure how to apply it to ORDER BY . As for the vector case, I just thought that from a practical perspective, it may be tempting for users to monitor vector search through this logging, to catch situations when an application relies on the key but it's in fact isn't used on whatever reason, e.g. due to a distance type mismatch, or whatever; but it didn't work.

People

Assignee:: Sergei Golubchik

Reporter:: Elena Stepanova

Votes:: 0 Vote for this issue

Watchers:: 2 Start watching this issue

Dates

Created:: 2024-10-31 17:26

Updated:: 2025-03-23 19:42

Git Integration

Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.

MariaDB Server