Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-35305

Vector search queries are written into slow log as "not using index"

Details

    • Bug
    • Status: Open (View Workflow)
    • Major
    • Resolution: Unresolved
    • N/A
    • 11.8
    • Vector search
    • None

    Description

      Queries which use vector indexes are written into the slow log under filter not_using_index, and the record claims they are using full scan.
      Oddly, they are nevertheless not written under full_scan filter (it is not in the test case, but can be seen by changing the filter).

      The test case is for reproducing purposes only, please don't put it into the regression suite as is, since it prints a raw chunk from the slow log which is bound to be non-deterministic.

      --source include/have_sequence.inc
       
      --let $logfile= `select @@slow_query_log_file`
       
      create table t (v vector(1) not null, vector(v));
      insert into t select vec_fromtext(concat('[',seq,']')) FROM seq_1_to_100;
       
      set log_slow_verbosity=full, log_slow_filter='not_using_index';
      select vec_totext(v) from t order by vec_distance_euclidean(v,vec_fromtext('[0]')) limit 3;
       
      --echo ### slow log ###
      --exec tail -n 14 $logfile
      --echo ################
       
      # Cleanup
      drop table t;
      

      7aab5e35d714fc3b6de68f5158072d5ecdf62faa

      ### slow log ###
      # Thread_id: 4  Schema: test  QC_hit: No
      # Query_time: 0.000351  Lock_time: 0.000098  Rows_sent: 3  Rows_examined: 3
      # Rows_affected: 0  Bytes_sent: 88
      # Pages_accessed: 0  Pages_read: 0  Pages_prefetched: 0  Pages_updated: 0  Old_rows_read: 0
      # Pages_read_time: 0.0000  Engine_time: 0.0071
      # Full_scan: Yes  Full_join: No  Tmp_table: No  Tmp_table_on_disk: No
      # Filesort: No  Filesort_on_disk: No  Merge_passes: 0  Priority_queue: No
      #
      # explain: id	select_type	table	type	possible_keys	key	key_len	ref	rows	r_rows	filtered	r_filtered	Extra
      # explain: 1	SIMPLE	t	index	NULL	v	6	NULL	100	3.00	100.00	100.00	
      #
      use `test`;
      SET timestamp=1730395492;
      select vec_totext(v) from t order by vec_distance_euclidean(v,vec_fromtext('[0]')) limit 3;
      ################
      

      Attachments

        Issue Links

          Activity

            This is consistent with the behavior for other indexes:

            --source include/have_sequence.inc
             
            --let $logfile= `select @@slow_query_log_file`
             
            create table t (a int, b int, index (b));
            insert t select seq*93911 % 100, seq from seq_1_to_100;
             
            set log_slow_verbosity=full, log_slow_filter='not_using_index';
            select * from t order by b limit 5;
             
            --echo ### slow log ###
            --exec tail -n 14 $logfile
            --echo ################
             
            # Cleanup
            drop table t;
            

            Perhaps all ORDER BY index LIMIT cases should not be treated as "full scan, not using index" ?

            serg Sergei Golubchik added a comment - This is consistent with the behavior for other indexes: --source include/have_sequence.inc --let $logfile= `select @@slow_query_log_file`   create table t (a int , b int , index (b)); insert t select seq*93911 % 100, seq from seq_1_to_100; set log_slow_verbosity= full , log_slow_filter= 'not_using_index' ; select * from t order by b limit 5; --echo ### slow log ### --exec tail -n 14 $logfile --echo ################ # Cleanup drop table t; Perhaps all ORDER BY index LIMIT cases should not be treated as "full scan, not using index" ?

            The KB says about this option

            Logs queries that don't use an index, or that perform a full index scan where the index doesn't limit the number of rows

            I'm not sure how to apply it to ORDER BY.
            As for the vector case, I just thought that from a practical perspective, it may be tempting for users to monitor vector search through this logging, to catch situations when an application relies on the key but it's in fact isn't used on whatever reason, e.g. due to a distance type mismatch, or whatever; but it didn't work.

            elenst Elena Stepanova added a comment - The KB says about this option Logs queries that don't use an index, or that perform a full index scan where the index doesn't limit the number of rows I'm not sure how to apply it to ORDER BY . As for the vector case, I just thought that from a practical perspective, it may be tempting for users to monitor vector search through this logging, to catch situations when an application relies on the key but it's in fact isn't used on whatever reason, e.g. due to a distance type mismatch, or whatever; but it didn't work.

            People

              serg Sergei Golubchik
              elenst Elena Stepanova
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.