  1. MariaDB Server
  2. MDEV-37709

VEC_DISTANCE_COSINE results are flaky with order by (& limit)

Details

    • Type: Bug
    • Status: Needs Feedback
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 11.8.3
    • Fix Version/s: None
    • Component/s: Vector search
    • Labels: None
    • Environment: Debian 13

    Description

      Hi, I'm using VEC_DISTANCE_COSINE and want to sort on the result. Sorting ascending with a limit appears to return wrong results. When I rerun the same query without the limit, with descending order, or without the 'order by', it works correctly.

      My queries also filter on other columns (not always through indexes). In some cases a query with a limit returns only a subset of the expected rows. It looks like not all records are evaluated for `order by YY asc limit XX` when it is combined with where clauses on other columns.

      I can provide a test set, but I can only reduce it to a SQL file of 250MB+. Since I don't know what I'm looking at, I don't know how to reduce it further.

      My table structure:

      CREATE TABLE `text_embeddings` (
        `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
        `created_at` datetime DEFAULT NULL,
        `updated_at` datetime DEFAULT NULL,
        `source_type` varchar(255) NOT NULL,
        `source_id` varchar(150) NOT NULL,
        `language` varchar(8) NOT NULL DEFAULT 'global',
        `embeddings` vector(1536) NOT NULL,
        PRIMARY KEY (`id`),
        UNIQUE KEY `text_embeddings_src_lang_unique` (`source_type`,`source_id`),
        KEY `text_embeddings_language_index` (`language`),
        VECTOR KEY `text_embeddings_vector_index` (`embeddings`) `DISTANCE`=cosine
      ) ENGINE=InnoDB AUTO_INCREMENT=248854 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
      

      My query:

      select source_type, source_id, language, VEC_DISTANCE_COSINE(embeddings, VEC_FromText('[....]')) as dist from `text_embeddings` where (`language` = 'global' or `language` = 'nl_NL') and `source_type` LIKE '%filter%' order by dist ASC limit 40;
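
      A rough way to check whether the vector index is involved, offered as a sketch and not as a confirmed diagnosis: inspect the plan with EXPLAIN, then compare the result against a variant that should not use the vector index. Two things here are assumptions, not something verified on 11.8.3: that the IGNORE INDEX hint is honoured for VECTOR keys, and that the mhnsw_ef_search session variable affects this query. '[....]' is the same elided vector as in the query above.

```sql
-- 1. See which access path the optimizer picks for the failing query.
EXPLAIN
SELECT source_type, source_id, language,
       VEC_DISTANCE_COSINE(embeddings, VEC_FromText('[....]')) AS dist
FROM text_embeddings
WHERE (language = 'global' OR language = 'nl_NL')
  AND source_type LIKE '%filter%'
ORDER BY dist ASC
LIMIT 40;

-- 2. Exact baseline: ask the optimizer to skip the vector index
--    (assumption: the hint applies to VECTOR keys), so every matching
--    row is evaluated and then sorted.
SELECT source_type, source_id, language,
       VEC_DISTANCE_COSINE(embeddings, VEC_FromText('[....]')) AS dist
FROM text_embeddings IGNORE INDEX (text_embeddings_vector_index)
WHERE (language = 'global' OR language = 'nl_NL')
  AND source_type LIKE '%filter%'
ORDER BY dist ASC
LIMIT 40;

-- 3. Raise the number of candidates the index search collects
--    (assumption: this changes the outcome), then rerun the original
--    query and compare its rows with the baseline from step 2.
SET SESSION mhnsw_ef_search = 1000;
```

      If step 2 returns the full 40 rows while the original query returns fewer or different rows, that would suggest the approximate index search runs first and the where clause is applied to its candidates afterwards. This is a hypothesis to test, not an established cause.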
      

      This seems to be related to MDEV-37078. I hope to be able to provide more information to help solve it, since the issue made it into an LTS release.

          People

            Assignee: Unassigned
            Reporter: Rik van der Heijden (rikvdh)
            Votes: 0
            Watchers: 3
