Details

    • Epic
    • Status: Stalled (View Workflow)
    • Critical
    • Resolution: Unresolved
    • N/A
    • Server
    • vector search

    Description

      A common operation for multi-dimensional vectors is to find k nearest vectors to the given one.
      This task is about implementing indexes that allow to do it fast.

      • ideally they'll be engine independent
      • indexes should be update-able
      • in this task we'll only do Euclidean distance
      • we'll benchmark it on real multi-million-rows data sets
      • what algorithm, exactly, to use is still unclear

      Attachments

        Issue Links

          Activity

            parser, grammar. CREATE TABLE works,

            serg Sergei Golubchik added a comment - parser, grammar. CREATE TABLE works,

            metadata is stored in the frm, read from frm, type checks (only BLOB and NOT NULL)
            SHOW commands and INFORMATION_SCHEMA.STATISTICS

            serg Sergei Golubchik added a comment - metadata is stored in the frm, read from frm, type checks (only BLOB and NOT NULL ) SHOW commands and INFORMATION_SCHEMA.STATISTICS

            An API for a graph algorithm.

            Algorithm implementation needs to provide methods for

            • add a vector to the graph
            • remove a vector from a graph
            • update — only if there will be some algorithm that can do it much faster that delete+insert
            • build a graph from a set of vectors

            The server will provide methods for

            • read/update list of edges for a specific node
            serg Sergei Golubchik added a comment - An API for a graph algorithm. Algorithm implementation needs to provide methods for add a vector to the graph remove a vector from a graph update — only if there will be some algorithm that can do it much faster that delete+insert build a graph from a set of vectors The server will provide methods for read/update list of edges for a specific node

            From the product point of view, this is critical.

            julien.fritsch Julien Fritsch added a comment - From the product point of view, this is critical.
            sanja Oleksandr Byelkin added a comment - - edited

            Here is my review:
            This

            share->total_keys > share->keys
            to->s->keys == to->s->total_keys
            

            better cnvert to SHARE::has_hlindex() or something like that

            I am not sure about this changes (at least buffer size should be increaed to
            better reflect default length of vector test representation):

            diff --git a/sql/item_vectorfunc.cc b/sql/item_vectorfunc.cc
            index 8b9ea188490..315ad4e06fa 100644
            --- a/sql/item_vectorfunc.cc
            +++ b/sql/item_vectorfunc.cc
            @@ -134,6 +134,9 @@ String *Item_func_vec_fromtext::val_str(String *buf)
             {
               json_engine_t je;
               bool end_ok= false;
            +  char buff[STRING_BUFFER_USUAL_SIZE]; // maynbe *2 or *3 has more sens
            +                                       // (usual vector representation)
            +  String tmp_js(buff,sizeof(buff), &my_charset_bin);
               String *value = args[0]->val_json(&tmp_js);
               CHARSET_INFO *cs= value->charset();
             
            diff --git a/sql/item_vectorfunc.h b/sql/item_vectorfunc.h
            index 58dc300c451..1687eed7e29 100644
            --- a/sql/item_vectorfunc.h
            +++ b/sql/item_vectorfunc.h
            @@ -136,7 +136,6 @@ class Item_func_vec_totext: public Item_str_ascii_checksum_func
             
             class Item_func_vec_fromtext: public Item_str_func
             {
            -  String tmp_js;
             public:
               bool fix_length_and_dec(THD *thd) override;
               Item_func_vec_fromtext(THD *thd, Item *a);
            

            Should be tested with maribackup (high posibility of fail with the new ndex)

            Should be tested with mariadb-dump (probably pass, but will check that what
            was printed by SHOW CREATE TABLE willbe read back)

            It would be nice to have test with different types of indexses in one table
            and they used by optimiser (I also do not expect problems, just to have it
            checked).

            IMHO VECTOR(N) type should be done before release to avoid table converting
            on upgrade in the future.


            IMHO this cleanups a bit fixes and better be backported if possible:

            cleanup: remove unconditional #ifdef's

            cleanup: key algorithm vs key flags

            sanja Oleksandr Byelkin added a comment - - edited Here is my review: This share->total_keys > share->keys to->s->keys == to->s->total_keys better cnvert to SHARE::has_hlindex() or something like that I am not sure about this changes (at least buffer size should be increaed to better reflect default length of vector test representation): diff --git a/sql/item_vectorfunc.cc b/sql/item_vectorfunc.cc index 8b9ea188490..315ad4e06fa 100644 --- a/sql/item_vectorfunc.cc +++ b/sql/item_vectorfunc.cc @@ -134,6 +134,9 @@ String *Item_func_vec_fromtext::val_str(String *buf) { json_engine_t je; bool end_ok= false; + char buff[STRING_BUFFER_USUAL_SIZE]; // maynbe *2 or *3 has more sens + // (usual vector representation) + String tmp_js(buff,sizeof(buff), &my_charset_bin); String *value = args[0]->val_json(&tmp_js); CHARSET_INFO *cs= value->charset(); diff --git a/sql/item_vectorfunc.h b/sql/item_vectorfunc.h index 58dc300c451..1687eed7e29 100644 --- a/sql/item_vectorfunc.h +++ b/sql/item_vectorfunc.h @@ -136,7 +136,6 @@ class Item_func_vec_totext: public Item_str_ascii_checksum_func class Item_func_vec_fromtext: public Item_str_func { - String tmp_js; public: bool fix_length_and_dec(THD *thd) override; Item_func_vec_fromtext(THD *thd, Item *a); Should be tested with maribackup (high posibility of fail with the new ndex) Should be tested with mariadb-dump (probably pass, but will check that what was printed by SHOW CREATE TABLE willbe read back) It would be nice to have test with different types of indexses in one table and they used by optimiser (I also do not expect problems, just to have it checked). IMHO VECTOR(N) type should be done before release to avoid table converting on upgrade in the future. IMHO this cleanups a bit fixes and better be backported if possible: cleanup: remove unconditional #ifdef's cleanup: key algorithm vs key flags

            People

              serg Sergei Golubchik
              serg Sergei Golubchik
              Votes:
              2 Vote for this issue
              Watchers:
              19 Start watching this issue

              Dates

                Created:
                Updated:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.