[MDEV-32887] vector search - Jira

Sergei Golubchik created issue - 2023-11-26 19:50

Sergei Golubchik made changes - 2023-11-26 19:50

Field	Original Value	New Value
Link		This issue relates to ~~MDEV-32885~~ [ ~~MDEV-32885~~ ]

Sergei Golubchik made changes - 2023-11-26 19:52

Description

A common operation for multi-dimensional vectors is to find k nearest vectors to the given one.
This task is about implementing indexes that allow to do it fast.

* ideally they'll be engine independent
* indexes should be update-able
* in this task we'll only do Euclidean distance
* we'll benchmark it on real multi-million-rows data sets

* what algorithm, exactly, to use is still unclear

A common operation for multi-dimensional vectors is to find k nearest vectors to the given one.
This task is about implementing indexes that allow to do it fast.

* ideally they'll be engine independent
* indexes should be update-able
* in this task we'll only do Euclidean distance
* we'll benchmark it on real multi-million-rows data sets
* optimizer-wise we'll do like with fulltext search

* what algorithm, exactly, to use is still unclear
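For reference, the operation the description talks about can be sketched as an exact, index-free scan. This is only a baseline under stated assumptions, not the algorithm this task will implement (that is still undecided above); the names `squared_distance` and `k_nearest` are made up for illustration. An index is meant to return similar results without reading every vector.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Squared Euclidean distance. The square root is skipped because it does
// not change the ordering of neighbours.
static float squared_distance(const std::vector<float> &a,
                              const std::vector<float> &b)
{
  assert(a.size() == b.size());
  float sum= 0.0f;
  for (std::size_t i= 0; i < a.size(); i++)
  {
    float d= a[i] - b[i];
    sum+= d * d;
  }
  return sum;
}

// Returns the positions of the k vectors in `data` nearest to `query`,
// closest first. Exact, but reads every vector on every call.
static std::vector<std::size_t>
k_nearest(const std::vector<std::vector<float>> &data,
          const std::vector<float> &query, std::size_t k)
{
  std::vector<std::pair<float, std::size_t>> scored;
  scored.reserve(data.size());
  for (std::size_t i= 0; i < data.size(); i++)
    scored.emplace_back(squared_distance(data[i], query), i);

  k= std::min(k, scored.size());
  // Only the first k entries need to be in sorted order.
  std::partial_sort(scored.begin(), scored.begin() + k, scored.end());

  std::vector<std::size_t> result;
  for (std::size_t i= 0; i < k; i++)
    result.push_back(scored[i].second);
  return result;
}
```

The scan costs one distance computation per stored vector, which is what makes it impractical on multi-million-row data sets and motivates an index.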

Sergei Golubchik made changes - 2024-01-08 16:12

Fix Version/s

11.5 [ 29506 ]

Sergei Golubchik made changes - 2024-01-08 16:12

Status

Open [ 1 ]

In Progress [ 3 ]

Sergei Golubchik added a comment - 2024-01-19 18:29

parser, grammar. CREATE TABLE works,

Sergei Golubchik added a comment - 2024-01-19 18:30

metadata is stored in the frm, read from frm, type checks (only BLOB and NOT NULL)
SHOW commands and INFORMATION_SCHEMA.STATISTICS

Sergei Golubchik added a comment - 2024-01-19 19:13

An API for a graph algorithm.

The algorithm implementation needs to provide methods to

* add a vector to the graph
* remove a vector from the graph
* update: only if there is some algorithm that can do it much faster than delete+insert
* build a graph from a set of vectors

The server will provide methods to

* read/update the list of edges for a specific node
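One possible shape for this split, as a rough sketch only. Every name here (`Edge_storage`, `Graph_algorithm`, `Memory_storage`, `node_id`) is hypothetical and not taken from the server code; the in-memory storage is a stand-in for whatever the server ends up providing.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

using node_id= std::uint64_t;          // hypothetical node identifier
using vec_t= std::vector<float>;       // one vector's coordinates
using edge_list= std::vector<node_id>;

// Server side: read/update the list of edges for a specific node.
class Edge_storage
{
public:
  virtual ~Edge_storage()= default;
  virtual edge_list read_edges(node_id node)= 0;
  virtual void update_edges(node_id node, const edge_list &edges)= 0;
};

// Algorithm side: what an implementation would need to provide.
class Graph_algorithm
{
public:
  virtual ~Graph_algorithm()= default;
  virtual void add(Edge_storage &graph, node_id node, const vec_t &vec)= 0;
  virtual void remove(Edge_storage &graph, node_id node)= 0;

  // Default is delete+insert; override only if something faster exists.
  virtual void update(Edge_storage &graph, node_id node, const vec_t &vec)
  {
    remove(graph, node);
    add(graph, node, vec);
  }

  // Naive bulk build: insert one by one. A real algorithm may do better.
  virtual void build(Edge_storage &graph,
                     const std::vector<std::pair<node_id, vec_t>> &vectors)
  {
    for (const auto &item : vectors)
      add(graph, item.first, item.second);
  }
};

// In-memory stand-in for the server's storage, for experiments only.
class Memory_storage: public Edge_storage
{
  std::map<node_id, edge_list> edges_by_node;
public:
  edge_list read_edges(node_id node) override
  {
    auto it= edges_by_node.find(node);
    return it == edges_by_node.end() ? edge_list() : it->second;
  }
  void update_edges(node_id node, const edge_list &edges) override
  {
    for (node_id e : edges)
      assert(e != node);               // this sketch forbids self-loops
    edges_by_node[node]= edges;
  }
};
```

Keeping the edge storage behind its own interface would let the algorithm be debugged against the in-memory version before any real storage exists.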

Sergei Golubchik made changes - 2024-01-20 09:27

Comment

[ (/) index storage: first version. inside the table row, in the internal (not visible to a user) column

won't be performant, but allows to implement and debug the algorithm

the engine doesn't see the key at all, so enable/disable keys likely won't work. but as we'll probably change the storage in the future, it doesn't matter now. ]

Sergei Golubchik made changes - 2024-01-26 09:53

Remote Link

This issue links to "early gdoc (Web Link)" [ 36381 ]

Sergei Golubchik made changes - 2024-02-04 22:29

Description

A common operation for multi-dimensional vectors is to find k nearest vectors to the given one.
This task is about implementing indexes that allow to do it fast.

* ideally they'll be engine independent
* indexes should be update-able
* in this task we'll only do Euclidean distance
* we'll benchmark it on real multi-million-rows data sets
* optimizer-wise we'll do like with fulltext search

* what algorithm, exactly, to use is still unclear

A common operation for multi-dimensional vectors is to find k nearest vectors to the given one.
This task is about implementing indexes that allow to do it fast.

* ideally they'll be engine independent
* indexes should be update-able
* in this task we'll only do Euclidean distance
* we'll benchmark it on real multi-million-rows data sets
* optimizer-wise we'll do like with fulltext search

* what algorithm, exactly, to use is still unclear

h1. this task is at the moment in the _early design phase_ the ideas are recorded, added, changed, and -removed- in the linked gdoc

Sergei Golubchik made changes - 2024-02-04 22:30

Description

A common operation for multi-dimensional vectors is to find k nearest vectors to the given one.
This task is about implementing indexes that allow to do it fast.

* ideally they'll be engine independent
* indexes should be update-able
* in this task we'll only do Euclidean distance
* we'll benchmark it on real multi-million-rows data sets
* optimizer-wise we'll do like with fulltext search

* what algorithm, exactly, to use is still unclear

h1. this task is at the moment in the _early design phase_ the ideas are recorded, added, changed, and -removed- in the linked gdoc

A common operation for multi-dimensional vectors is to find k nearest vectors to the given one.
This task is about implementing indexes that allow to do it fast.

* ideally they'll be engine independent
* indexes should be update-able
* in this task we'll only do Euclidean distance
* we'll benchmark it on real multi-million-rows data sets
* optimizer-wise we'll do like with fulltext search

* what algorithm, exactly, to use is still unclear

h1. *this task is at the moment in the _early design phase_ the ideas are recorded, added, changed, and -removed- in the linked gdoc*

Sergei Golubchik made changes - 2024-02-04 22:48

Description

A common operation for multi-dimensional vectors is to find k nearest vectors to the given one.
This task is about implementing indexes that allow to do it fast.

* ideally they'll be engine independent
* indexes should be update-able
* in this task we'll only do Euclidean distance
* we'll benchmark it on real multi-million-rows data sets
* optimizer-wise we'll do like with fulltext search

* what algorithm, exactly, to use is still unclear

h1. *this task is at the moment in the _early design phase_ the ideas are recorded, added, changed, and -removed- in the linked gdoc*

A common operation for multi-dimensional vectors is to find k nearest vectors to the given one.
This task is about implementing indexes that allow to do it fast.

* ideally they'll be engine independent
* indexes should be update-able
* in this task we'll only do Euclidean distance
* we'll benchmark it on real multi-million-rows data sets
* optimizer-wise we'll do like with fulltext search

* what algorithm, exactly, to use is still unclear

h1. *this task is at the moment in the _early design phase_ the ideas are recorded, added, changed, and -removed- *in the linked gdoc*

Sergei Golubchik made changes - 2024-02-04 22:48

Description

A common operation for multi-dimensional vectors is to find k nearest vectors to the given one.
This task is about implementing indexes that allow to do it fast.

* ideally they'll be engine independent
* indexes should be update-able
* in this task we'll only do Euclidean distance
* we'll benchmark it on real multi-million-rows data sets
* optimizer-wise we'll do like with fulltext search

* what algorithm, exactly, to use is still unclear

h1. *this task is at the moment in the _early design phase_ the ideas are recorded, added, changed, and -removed- *in the linked gdoc*

A common operation for multi-dimensional vectors is to find k nearest vectors to the given one.
This task is about implementing indexes that allow to do it fast.

* ideally they'll be engine independent
* indexes should be update-able
* in this task we'll only do Euclidean distance
* we'll benchmark it on real multi-million-rows data sets
* optimizer-wise we'll do like with fulltext search

* what algorithm, exactly, to use is still unclear

h1. *this task is at the moment in the _early design phase_ the ideas are recorded, added, changed, and -removed- in the linked gdoc*

Sergei Golubchik made changes - 2024-02-07 10:53

Description

A common operation for multi-dimensional vectors is to find k nearest vectors to the given one.
This task is about implementing indexes that allow to do it fast.

* ideally they'll be engine independent
* indexes should be update-able
* in this task we'll only do Euclidean distance
* we'll benchmark it on real multi-million-rows data sets
* optimizer-wise we'll do like with fulltext search

* what algorithm, exactly, to use is still unclear

h1. *this task is at the moment in the _early design phase_ the ideas are recorded, added, changed, and -removed- in the linked gdoc*

A common operation for multi-dimensional vectors is to find k nearest vectors to the given one.
This task is about implementing indexes that allow to do it fast.

* ideally they'll be engine independent
* indexes should be update-able
* in this task we'll only do Euclidean distance
* we'll benchmark it on real multi-million-rows data sets
* optimizer-wise we'll do like with fulltext search

* what algorithm, exactly, to use is still unclear

Sergei Golubchik made changes - 2024-02-07 11:01

Link

This issue relates to ~~MDEV-33404~~ [ ~~MDEV-33404~~ ]

Sergei Golubchik made changes - 2024-02-07 11:13

Link

This issue relates to ~~MDEV-33405~~ [ ~~MDEV-33405~~ ]

Sergei Golubchik made changes - 2024-02-07 11:16

Link

This issue relates to ~~MDEV-33406~~ [ ~~MDEV-33406~~ ]

Sergei Golubchik made changes - 2024-02-07 11:18

Link

This issue relates to ~~MDEV-33407~~ [ ~~MDEV-33407~~ ]

Sergei Golubchik made changes - 2024-02-07 11:18

Link

This issue relates to ~~MDEV-32886~~ [ ~~MDEV-32886~~ ]

Sergei Golubchik made changes - 2024-02-07 11:21

Link

This issue relates to ~~MDEV-33408~~ [ ~~MDEV-33408~~ ]

Sergei Golubchik made changes - 2024-02-07 11:26

Link

This issue relates to ~~MDEV-33410~~ [ ~~MDEV-33410~~ ]

Sergei Golubchik made changes - 2024-02-07 11:29

Link

This issue relates to MDEV-33411 [ MDEV-33411 ]

Sergei Golubchik made changes - 2024-02-07 11:30

Description

A common operation for multi-dimensional vectors is to find k nearest vectors to the given one.
This task is about implementing indexes that allow to do it fast.

* ideally they'll be engine independent
* indexes should be update-able
* in this task we'll only do Euclidean distance
* we'll benchmark it on real multi-million-rows data sets
* optimizer-wise we'll do like with fulltext search

* what algorithm, exactly, to use is still unclear

A common operation for multi-dimensional vectors is to find k nearest vectors to the given one.
This task is about implementing indexes that allow to do it fast.

* ideally they'll be engine independent
* indexes should be update-able
* in this task we'll only do Euclidean distance
* we'll benchmark it on real multi-million-rows data sets

* what algorithm, exactly, to use is still unclear

Sergei Golubchik made changes - 2024-02-07 11:41

Link

This issue relates to ~~MDEV-33413~~ [ ~~MDEV-33413~~ ]

Sergei Golubchik made changes - 2024-02-07 11:43

Link

This issue relates to ~~MDEV-33414~~ [ ~~MDEV-33414~~ ]

Sergei Golubchik made changes - 2024-02-07 11:46

Link

This issue relates to MDEV-33412 [ MDEV-33412 ]

Sergei Golubchik made changes - 2024-02-07 11:46

Link

This issue relates to MDEV-33409 [ MDEV-33409 ]

Sergei Golubchik made changes - 2024-02-07 11:52

Link

This issue relates to ~~MDEV-33415~~ [ ~~MDEV-33415~~ ]

Sergei Golubchik made changes - 2024-02-07 11:56

Link

This issue relates to ~~MDEV-33416~~ [ ~~MDEV-33416~~ ]

Sergei Golubchik made changes - 2024-02-07 12:35

Link

This issue relates to ~~MDEV-33417~~ [ ~~MDEV-33417~~ ]

Sergei Golubchik made changes - 2024-02-07 12:46

Link

This issue relates to ~~MDEV-33418~~ [ ~~MDEV-33418~~ ]

Sergei Golubchik made changes - 2024-02-07 13:04

Link

This issue relates to MDEV-33419 [ MDEV-33419 ]

Sergei Golubchik made changes - 2024-03-19 18:32

Fix Version/s		11.6 [ 29515 ]
Fix Version/s	11.5 [ 29506 ]

JiraAutomate made changes - 2024-05-19 10:26

Status

In Progress [ 3 ]

Stalled [ 10000 ]

Sergei Golubchik made changes - 2024-06-10 20:25

Link

This issue relates to MDEV-34356 [ MDEV-34356 ]

Sergei Golubchik made changes - 2024-06-20 22:11

Link

This issue relates to ~~MDEV-34436~~ [ ~~MDEV-34436~~ ]

Julien Fritsch added a comment - 2024-07-01 13:49

From the product point of view, this is critical.

Julien Fritsch made changes - 2024-07-01 13:49

Priority

Major [ 3 ]

Critical [ 2 ]

Sergei Golubchik made changes - 2024-08-04 16:43

Link

This issue relates to ~~MDEV-34698~~ [ ~~MDEV-34698~~ ]

Sergei Golubchik made changes - 2024-08-04 16:43

Link

This issue relates to ~~MDEV-34699~~ [ ~~MDEV-34699~~ ]

Sergei Golubchik made changes - 2024-08-19 09:58

Link

This issue causes ~~MDEV-34774~~ [ ~~MDEV-34774~~ ]

Sergei Golubchik made changes - 2024-08-23 09:29

Link

This issue relates to MDEV-34804 [ MDEV-34804 ]

Sergei Golubchik made changes - 2024-08-23 09:36

Link

This issue relates to MDEV-34805 [ MDEV-34805 ]

Sergei Golubchik made changes - 2024-08-23 09:40

Link

This issue relates to ~~MDEV-34806~~ [ ~~MDEV-34806~~ ]

Sergei Golubchik made changes - 2024-08-24 12:43

Link

This issue relates to ~~MDEV-34811~~ [ ~~MDEV-34811~~ ]

Sergei Golubchik made changes - 2024-08-28 10:23

Status

Stalled [ 10000 ]

In Progress [ 3 ]

Sergei Golubchik made changes - 2024-09-10 18:03

Link

This issue relates to ~~MDEV-34862~~ [ ~~MDEV-34862~~ ]

Sergei Golubchik made changes - 2024-09-13 09:23

Link

This issue relates to ~~MDEV-34919~~ [ ~~MDEV-34919~~ ]

Sergei Golubchik made changes - 2024-09-21 09:36

Fix Version/s		11.7 [ 29815 ]
Fix Version/s	11.6 [ 29515 ]

Sergei Golubchik made changes - 2024-09-24 13:53

Fix Version/s		11.8 [ 29921 ]
Fix Version/s	11.7 [ 29815 ]

Oleksandr Byelkin added a comment - 2024-09-27 10:06 - edited

Here is my review:
These checks

share->total_keys > share->keys

to->s->keys == to->s->total_keys

would be better converted to SHARE::has_hlindex() or something like that
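A minimal sketch of what such a helper might look like. `Share_sketch` is a hypothetical, cut-down stand-in for the real share structure, and the field comments are an assumption inferred from the two comparisons quoted above, not taken from the server source.

```cpp
#include <cassert>

// Hypothetical stand-in for the table share; only the two counters
// used in the quoted comparisons are modelled.
struct Share_sketch
{
  unsigned keys;        // assumed: indexes the storage engine sees
  unsigned total_keys;  // assumed: keys plus any high-level indexes

  // One named predicate instead of repeating the comparison.
  bool has_hlindex() const
  {
    assert(total_keys >= keys);
    return total_keys > keys;
  }
};
```

With this, `share->total_keys > share->keys` would read `share->has_hlindex()`, and `to->s->keys == to->s->total_keys` would read `!to->s->has_hlindex()`.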

I am not sure about this change (at least the buffer size should be increased to
better reflect the default length of a vector's text representation):

diff --git a/sql/item_vectorfunc.cc b/sql/item_vectorfunc.cc
index 8b9ea188490..315ad4e06fa 100644
--- a/sql/item_vectorfunc.cc
+++ b/sql/item_vectorfunc.cc
@@ -134,6 +134,9 @@ String *Item_func_vec_fromtext::val_str(String *buf)
 {
   json_engine_t je;
   bool end_ok= false;
+  char buff[STRING_BUFFER_USUAL_SIZE]; // maybe *2 or *3 makes more sense
+                                       // (usual vector representation)
+  String tmp_js(buff,sizeof(buff), &my_charset_bin);
   String *value = args[0]->val_json(&tmp_js);
   CHARSET_INFO *cs= value->charset();
diff --git a/sql/item_vectorfunc.h b/sql/item_vectorfunc.h
index 58dc300c451..1687eed7e29 100644
--- a/sql/item_vectorfunc.h
+++ b/sql/item_vectorfunc.h
@@ -136,7 +136,6 @@ class Item_func_vec_totext: public Item_str_ascii_checksum_func
 class Item_func_vec_fromtext: public Item_str_func
 {
-  String tmp_js;
 public:
   bool fix_length_and_dec(THD *thd) override;
   Item_func_vec_fromtext(THD *thd, Item *a);

Should be tested with mariabackup (high possibility of failure with the new index)

Should be tested with mariadb-dump (it will probably pass, but this checks that what
was printed by SHOW CREATE TABLE will be read back)

It would be nice to have a test with different types of indexes in one table,
all used by the optimizer (I do not expect problems here either, just to have it
checked).

IMHO the VECTOR(N) type should be done before release to avoid table conversion
on upgrade in the future.

IMHO these cleanups are partly fixes and would be better backported if possible:

cleanup: remove unconditional #ifdef's

cleanup: key algorithm vs key flags

Sergei Golubchik made changes - 2024-10-17 21:36

Link

This issue relates to MDEV-35196 [ MDEV-35196 ]

Sergei Golubchik made changes - 2024-10-21 14:41

Link

This issue relates to MDEV-35222 [ MDEV-35222 ]

Sergei Golubchik made changes - 2024-10-26 20:28

Link

This issue relates to ~~MDEV-34919~~ [ ~~MDEV-34919~~ ]

Sergei Golubchik made changes - 2024-10-26 20:31

Link

This issue relates to MDEV-35264 [ MDEV-35264 ]

Sergei Golubchik made changes - 2024-10-26 20:31

Link

This issue causes ~~MDEV-34774~~ [ ~~MDEV-34774~~ ]

Sergei Golubchik made changes - 2024-10-29 11:37

Link

This issue relates to MDEV-35283 [ MDEV-35283 ]

Sergei Golubchik made changes - 2024-11-01 19:44

Link

This issue relates to MDEV-35314 [ MDEV-35314 ]

Sergei Golubchik made changes - 2024-11-02 13:53

Link

This issue relates to MDEV-35315 [ MDEV-35315 ]

Sergei Golubchik made changes - 2024-11-02 13:54

Link

This issue relates to MDEV-35316 [ MDEV-35316 ]

Sergei Golubchik made changes - 2024-11-04 02:51

Link

This issue relates to MDEV-35327 [ MDEV-35327 ]

Sergei Golubchik made changes - 2024-11-13 18:15

Fix Version/s

11.8 [ 29921 ]

Sergei Golubchik made changes - 2024-11-13 18:15

Issue Type

New Feature [ 2 ]

Epic [ 5 ]

Sergei Golubchik made changes - 2024-11-13 18:15

Status

In Progress [ 3 ]

Stalled [ 10000 ]

Sergei Golubchik made changes - 2024-11-13 18:15

Fix Version/s

N/A [ 14700 ]

Sergei Golubchik made changes - 2024-11-14 10:29

Link

This issue relates to MDEV-35418 [ MDEV-35418 ]

Sergei Golubchik made changes - 2024-11-16 11:42

Summary

k-ANN indexes for vectors

vector search

Sergei Golubchik made changes - 2025-01-13 14:06

Link

This issue includes MDEV-35831 [ MDEV-35831 ]

Sergei Golubchik made changes - 2025-01-13 14:07

Link

This issue relates to MDEV-35831 [ MDEV-35831 ]

Sergei Golubchik made changes - 2025-01-13 14:07

Link

This issue includes MDEV-35831 [ MDEV-35831 ]

Sergei Golubchik made changes - 2025-01-14 09:25

Link

This issue relates to MDEV-35841 [ MDEV-35841 ]

Sergei Golubchik made changes - 2025-02-15 13:55

Link

This issue relates to MDEV-36100 [ MDEV-36100 ]

Sergei Golubchik made changes - 2025-02-15 13:55

Link

This issue relates to MDEV-35629 [ MDEV-35629 ]

Sergey Vojtovich made changes - 2025-02-26 20:02

Link

This issue relates to ~~MDEV-36184~~ [ ~~MDEV-36184~~ ]

Julien Fritsch made changes - 2025-03-17 00:31

Epic Name

vectore search

Julien Fritsch made changes - 2025-03-17 00:31

Epic Name

vectore search

vector search

Adam Luciano made changes - 2025-04-11 17:06

Epic Child

MDEV-36568 [ 133790 ]

Adam Luciano made changes - 2025-04-15 16:41

Epic Child

~~MDEV-36605~~ [ 133866 ]
