[MDEV-34939] vector search in 11.7 - Jira

Details

Type: New Feature
Status: Closed
Priority: Critical
Resolution: Fixed
Fix Version/s: 11.7.1
Component/s: Vector search
Labels:
- Preview_11.7

Description

An umbrella task for all vector search features that are planned to make it into 11.7


Issue Links

causes

MDEV-34967 MSAN failure in main.vector

Closed

MDEV-34970 Vector search fails to compile on s390x

Closed

MDEV-34971 Vector search fails to compile on x86_32

Closed

MDEV-34989 After selecting from empty table with vector key the next insert hangs

Closed

MDEV-35005 vec_distance_cosine can return negative values

Closed

MDEV-35006 Using varbinary as vector-storing column results in assertion failures

Closed

MDEV-35020 After a failed attempt to create vector index temporary file remains and prevents further operation

Closed

MDEV-35021 Behavior for RTREE indexes changed, assertion fails

Closed

MDEV-35028 Unexpected ER_DUP_ENTRY/ER_DUP_KEY, ASAN errors after TRUNCATE on table with vector index

Closed

MDEV-35029 ASAN errors in Lex_ident<Compare_ident_ci>::is_valid_ident upon DDL on table with vector index

Closed

MDEV-35031 Update on vector column returns error but modifies the value, results in further ER_KEY_NOT_FOUND

Closed

MDEV-35033 LeakSanitizer errors in my_malloc / safe_mutex_lazy_init_deadlock_detection / MHNSW_Context::alloc_node and alike

Closed

MDEV-35034 Non-debug assertion failure after unsuccessful attempt to add vector index

Closed

MDEV-35035 Assertion failure in ha_blackhole::position upon INSERT into blackhole table with vector index

Closed

MDEV-35036 Assertion failure in myrocks::ha_rocksdb::position upon INSERT into RocksDB table with vector index

Closed

MDEV-35037 Invalid (old?) table or database name 't#i#00' upon creating RocksDB table with vector index

Closed

MDEV-35038 Server crash in Index_statistics::get_avg_frequency upon EITS collection for vector index

Closed

MDEV-35039 Number of indexes inside InnoDB differs from that defined in MariaDB after altering table with vector key

Closed

MDEV-35042 Vector indexes are allowed for MERGE tables, but do not work

Closed

MDEV-35043 Unsuitable error upon an attempt to create MEMORY table with vector key

Closed

MDEV-35044 ALTER on a table with vector index attempts to bypass unsupported locking limitation, server crashes in THD::free_tmp_table_share

Closed

MDEV-35055 ASAN errors in TABLE_SHARE::lock_share upon committing transaction after FLUSH on table with vector key

Closed

MDEV-35058 Non-debug assertion failure upon concurrent vector index creation and select

Closed

MDEV-35060 Assertion failure upon DML on table with vector under lock

Closed

MDEV-35061 XA PREPARE "not supported by the engine" from storage engine mhnsw, memory leak

Closed

MDEV-35063 Assertion `v->distance_to_target >= threshold' fails upon adding certain values to vector key

Closed

MDEV-35069 IMPORT TABLESPACE does not work for tables with vector, although allowed

Closed

MDEV-35071 Poor recall upon vector search (300 dimensions, 10K rows)

Closed

MDEV-35077 Assertion failure in myrocks::ha_rocksdb::position_to_correct_key upon using unique hash key

Closed

MDEV-35078 Server crash or ASAN errors in mhnsw_insert

Closed

MDEV-35081 Assertion `!n_mysql_tables_in_use' failed after error upon binary logging of DML involving vector table

Closed

MDEV-35083 ER_UNSUPPORTED_EXTENSION upon using HASH keys with InnoDB

Closed

MDEV-35084 Assertion `v->distance_to_target >= threshold' fails upon adding overflowing values to vector key #2

Closed

MDEV-35087 Server crash or ASAN errors in _mi_write_blob_record upon using BINARY of certain lengths as vector column

Closed

MDEV-35092 Server crash, hang or ASAN errors in mysql_create_frm_image upon using non-default table options and system variables

Closed

MDEV-35105 Assertion `tab->join->order' fails upon vector search with DISTINCT

Closed

MDEV-35130 Assertion fails in trx_t::check_bulk_buffer upon CREATE.. SELECT with vector key

Closed

MDEV-35131 Assertion `std::isnan(v->distance_to_target) || v->distance_to_target >= threshold' failed upon SELECT

Closed

MDEV-35141 Server crashes in Field_vector::report_wrong_value upon statistic collection

Closed

MDEV-35146 Vector-related error messages worth improving when possible

Closed

MDEV-35147 Inconsistent NULL handling in vector type

Closed

MDEV-35148 Foreign key on vector column refuses to be created inconsistently and on a wrong reason

Open

MDEV-35150 Column containing non-vector values can be modified to VECTOR type without warnings

Closed

MDEV-35151 Alter table with vector operations is not atomic, temporary files remain

Closed

MDEV-35152 DATA/INDEX DIRECTORY options are ignored for vector index

Closed

MDEV-35158 Assertion `res->length() > 0 && res->length() % 4 == 0' fails upon increasing length of vector column

Closed

MDEV-35159 Assertion `tab->join->select_limit < (~ (ha_rows) 0)' fails upon forcing vector key

Closed

MDEV-35160 RBR does not work with vector type, ER_SLAVE_CONVERSION_FAILED

Closed

MDEV-35161 UPDATE and DELETE do not use vector key

Open

MDEV-35175 Vector functions re-use JSON warnings

Closed

MDEV-35176 ASAN errors in Field_vector::store with optimizer_trace enabled

Closed

MDEV-35177 Unexpected ER_TRUNCATED_WRONG_VALUE_FOR_FIELD, diagnostics area assertion failures upon EITS collection with vector type

Closed

MDEV-35178 Assertion failure in Field_vector::store upon INSERT IGNORE with a wrong data

Closed

MDEV-35182 crash in online_alter_end_trans with XA over vector indexes

Closed

MDEV-35184 Corruption errors upon creation or usage of Federated table with vector key

Open

MDEV-35185 Query cache used for results of vector search conflicts with the purpose of mhnsw_min_limit

Open

MDEV-35186 IGNORED attribute has no effect on vector keys

Closed

MDEV-35191 Assertion failure in Create_tmp_table::finalize upon DISTINCT with vector type

Closed

MDEV-35192 Distance functions on vectors of different length return NULL without warnings

Open

MDEV-35194 non-BNL join fails on assertion

Closed

MDEV-35195 Assertion `tab->join->order' fails upon vector search with DISTINCT #2

Closed

MDEV-35198 ER_CRASHED_ON_USAGE or assertion failure after myisampack on table with vector key

Open

MDEV-35203 ASAN errors or assertion failures in row_sel_convert_mysql_key_to_innobase upon query from table with usual key on vector field

Closed

MDEV-35204 mysqlbinlog --verbose fails on row events with vector type

Closed

MDEV-35205 Server crash in online alter upon concurrent ALTER and DML on table with vector field

Closed

MDEV-35210 Vector type cannot store values which VEC_FromText produces and VEC_ToText accepts

Closed

MDEV-35211 VEC_FromText does not return vector type but varbinary

Open

MDEV-35212 Server crashes in Item_func_vec_fromtext::val_str upon query from empty table

Closed

MDEV-35213 Server crash or assertion failure upon query with high value of mhnsw_min_limit

Closed

MDEV-35214 Server crashes in FVectorNode::gref_len with insufficient mhnsw_max_cache_size

Closed

MDEV-35215 ASAN errors in Item_func_vec_fromtext::val_str upon VEC_FROMTEXT with an invalid argument

Closed

MDEV-35219 Unexpected ER_DUP_KEY after OPTIMIZE on MyISAM table with vector key

Closed

MDEV-35220 Assertion `!item->null_value' failed upon VEC_TOTEXT call

Closed

MDEV-35221 Vector values do not survive mariadb-dump / restore

Closed

MDEV-35223 REPAIR does not fix MyISAM table with vector key after crash recovery

Closed

MDEV-35230 ASAN errors upon reading from joined temptable views with vector type

Closed

MDEV-35241 DROP TABLE on table with vector key not atomic, leads to ER_NO_SUCH_TABLE_IN_ENGINE

Open

MDEV-35244 Vector-related system variables could use better names

Closed

MDEV-35245 SHOW CREATE TABLE produces unusable statement for vector fields with constant default value

Closed

MDEV-35246 Vector search skips a row in the table

Closed

MDEV-35258 Mariabackup does not work with MyISAM tables with vector keys

Closed

MDEV-35263 rpl.vector fails when executed in a group of tests

Closed

MDEV-35267 Server crashes in _ma_reset_history upon altering on Aria table with vector key under lock

Closed

MDEV-35271 XA behavior changed, assertion fails in Ha_trx_info::is_trx_read_write

Stalled

MDEV-35284 Server crash or ASAN errors in mhnsw_read_next upon using vectors within transaction

Closed

MDEV-35287 ER_KEY_NOT_FOUND upon INSERT into InnoDB table with vector key under READ COMMITTED

Closed

MDEV-35292 ALTER TABLE re-creating vector key is no-op with non-copying alter algorithms (default)

Closed

MDEV-35296 DESC does not work in ORDER BY with vector key

Closed

MDEV-35302 ASAN errors or assertion failure in mhnsw_read_first upon vector search with join

Closed

MDEV-35305 Vector search queries are written into slow log as "not using index"

Open

MDEV-35308 NO_KEY_OPTIONS SQL mode has no effect on engine key options

Closed

MDEV-35309 ALTER performs vector truncation without WARN_DATA_TRUNCATED or similar warnings/errors

Closed

MDEV-35317 Server crashes in mhnsw_insert upon using vector key on a Spider table

Closed

MDEV-35319 ER_LOCK_DEADLOCK not detected upon DML on table with vector key, server crashes

Closed

MDEV-35320 Non-default distance function and M are not replicated

Open

MDEV-35321 INDEX_STATISTICS does not show the use of a vector key

Open

MDEV-35322 Vector search is not shown in perfschema, queries are counted as not using index

Open

MDEV-35323 ER_TOO_BIG_FIELDLENGTH shows wrong maximum length for vector field

Open

MDEV-35324 Different index type shown in SHOW INDEX vs SHOW CREATE TABLE

Closed

MDEV-35325 DROP TABLE on Mroonga table with vector key fails with ER_NO_SUCH_TABLE

Open

MDEV-35328 Corruption-like errors upon and after REPAIR .. USE_FRM on table with vector key

Open

MDEV-35337 Server crash or assertion failure in join_read_first upon using vector distance in group by

Closed

MDEV-35338 Non-copying ALTER does not pad VECTOR column, vector search further does not work

Closed

MDEV-35339 Different behavior of implicit vector conversion comparing to other types and DDL vs DML

Open

MDEV-35340 In Oracle-styled SPs unspecified length of vector field defaults to 1000

Open

MDEV-35354 InnoDB: Failing assertion: node->pcur->rel_pos == BTR_PCUR_ON upon LOAD DATA REPLACE with unique blob

Closed

MDEV-35769 ER_SQL_DISCOVER_ERROR upon updating vector key column using incorrect value

Closed

MDEV-35792 Adding a regular index on a vector column leads to invalid table structure

Closed

MDEV-35793 Server crashes in Item_func_vec_distance_common::get_const_arg

Closed

MDEV-35834 Server crash in FVector::distance_to upon concurrent SELECT

Closed

MDEV-36005 Server crashes when checking/updating a table having vector key after enabling innodb_force_primary_key

Closed

MDEV-36011 Server crashes in Charset::mbminlen / Item_func_vec_fromtext::val_str upon mixing vector type with string

Closed

includes

MDEV-32885 VEC_DISTANCE() function

Closed

MDEV-32886 VEC_FromText() and VEC_ToText() functions

Closed

MDEV-33404 Engine-independent indexes: subtable method

Closed

MDEV-33406 basic optimizer support for k-NN searches

Closed

MDEV-33407 Parser support for vector indexes

Closed

MDEV-33408 HNSW for k-ANN vector searches

Closed

MDEV-33413 cache k-ANN graph in memory

Closed

MDEV-33414 benchmark vector indexes

Closed

MDEV-33416 graph index: use smaller floating point numbers

Closed

MDEV-33417 VEC_DISTANCE_COSINE() function

Closed

MDEV-33418 graph index insert: stronger selection of neighbors

Closed

MDEV-34436 DDL: per-index attributes

Closed

MDEV-34698 mhnsw: support AVX-512 instructions

Closed

MDEV-34811 handlerton refactoring

Closed

MDEV-34942 packaging dependency for eigen3

Closed

relates to

MDBF-796 Add Eigen onto BB workers

Closed

MDEV-35082 HANDLER with FULLTEXT keys is not always rejected

Closed

(107 causes, 15 includes, 2 relates to)

Activity


Elena Stepanova added a comment - 2024-09-19 16:35

Branch bb-11.6-MDEV-32887-vector


Elena Stepanova added a comment - 2024-11-06 11:16 - edited

In my opinion, the feature in its current shape can be pushed into the main branch and released with 11.7.1.

In short, it appears stable enough for the RC after all the bugfixing, and we need the community to start experimenting with it on realistic datasets and use cases for possible further tuning before GA. The internal feature-focused testing will also be continued on the main/11.7 branch before and after 11.7.1 release.

Long version:

The main shortcoming of internal feature testing in this case was (and still is) that there are no usable criteria or requirements for "sufficient result correctness".

Normally correctness is a fixed characteristic which does not cause much controversy and to a large extent can be tested on a variety of datasets, not necessarily real-life ones, while performance remains relative and is measured either on standard benchmarks (with the common understanding that they don't necessarily represent realistic use cases) or, in some cases, on actual real-life scenarios.

In the case of vector search, whose results are approximate by nature, we have two flexible characteristics which depend on each other (better correctness leads to worse performance and vice versa), and for neither of them can we set a hard limit of "it cannot go worse than that under any circumstances" on any given dataset.

Whatever we know now about the comparative performance/recall of the current implementation was already presented in public talks and blog posts by feature developers. This stage of internal testing was mainly focused on stability and other less controversial aspects of the feature. I cannot claim such testing to be sufficient and I don't believe it will ever be, which is why I think it is important to get the feature out to the public and gather as much information as possible about what users consider more important in which cases, how much precision can be sacrificed for the sake of performance, and so on. I expect there will always be a fair amount of dissatisfaction as different use cases have different requirements, but hopefully we will get a bigger picture than we have now.

Meanwhile, below are some notes from the testing, mostly for documentation and other "user must be aware" purposes.

I won't list those limitations or issues which are immediately obvious, only some which can remain unnoticed but cause trouble. The list is dynamic, so some notes can become outdated quickly. In no particular order:

- vector key is not used for ORDER BY .. DESC – the query will work, but a full scan will be performed (~~MDEV-35296~~: fixed by disabling);
- vector key is not used in UPDATE and DELETE – the query will work, but a full scan will be performed (MDEV-35161);
- vector key is not used in FROM subqueries / views;
- vector key is used only for ORDER BY VEC_Distance_<xxx>(<col>,<constant>) LIMIT <x> – that is, not for an arbitrary expression involving it, nor for a wrapping function, etc.;
- InnoDB bulk insert does not work for tables with a vector key – data loading may be slower than expected (~~MDEV-35287~~, ~~MDEV-35130~~: fixed by disabling);
- IMPORT TABLESPACE does not work, although allowed – can cause unexpected errors later (~~MDEV-35069~~);
- DATA/INDEX DIRECTORY options are ignored for vector index – files can end up in a different location than the user was planning (~~MDEV-35152~~);
- IGNORED attribute has no effect on vector keys – using it in experiments can lead to wrong conclusions (~~MDEV-35186~~);
- non-default distance and M are not replicated – the replica can end up with a different index structure than assumed, and the search won't use the index (MDEV-35320);
- optimizer doesn't / cannot take into account M and ef_search – vector search can turn out to be far from optimal compared to other possible plans;
- VEC_ToText can return a text representation of invalid vectors – it can be confusing that the JSON looks okay, but the value cannot be inserted (~~MDEV-35210~~, fixed with the note "VEC_ToText still prints everything");
- views involving vector functions may lead to non-working dumps produced by mariadb-dump (MDEV-35286, the bug was filed for GIS, vector has the same problem);
- distance functions on vectors of different length return NULL without warnings (MDEV-35192, an error which is easy to make in an application and which can cause unexpected results as the search will become fully random);
- XA involving vectors may cause replication errors (MDEV-35271, MDEV-35196);
- myisampack should be avoided for now (MDEV-35198);
- DROP TABLE on a table with a vector key should be performed carefully, either preventing possible failures or doing cleanup afterwards, as it is not atomic and can lead to corruption-like errors (MDEV-35241);
- for ALTER TABLE on tables with a vector key, it is better to use explicit ALGORITHM=COPY, as a non-copying algorithm may otherwise be chosen by default and cause issues (~~MDEV-35292~~, ~~MDEV-35338~~);
- for tables with vector keys, engines other than InnoDB, Aria, and MyISAM are best avoided for now even if they seem to accept table creation (Spider and Mroonga are known to have issues, most other engines will be rejected right away);
- when vectors are involved, mariadb-dump should be run with the --hex-blob option, otherwise the data can be lost (~~MDEV-35221~~);
- while experimenting with mhnsw_ef_search at runtime, make sure that the query cache is disabled, otherwise the setting will not have the expected effect (MDEV-35185).
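
Two of the notes above lend themselves to an application-side guard: the silent NULL on vectors of different length (MDEV-35192), and the requirement that the query have the plain ORDER BY VEC_Distance_<xxx>(<col>,<constant>) LIMIT <x> shape. What follows is only a minimal sketch under stated assumptions, not part of the feature or its documentation. The helper names `check_dims` and `knn_query`, and the table and column names in the usage note, are hypothetical. The sketch assumes the Euclidean distance function and does not escape identifiers.

```python
import json
import math


def check_dims(vec, expected_dims):
    """Return vec as a list of floats, or raise if it cannot match the column.

    Guards against the silent-NULL case from the notes: distance functions
    return NULL, without a warning, when the two vectors differ in length.
    """
    values = [float(x) for x in vec]
    if len(values) != expected_dims:
        raise ValueError(
            f"expected {expected_dims} dimensions, got {len(values)}"
        )
    if not all(math.isfinite(x) for x in values):
        raise ValueError("vector contains NaN or infinity")
    return values


def knn_query(table, column, vec, expected_dims, k):
    """Build a k-NN query in the shape the notes say can use the vector key.

    The distance function is applied directly to the column and a constant,
    the order is ascending, and there is a LIMIT. Identifiers are
    hypothetical and NOT escaped; this is an illustration only.
    """
    literal = json.dumps(check_dims(vec, expected_dims))
    return (
        f"SELECT * FROM {table} "
        f"ORDER BY VEC_DISTANCE_EUCLIDEAN({column}, VEC_FromText('{literal}')) "
        f"LIMIT {int(k)}"
    )
```

For example, `knn_query("docs", "embedding", [1, 2.5, 3], 3, 5)` returns a single SELECT statement, while passing a two-element vector for a three-dimensional column raises `ValueError` in the application instead of producing NULL distances on the server.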


People

Assignee: Elena Stepanova

Reporter: Sergei Golubchik

Votes: 2

Watchers: 6

Dates

Created: 2024-09-16 12:45

Updated: 2025-02-02 15:55

Resolved: 2024-11-12 12:48


MariaDB Server