[MDEV-32887] k-ANN indexes for vectors Created: 2023-11-26  Updated: 2024-02-07

Status: In Progress
Project: MariaDB Server
Component/s: Server
Fix Version/s: 11.5

Type: New Feature Priority: Major
Reporter: Sergei Golubchik Assignee: Sergei Golubchik
Resolution: Unresolved Votes: 0
Labels: vector

Issue Links:
Relates
relates to MDEV-32886 VEC_FromText() and VEC_AsText() funct... Open
relates to MDEV-33404 Engine-independent indexes: subtable ... In Progress
relates to MDEV-33405 Engine-independent indexes: low-level... Open
relates to MDEV-33406 basic optimizer support for k-NN sear... In Progress
relates to MDEV-33407 Parser support for vector indexes In Progress
relates to MDEV-33408 HNSW for k-ANN vector searches Open
relates to MDEV-33410 VECTOR data type Open
relates to MDEV-33411 OPTIMIZE for graph indexes Open
relates to MDEV-33413 cache k-ANN graph in memory Open
relates to MDEV-33414 benchmark vector indexes Open
relates to MDEV-33415 graph index search: heuristical edge ... Open
relates to MDEV-33417 vector distance: add more metrics Open
relates to MDEV-33419 graph index insert: consider more nei... Open
relates to MDEV-32885 VEC_DISTANCE() function In Testing
relates to MDEV-33409 icp for k-ANN graph searches Open
relates to MDEV-33412 cost-based optimizer choice for k-NN ... Open
relates to MDEV-33416 graph index: use smaller floating poi... Open
relates to MDEV-33418 graph index insert: stronger selectio... Open

 Description   

A common operation for multi-dimensional vectors is to find k nearest vectors to the given one.
This task is about implementing indexes that allow to do it fast.

  • ideally they'll be engine independent
  • indexes should be update-able
  • in this task we'll only do Euclidean distance
  • we'll benchmark it on real multi-million-rows data sets
  • what algorithm, exactly, to use is still unclear


 Comments   
Comment by Sergei Golubchik [ 2024-01-19 ]

parser, grammar. CREATE TABLE works,

Comment by Sergei Golubchik [ 2024-01-19 ]

metadata is stored in the frm, read from frm, type checks (only BLOB and NOT NULL)
SHOW commands and INFORMATION_SCHEMA.STATISTICS

Comment by Sergei Golubchik [ 2024-01-19 ]

An API for a graph algorithm.

Algorithm implementation needs to provide methods for

  • add a vector to the graph
  • remove a vector from a graph
  • update — only if there will be some algorithm that can do it much faster that delete+insert
  • build a graph from a set of vectors

The server will provide methods for

  • read/update list of edges for a specific node
Generated at Thu Feb 08 10:34:47 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.