[MDEV-33408] HNSW for k-ANN vector searches Created: 2024-02-07  Updated: 2024-02-07

Status: Open
Project: MariaDB Server
Component/s: None
Fix Version/s: None

Type: Task Priority: Major
Reporter: Sergei Golubchik Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: None

Issue Links:
Blocks
blocks MDEV-33411 OPTIMIZE for graph indexes Open
blocks MDEV-33414 benchmark vector indexes Open
blocks MDEV-33415 graph index search: heuristical edge ... Open
blocks MDEV-33416 graph index: use smaller floating poi... Open
blocks MDEV-33418 graph index insert: stronger selectio... Open
blocks MDEV-33419 graph index insert: consider more nei... Open
is blocked by MDEV-33404 Engine-independent indexes: subtable ... In Progress
Relates
relates to MDEV-32887 k-ANN indexes for vectors In Progress

 Description   

For the first iteration of Vector search, we will implement HNSW algorithm.

The implementation will only support Euclidean distance initially.

Basic plan:
Graph construction will be done according to HNSW paper.

Storage wise, we'll store the graph as part of a subtable (MDEV-33404).

The table's definition will be something along these lines:

  CREATE TABLE i (
    level int unsigned not null,
    src varbinary(255) not null,
    dst varbinary(255) not null,
    index (level,src),
    index (level,dst));

For each link in the graph, there will be a corresponding entry in the table.

  • src and dst will store handler::position, a quick link to the actual vector blob in the main table.

The index (level,src) will allow for quick jumping between nodes.
To go deeper in search, one just needs to decrement the level and search using the same "src" value.

If src is found on level n, then it is also found on level n - 1 and so on. Level 0 is the base level with all the nodes.

Performance considerations:

  • Storing the vector in the subtable might be required. Looking up the blob value in the base table might be too costly.

Generated at Thu Feb 08 10:38:41 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.