[MDEV-33414] benchmark vector indexes Created: 2024-02-07  Updated: 2024-02-07

Status: Open
Project: MariaDB Server
Component/s: None
Fix Version/s: None

Type: Task Priority: Major
Reporter: Sergei Golubchik Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: None

Issue Links:
Blocks
blocks MDEV-33413 cache k-ANN graph in memory Open
blocks MDEV-33415 graph index search: heuristical edge ... Open
blocks MDEV-33416 graph index: use smaller floating poi... Open
blocks MDEV-33418 graph index insert: stronger selectio... Open
blocks MDEV-33419 graph index insert: consider more nei... Open
is blocked by MDEV-33406 basic optimizer support for k-NN sear... In Progress
is blocked by MDEV-33407 Parser support for vector indexes In Progress
is blocked by MDEV-33408 HNSW for k-ANN vector searches Open
Relates
relates to MDEV-33404 Engine-independent indexes: subtable ... In Progress
relates to MDEV-33405 Engine-independent indexes: low-level... Open
relates to MDEV-32887 k-ANN indexes for vectors In Progress

 Description   

Use a real-world dataset with roughly a million vectors.

Benchmark goals:

  • Index creation time
  • Index update time
  • Index lookup of the top 1, 10, 100, 1000 entries for the same "query" vector, repeated N times.
    • This tests the speed of graph lookup with, hopefully, most of the data in cache.
  • Index lookup for a million different query vectors (top 1, 10, 100, 1000 results each), no repetition.
    • This tests the speed of graph lookup when the data may not all fit in cache.
  • Compute average recall of the algorithm for such queries.
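
A minimal sketch of how the average recall could be computed, assuming the ground truth comes from an exact brute-force scan and that distance is Euclidean (neither is specified in this task). All names here (exact_knn, recall_at_k, average_recall, approx_search) are hypothetical; approx_search stands in for whatever runs the indexed lookup and returns row ids.

```python
import heapq
import math


def exact_knn(data, query, k):
    """Indices of the k vectors closest to `query`, found by a full scan.

    Assumption: Euclidean distance; swap in another metric if the index uses one.
    """
    return heapq.nsmallest(
        k, range(len(data)), key=lambda i: math.dist(data[i], query)
    )


def recall_at_k(approx_ids, exact_ids):
    """Fraction of the true nearest neighbours present in the approximate result."""
    if not exact_ids:
        return 1.0
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


def average_recall(data, queries, k, approx_search):
    """Mean recall@k over all queries.

    `approx_search(query, k)` is a hypothetical callable that performs the
    index lookup under test and returns a list of row indices.
    """
    total = 0.0
    for query in queries:
        truth = exact_knn(data, query, k)
        total += recall_at_k(approx_search(query, k), truth)
    return total / len(queries)
```

For a million-vector dataset the brute-force ground truth is expensive, so it would likely be computed once per query set and cached, or taken from the dataset's published ground-truth file if it ships one.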

Compare the results to those reported in papers using the same algorithm.

