[MDEV-35032] streaming mode for mhnsw search

Details

Type: Task
Status: Closed
Priority: Critical
Resolution: Fixed
Fix Version/s: 11.7.1
Component/s: Vector search
Labels: None

Description

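Switch from "return k nearest" to "keep returning nodes in increasing distance order for as long as needed". This works much better with a WHERE clause, because rows rejected by WHERE no longer leave the query with fewer than the requested number of results.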
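To illustrate the difference, here is a minimal brute-force sketch, not the mhnsw code: `Row`, `NearestStream`, `top_k_then_filter` and `stream_with_filter` are hypothetical names, and a min-heap over all rows stands in for the graph search. Applying WHERE after fetching the k nearest rows can return fewer than k rows, while a stream that yields rows in increasing distance order lets the caller keep pulling until k rows pass the filter.

```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical table row: a 1-D "vector" plus a column used by WHERE.
struct Row {
  int id;
  float x;
  int category;
};

// Yields rows in increasing distance from the target, one per next() call,
// for as long as the caller keeps asking (brute force via a min-heap).
class NearestStream {
 public:
  NearestStream(const std::vector<Row>& rows, float target) : rows_(rows) {
    for (std::size_t i = 0; i < rows.size(); ++i) {
      float d = rows[i].x - target;
      heap_.push(Item(d * d, i));  // squared distance keeps the same order
    }
  }
  // Returns false once every row has been returned.
  bool next(Row* out) {
    if (heap_.empty()) return false;
    *out = rows_[heap_.top().second];
    heap_.pop();
    return true;
  }

 private:
  typedef std::pair<float, std::size_t> Item;  // (distance, row index)
  const std::vector<Row>& rows_;
  std::priority_queue<Item, std::vector<Item>, std::greater<Item> > heap_;
};

typedef std::function<bool(const Row&)> Where;

// "Return k nearest", then apply WHERE: may end up with fewer than k rows.
std::vector<int> top_k_then_filter(const std::vector<Row>& rows, float target,
                                   std::size_t k, const Where& where) {
  NearestStream s(rows, target);
  std::vector<int> out;
  Row r;
  for (std::size_t i = 0; i < k && s.next(&r); ++i)
    if (where(r)) out.push_back(r.id);
  return out;
}

// Streaming: keep pulling rows until k of them satisfy WHERE.
std::vector<int> stream_with_filter(const std::vector<Row>& rows, float target,
                                    std::size_t k, const Where& where) {
  NearestStream s(rows, target);
  std::vector<int> out;
  Row r;
  while (out.size() < k && s.next(&r))
    if (where(r)) out.push_back(r.id);
  return out;
}
```

With rows at x = 0, 1, 2, 3, 4 where only the last three match `category == 1`, a k = 2 query returns nothing from `top_k_then_filter` but returns ids 3 and 4 from `stream_with_filter`.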


Issue Links

relates to:
- MDEV-33409 Index Condition Pushdown for k-ANN graph searches (Open)
- MDEV-34774 Vector queries crash in subquery or wrong results with WHERE (Closed)

Activity


Sergei Golubchik added a comment - 2024-09-27 14:25 - edited

Switch from returning a list of trefs to a "search context".
Also include a ctx in it, with its ref counter increased (but without holding commit_lock, so InnoDB can modify the graph between calls).
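A minimal sketch of such a search context, under stated assumptions: `GraphCtx` and `SearchContext` are hypothetical stand-ins for the real mhnsw structures, `std::shared_ptr` plays the role of the ref counter, and the `version` field borrows the "version" counter idea for detecting graph changes in read_next. The context keeps the ctx alive between calls but never holds `commit_lock`, so a writer can still modify the graph.

```cpp
#include <atomic>
#include <cstdint>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical shared graph state, standing in for the per-table mhnsw ctx.
struct GraphCtx {
  std::mutex commit_lock;                  // taken only while modifying
  std::atomic<std::uint64_t> version{0};   // bumped on every modification
  std::vector<int> nodes;

  void add_node(int node) {
    std::lock_guard<std::mutex> guard(commit_lock);
    nodes.push_back(node);
    version.fetch_add(1);
  }
};

// Hypothetical search context kept between read_next calls instead of a
// plain list of trefs. Holding a shared_ptr keeps the ctx alive (its ref
// counter is incremented) without holding commit_lock between calls.
struct SearchContext {
  std::shared_ptr<GraphCtx> ctx;
  std::uint64_t seen_version;   // graph version when the last chunk was found
  std::vector<int> last_found;  // start nodes for the next chunk

  explicit SearchContext(std::shared_ptr<GraphCtx> c)
      : ctx(std::move(c)), seen_version(ctx->version.load()) {}

  // True if the graph was modified since the last chunk was produced.
  bool graph_changed() const { return ctx->version.load() != seen_version; }
};
```

Because the lock is released after each call, `add_node` can run while a `SearchContext` is alive; the next read_next would then see `graph_changed()` return true and decide how to continue.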


Sergei Golubchik added a comment - 2024-09-27 14:40

Two approaches:

- Put the bloom filter into the search context:
  - after the initial search, it's reset
  - on subsequent searches it's preserved, which ensures the search walks away from the target
  - if InnoDB modifies the graph:
    - detect that in read_next with a "version" counter
    - switch to the trx context
    - restart the search from the beginning, skipping as many already-found nodes as needed to catch up (how to ensure there are no duplicates? check the old bloom filter?)
- Do not preserve the bloom filter:
  - simply reject vectors that are closer to the target than the closest start node (i.e., the closest of the last found)
  - or than the farthest of the previously found
  - this allows switching seamlessly to the trx context without restarting the search

In all cases, the start nodes are the results from the last iteration.
The chunk size needs to be tuned. Maybe ef is a good value?
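The second approach can be sketched as follows, under several assumptions: a toy single-layer graph over 1-D points stands in for mhnsw, `search_chunk` and `stream_all` are hypothetical names, the threshold is the distance of the farthest node returned so far, and ef is used as the chunk size. Each chunk starts from the previous chunk's results with a fresh visited set (no bloom filter is preserved); nodes at or inside the threshold are still traversed, so the search can walk past them, but are never returned again.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <unordered_set>
#include <utility>
#include <vector>

// Toy single-layer proximity graph (hypothetical stand-in for an mhnsw
// layer): pos[i] is node i's coordinate, links[i] its neighbors.
struct Graph {
  std::vector<float> pos;
  std::vector<std::vector<std::size_t>> links;
};

typedef std::pair<float, std::size_t> Item;  // (squared distance, node)

// One chunk of a beam search from `start`, returning up to ef nodes whose
// distance is strictly greater than `threshold`, in increasing order.
std::vector<Item> search_chunk(const Graph& g, float target,
                               const std::vector<std::size_t>& start,
                               float threshold, std::size_t ef) {
  assert(ef > 0);
  std::priority_queue<Item, std::vector<Item>, std::greater<Item> > candidates;
  std::priority_queue<Item> best;          // max-heap, at most ef results
  std::unordered_set<std::size_t> visited; // fresh for every chunk
  auto consider = [&](std::size_t n) {
    if (!visited.insert(n).second) return;
    float d = (g.pos[n] - target) * (g.pos[n] - target);
    if (best.size() >= ef && d >= best.top().first) return;  // too far
    candidates.push(Item(d, n));
    if (d > threshold) {  // reject nodes already streamed to the caller
      best.push(Item(d, n));
      if (best.size() > ef) best.pop();
    }
  };
  for (std::size_t s : start) consider(s);
  while (!candidates.empty()) {
    Item c = candidates.top();
    candidates.pop();
    if (best.size() >= ef && c.first > best.top().first) break;
    for (std::size_t m : g.links[c.second]) consider(m);
  }
  std::vector<Item> out;
  for (; !best.empty(); best.pop()) out.push_back(best.top());
  std::reverse(out.begin(), out.end());  // increasing distance
  return out;
}

// Streams reachable nodes in increasing distance order, ef at a time.
std::vector<std::size_t> stream_all(const Graph& g, float target,
                                    std::size_t entry, std::size_t ef) {
  std::vector<std::size_t> order;
  std::vector<std::size_t> start(1, entry);
  float threshold = -1.0f;  // squared distances are >= 0: reject nothing
  for (;;) {
    std::vector<Item> chunk = search_chunk(g, target, start, threshold, ef);
    if (chunk.empty()) break;
    start.clear();  // next chunk starts from this chunk's results
    for (const Item& it : chunk) {
      order.push_back(it.second);
      start.push_back(it.second);
    }
    threshold = chunk.back().first;  // farthest node returned so far
  }
  return order;
}
```

On a 10-node chain with the target at node 0, entering at node 5 with ef = 3, the chunks come out as 0..2, 3..5, 6..8 and 9. One caveat of a pure distance threshold: a node at exactly the threshold distance that did not fit into the previous chunk would be skipped, so ties need extra handling (for example a tie-breaker on the node id). The sketch also re-walks nodes inside the threshold on every chunk, which a real implementation would probably want to limit.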


Sergei Golubchik added a comment - 2024-09-27 21:22

The second approach works with MyISAM. InnoDB still needs work.


People

Assignee: Sergei Golubchik

Reporter: Sergei Golubchik

Votes: 0

Watchers: 2

Dates

Created: 2024-09-27 14:22

Updated: 2025-03-15 20:48

Resolved: 2024-11-06 13:53


MariaDB Server