[MDEV-18705] Parallel index range scan - Jira

Details

Type: Task
Status: Open (View Workflow)
Priority: Major
Resolution: Unresolved
Fix Version/s: None
Component/s: Storage Engine - InnoDB
Labels:
- parallel-query

Description

In a discussion with kaamos and arubin today, the idea of parallel range scan, especially in the clustered index, was brought up.

When a search is spanning multiple pages (we could determine this when following the chain of node pointers from the root page to the first leaf page of the range), we could submit other page numbers to a work queue. Then, based on system load, this work queue would be executed either by the current thread or by other threads.

The work queue could be a special high-priority queue that is managed by the ~~MDEV-16264~~ infrastructure. This would allow us to use either a single thread or multiple threads for executing a query, based on system load.

There are some challenges to this. The biggest one is: How to prevent the index tree from changing (pages being split or merged) between the time we have sampled the page numbers and we are actually accessing the pages? We would probably not want to increase page latch conflicts with other queries by holding latches on the upper pages for a longer time. Currently, InnoDB would release the upper-level page latches as soon as it reaches the first leaf page, and then does a simple forward scan of the leaf level. This avoids any conflicts with concurrent tree modifications.

One possibility is to distribute work based on key ranges rather than page numbers. We could get the key ranges from the node pointers pages. Each request from the work queue would then be handled by searching to the start of the range and then scanning forward until the end of the range is reached. In this way, the upper level page latches can be released immediately after the key ranges have been enqueued to the work queue.

With secondary index scans, in the worst case, each visit to a secondary index would require a lookup of the corresponding clustered index record as well as some traversal of history from undo log records. If we distribute the work based on secondary index key ranges, this should not cause any reduction of concurrency. There can be multiple concurrent readers on the index and undo log pages.

Attachments

Issue Links

relates to

MDEV-6096 Research - Ideas about parallel query execution

Open

MDEV-16264 Implement a common work queue for InnoDB background tasks

Closed

MDEV-21291 Support Parallel Query Execution

Closed

Activity

There are no comments yet on this issue.

People

Assignee:: Unassigned

Reporter:: Marko Mäkelä

Votes:: 1 Vote for this issue

Watchers:: 7 Start watching this issue

Dates

Created:: 2019-02-23 19:13

Updated:: 2020-11-10 19:06

Git Integration

Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.

MariaDB Server