Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-18705

Parallel index range scan

    XMLWordPrintable

    Details

      Description

      In a discussion with Alexey Kopytov and Alexander Rubin today, the idea of parallel range scan, especially in the clustered index, was brought up.

      When a search is spanning multiple pages (we could determine this when following the chain of node pointers from the root page to the first leaf page of the range), we could submit other page numbers to a work queue. Then, based on system load, this work queue would be executed either by the current thread or by other threads.

      The work queue could be a special high-priority queue that is managed by the MDEV-16264 infrastructure. This would allow us to use either a single thread or multiple threads for executing a query, based on system load.

      There are some challenges to this. The biggest one is: How to prevent the index tree from changing (pages being split or merged) between the time we have sampled the page numbers and we are actually accessing the pages? We would probably not want to increase page latch conflicts with other queries by holding latches on the upper pages for a longer time. Currently, InnoDB would release the upper-level page latches as soon as it reaches the first leaf page, and then does a simple forward scan of the leaf level. This avoids any conflicts with concurrent tree modifications.

      One possibility is to distribute work based on key ranges rather than page numbers. We could get the key ranges from the node pointers pages. Each request from the work queue would then be handled by searching to the start of the range and then scanning forward until the end of the range is reached. In this way, the upper level page latches can be released immediately after the key ranges have been enqueued to the work queue.

      With secondary index scans, in the worst case, each visit to a secondary index would require a lookup of the corresponding clustered index record as well as some traversal of history from undo log records. If we distribute the work based on secondary index key ranges, this should not cause any reduction of concurrency. There can be multiple concurrent readers on the index and undo log pages.

        Attachments

          Issue Links

            Activity

              People

              Assignee:
              Unassigned Unassigned
              Reporter:
              marko Marko Mäkelä
              Votes:
              1 Vote for this issue
              Watchers:
              7 Start watching this issue

                Dates

                Created:
                Updated:

                  Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.