Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-23754

Replace buf_pool.flush_list with a priority queue




      In an effort to speed up crash recovery, InnoDB introduced buf_pool_t::flush_rbt, which would keep the flush_list sorted by oldest_modification during recovery. MDEV-23399 will remove the buf_pool.flush_rbt to simplify the page flushing code.

      Such ordering should be beneficial at all times, but maintaining 2 data collections (doubly linked list and binary search tree) is clearly suboptimal. Replacing the doubly-linked buf_pool.flush_list with a priority queue ORDER BY page.oldest_modification ASC, page.id ASC should include the following benefits:

      1. guarantee linear progress of the page cleaner (advancing the MIN(oldest_modification))
      2. remove the need of FlushHp (unlike a linked list, a priority queue in something like an array allows the mutex to be released and reacquired in the middle of an iteration)
      3. remove all page lookups from buf_flush_try_neighbors() because the buf_pool.flush_list is already ORDER BY page.oldest_modification ASC, page.id ASC (this will not affect SSD thanks to MDEV-17380)
      4. remove the need to flush the buffer pool after recovery (recv_sys.debug_free()), and thus remove one blocker of MDEV-14481

      One implementation challenge is that apart from the buf_block_t that are in buf_pool.chunks for ROW_FORMAT=COMPRESSED pages we have some buf_page_t that are allocated with malloc(). The majority of buf_page_t are actually buf_block_t::page. Because of this, the priority queue will probably have to store pointers to buf_page_t.


        Issue Links



              marko Marko Mäkelä
              marko Marko Mäkelä
              0 Vote for this issue
              2 Start watching this issue



                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.