MariaDB Server / MDEV-18724

Replace buf_block_t::mutex with more std::atomic




      InnoDB uses a combination of buffer-fixing and I/O-fixing for buffer pool blocks. User threads that access pages in the buffer pool use buf_page_t::buf_fix_count in addition to buf_block_t::lock. The buf_page_t::io_fix field was originally used for I/O operations, pinning the block during a read or write.

      In MySQL 5.6.4, the fix for Bug#11759044 - 51325: DROPPING AN EMPTY INNODB TABLE TAKES A LONG TIME WITH LARGE BUFFER POOL introduced another pseudo-I/O-fix state, BUF_IO_PIN. This state seems redundant; we could increment and decrement buf_fix_count instead.
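A minimal C++ sketch of that alternative, assuming a hypothetical, heavily simplified `Page` struct and a hypothetical RAII guard `PagePin` (neither is actual InnoDB code): instead of switching io_fix to BUF_IO_PIN for the duration of an operation, the code takes an ordinary buffer-fix and releases it when the scope ends.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical, heavily simplified stand-in for buf_page_t (not the real layout).
struct Page {
  std::atomic<uint32_t> buf_fix_count{0};

  // Take one buffer-fix (a reference count on the page).
  void fix() { buf_fix_count.fetch_add(1, std::memory_order_acquire); }

  // Release one buffer-fix; releasing more fixes than were taken is a bug.
  void unfix() {
    uint32_t prev = buf_fix_count.fetch_sub(1, std::memory_order_release);
    assert(prev > 0 && "unfix() without a matching fix()");
    (void) prev;
  }
};

// Hypothetical RAII guard that keeps a page buffer-fixed for one scope,
// where the code would otherwise set io_fix = BUF_IO_PIN and later reset it.
class PagePin {
 public:
  explicit PagePin(Page& page) : page_(page) { page_.fix(); }
  ~PagePin() { page_.unfix(); }
  PagePin(const PagePin&) = delete;
  PagePin& operator=(const PagePin&) = delete;

 private:
  Page& page_;
};
```

Unlike a single io_fix state, a counter lets several such pins coexist, and a thread that only needs to know whether the page is fixed can read buf_fix_count without acquiring the block mutex.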

      Since MariaDB 10.2, the buf_page_t::buf_fix_count is always protected by a combination of atomic memory operations and buf_pool->mutex, while buf_page_t::io_fix uses a combination of buf_pool->mutex and the block mutex.

      If we can use buf_fix_count alone instead of io_fix = BUF_IO_PIN, we could remove a few operations on the block mutex. Furthermore, if we increment or decrement buf_fix_count in sync with setting or clearing io_fix, buf_page_can_relocate() could be simplified to an atomic read of buf_fix_count, and we could invoke it without holding the block mutex.
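A sketch of the proposed invariant, using hypothetical names (`Page`, `IoFix`, `set_io_fix()`, `clear_io_fix()`, `can_relocate()`) rather than the real InnoDB definitions. It assumes that the current relocation check amounts to "io_fix is BUF_IO_NONE and buf_fix_count is 0". If every transition of io_fix away from the none state also takes a buffer-fix, and every transition back releases it, that two-field check collapses into a single atomic load.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical I/O-fix states; this sketch has no PIN state.
enum class IoFix : uint8_t { NONE, READ, WRITE };

struct Page {
  std::atomic<uint32_t> buf_fix_count{0};
  std::atomic<IoFix> io_fix{IoFix::NONE};

  // Invariant maintained here: io_fix != NONE implies buf_fix_count > 0.
  void set_io_fix(IoFix f) {
    assert(f != IoFix::NONE);
    buf_fix_count.fetch_add(1, std::memory_order_acquire);  // fix first...
    io_fix.store(f, std::memory_order_release);              // ...then mark I/O
  }

  void clear_io_fix() {
    assert(io_fix.load(std::memory_order_relaxed) != IoFix::NONE);
    io_fix.store(IoFix::NONE, std::memory_order_release);    // unmark I/O first...
    buf_fix_count.fetch_sub(1, std::memory_order_release);   // ...then unfix
  }

  // Under the invariant, "not I/O-fixed and not buffer-fixed" is one load.
  bool can_relocate() const {
    return buf_fix_count.load(std::memory_order_acquire) == 0;
  }
};
```

A single load is only a snapshot: the caller would still have to hold whatever latches prevent other threads from taking a new buffer-fix between calling can_relocate() and actually moving the block.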

      One source of flush_list relocation is buf_flush_relocate_on_flush_list(), which does not seem to be a problem. Before buf_page_get_gen() allocates an uncompressed page for a compressed-only ROW_FORMAT=COMPRESSED block, it checks that nobody else has buffer-fixed the block. The other calls are guarded by buf_page_can_relocate().
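The "nobody else has buffer-fixed the block" check can be illustrated with a hypothetical sketch; `Page`, `pool_mutex`, `fix()`, `unfix()` and `try_claim_for_decompress()` are invented for illustration and are not InnoDB functions. It assumes that the caller already holds one buffer-fix of its own and that new buffer-fixes are only taken while holding a pool-level mutex, so a count of exactly 1 observed under that mutex means no other thread can be using the compressed-only block.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <mutex>

// Hypothetical, simplified stand-in for a compressed-only buffer pool page.
struct Page {
  std::atomic<uint32_t> buf_fix_count{0};
};

// Hypothetical pool-level mutex. In this sketch, new buffer-fixes are only
// taken while holding it, so the count cannot grow while it is held.
std::mutex pool_mutex;

void fix(Page& p) {
  std::lock_guard<std::mutex> guard(pool_mutex);
  p.buf_fix_count.fetch_add(1, std::memory_order_relaxed);
}

// Releasing a fix needs no mutex: a shrinking count never makes a
// "nobody else holds it" decision unsafe.
void unfix(Page& p) {
  uint32_t prev = p.buf_fix_count.fetch_sub(1, std::memory_order_release);
  assert(prev > 0);
  (void) prev;
}

// Precondition: the caller already holds exactly one buffer-fix on p.
// On success, returns true with `lock` owning pool_mutex, so no other thread
// can fix p until the caller has finished replacing the block.
// On failure, returns false with the mutex released.
bool try_claim_for_decompress(Page& p, std::unique_lock<std::mutex>& lock) {
  lock = std::unique_lock<std::mutex>(pool_mutex);
  if (p.buf_fix_count.load(std::memory_order_acquire) > 1) {
    lock.unlock();
    return false;  // another thread has buffer-fixed the block
  }
  return true;
}
```

If the claim fails, the caller would release its own fix, wait, and retry, rather than relocating a block that another thread is still using.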

      inaamrana, do you remember why we replaced a combination of BUF_IO_READ and buf_fix_count with the BUF_IO_PIN state? As far as I understand, the purpose was to prevent the block from being moved on, added to, or removed from the flush_list.

      It seems that while buf_flush_or_remove_pages() invokes buf_flush_try_yield(), the expectation is that buf_page_io_complete() (and buf_flush_write_complete()) cannot be invoked for the block. What actually guarantees this?

      It looks like buf_LRU_flush_or_remove_pages() is covered by exclusive MDL, so there cannot be multiple concurrent calls with the same tablespace ID. The only potential race would seem to be with buf_page_io_complete() or possibly with FLUSH TABLES…FOR EXPORT. I assume that MDL prevents FLUSH TABLES…FOR EXPORT from executing concurrently with any DDL, but maybe concurrent executions of multiple FLUSH TABLES…FOR EXPORT for the same table are allowed. (This needs to be tested.)





              marko Marko Mäkelä


