Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-18724

Replace buf_block_t::mutex with more std::atomic

Details

    Description

      InnoDB uses a combination of buffer-fixing and I/O fixing for buffer pool blocks. The buf_page_t::buf_fix_count is used in addition to buf_block_t::lock for user threads that are accessing pages in the buffer pool. The buf_page_t::io_fix was originally used for I/O operations, pinning the block during a read or write.

      In MySQL 5.6.4, the fix of Bug#11759044 - 51325: DROPPING AN EMPTY INNODB TABLE TAKES A LONG TIME WITH LARGE BUFFER POOL introduced another pseudo-I/O-fix state BUF_IO_PIN. This state seems to be redundant; we could increment and decrement buf_fix_count instead.

      Since MariaDB 10.2, the buf_page_t::buf_fix_count is always protected by a combination of atomic memory operations and buf_pool->mutex, while buf_page_t::io_fix uses a combination of buf_pool->mutex and the block mutex.

      If we can solely use buf_fix_count instead of io_fix = BUF_IO_PIN, then we could remove a few operations on the block mutex. Furthermore, if we increment or decrement buf_fix_count synchronized to setting or unsetting io_fix, the function buf_page_can_relocate() could be simplified to an atomic read of buf_fix_count, and we could invoke it without holding the block mutex.

      One source of flush_list relocation is buf_flush_relocate_on_flush_list(). It does not seem to be a problem. Before buf_page_get_gen() is allocating an uncompressed page for a compressed-only ROW_FORMAT=COMPRESSED block, it is checking that nobody else has buffer-fixed the block. Other calls are guarded by buf_page_can_relocate().

      inaamrana, do you remember why we replaced a combination of BUF_IO_READ and buf_fix_count with the BUF_IO_PIN state? As far as I understand, the purpose was to prevent the block from being moved or added or removed on the flush_list.

      It seems that while buf_flush_or_remove_pages() invokes buf_flush_try_yield(), the expectation is that buf_page_io_complete() (and buf_flush_write_complete()) cannot be invoked for the block. What actually guarantees this?

      It looks like buf_LRU_flush_or_remove_pages() is covered by exclusive MDL, so there cannot be multiple concurrent calls with the same tablespace ID. The only potential race would seem to be with buf_page_io_complete() or possibly with FLUSH TABLES…FOR EXPORT. I assume that MDL prevents FLUSH TABLES…FOR EXPORT from executing concurrently with any DDL, but maybe concurrent executions of multiple FLUSH TABLES…FOR EXPORT for the same table are allowed. (This needs to be tested.)

      Attachments

        Issue Links

          Activity

            It might be worthwhile to remove the 32-byte buf_block_t::mutex altogether, and use atomic operations on some members of buf_page_t and buf_block_t instead. For this to work flawlessly, we might want to fuse io_fix to buf_fix_count, and remove some of buf_page_t::state.

            marko Marko Mäkelä added a comment - It might be worthwhile to remove the 32-byte buf_block_t::mutex altogether, and use atomic operations on some members of buf_page_t and buf_block_t instead. For this to work flawlessly, we might want to fuse io_fix to buf_fix_count , and remove some of buf_page_t::state .

            The following buf_block_t members are currently protected by buf_block_t::mutex:

            • page.state
            • page.io_fix
            • file_page_was_freed (should be protected by lock instead, and made non-debug for MDEV-12226 and MDEV-15528)
            • page.zip; maybe we could protect this by buf_pool->mutex only?
            • page.flush_type, page.write_size, page.encrypted, page.real_size, page.slot: can we move these to IORequest, or rename these?
            • newest_modification, oldest_modification, flush_observer: Should probably be protected by lock instead.
            • access_time: Should probably be protected by std::atomic.
            marko Marko Mäkelä added a comment - The following buf_block_t members are currently protected by buf_block_t::mutex : page.state page.io_fix file_page_was_freed (should be protected by lock instead, and made non-debug for MDEV-12226 and MDEV-15528 ) page.zip ; maybe we could protect this by buf_pool->mutex only? page.flush_type , page.write_size , page.encrypted , page.real_size , page.slot : can we move these to IORequest , or rename these? newest_modification , oldest_modification , flush_observer : Should probably be protected by lock instead. access_time : Should probably be protected by std::atomic .

            MDEV-15528,MDEV-16526,MDEV-19514 could refactor some related data structures or code, making this task easier.

            I rebased the current code to 10.5. It currently suffers from some race conditions, causing random crashes in a few larger mysql-test-run tests.

            marko Marko Mäkelä added a comment - MDEV-15528 , MDEV-16526 , MDEV-19514 could refactor some related data structures or code, making this task easier. I rebased the current code to 10.5 . It currently suffers from some race conditions, causing random crashes in a few larger mysql-test-run tests.

            I ended up removing buf_block_t::mutex as part of MDEV-15053. It does not look like buf_page_t::io_fix() could be removed easily.

            marko Marko Mäkelä added a comment - I ended up removing buf_block_t::mutex as part of MDEV-15053 . It does not look like buf_page_t::io_fix() could be removed easily.

            This change was made as part of MDEV-15053.

            marko Marko Mäkelä added a comment - This change was made as part of MDEV-15053 .

            Some more cleanup was performed in MDEV-27058, where the io-fix and buffer-fix along with some other state was merged into a single atomic 32-bit word.

            marko Marko Mäkelä added a comment - Some more cleanup was performed in MDEV-27058 , where the io-fix and buffer-fix along with some other state was merged into a single atomic 32-bit word.

            People

              marko Marko Mäkelä
              marko Marko Mäkelä
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.