  1. MariaDB Server
  2. MDEV-39600

buf_flush_ahead() for async flushing causes contention on buf_pool.flush_list_mutex

Details

    Description

      In the async case, buf_flush_ahead() acquires buf_pool.flush_list_mutex in order to update the atomic LSN target and to wake up a page flusher thread that is waiting indefinitely.

      This creates contention on that mutex when the LSN age is above the async flushing threshold and many threads concurrently try to notify the page flusher thread.

      Since the LSN target buf_flush_async_lsn is atomic, a CAS loop could be used to raise it monotonically to the maximum, so that only CAS-loop winners lock and signal, and only if necessary. This also requires making the "idle bit" of buf_pool.page_cleaner_status an Atomic_relaxed<bool> variable. Some touch-ups might also be required in the waiting branch of buf_flush_page_cleaner().
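
      The idea can be sketched in standalone C++ as follows. This is only an illustration, not the draft patch: FlushState, raise_target(), flush_ahead_async() and signals_sent are hypothetical names, and std::atomic with its default sequentially consistent ordering stands in for the server's own atomic wrappers. The sketch also assumes that the page cleaner sets the idle flag and re-checks the target while holding the mutex before it waits, so that a notifier that skips the mutex cannot cause a lost wakeup.

```cpp
#include <atomic>
#include <condition_variable>
#include <cstdint>
#include <mutex>

// Hypothetical stand-ins for the InnoDB state named in this report.
struct FlushState {
  std::atomic<uint64_t> async_lsn{0};     // plays the role of buf_flush_async_lsn
  std::atomic<bool> cleaner_idle{false};  // plays the role of the "idle bit"
  std::mutex flush_list_mutex;            // plays the role of buf_pool.flush_list_mutex
  std::condition_variable do_flush;
  unsigned signals_sent = 0;  // sketch-only counter, protected by flush_list_mutex
};

// Raise target to at least lsn. Returns true only if this call changed it,
// that is, only for the "winner" of the CAS loop.
inline bool raise_target(std::atomic<uint64_t> &target, uint64_t lsn) {
  uint64_t cur = target.load();
  while (cur < lsn) {
    // On failure, compare_exchange_weak stores the current value into cur,
    // so the loop condition is re-evaluated against fresh data.
    if (target.compare_exchange_weak(cur, lsn))
      return true;
  }
  return false;  // someone else already published an equal or higher target
}

// Async notification path: the mutex is taken only by a CAS winner that
// also observes the cleaner as idle.
inline void flush_ahead_async(FlushState &s, uint64_t lsn) {
  if (!raise_target(s.async_lsn, lsn))
    return;  // not a winner: no mutex acquisition at all
  if (!s.cleaner_idle.load())
    return;  // assumed: a running cleaner re-reads the target before it waits
  std::lock_guard<std::mutex> guard(s.flush_list_mutex);
  ++s.signals_sent;
  s.do_flush.notify_one();
}
```

      With this shape, a thread whose LSN does not exceed the published target returns after a single atomic load, and a thread that wins the CAS while the cleaner is busy returns without touching the mutex. Whether relaxed ordering is sufficient for the idle flag depends on how the waiting branch is written, so the sketch deliberately keeps the default ordering.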

      I'm already working on a draft. Hopefully, it will make the notification to proceed with async flushing less contended.

      Testing showed a high percentage of buf_flush_ahead() waits/spin loops both in MDEV-39341 at high VU counts (>88) and in the MDEV-37924 128G/64G UAW tests.

            People

              alessandro.vetere Alessandro Vetere