MDEV-33508

Performance regression due to frequent scan of full buf_pool.flush_list

Details

    Description

      steve.shaw@intel.com noticed a significant performance regression between 10.11.6 and 10.11.7 on a server with a single CPU socket. The initial suspect was MDEV-33053, but the actual cause turned out to be a piece of code that had been added as part of the fix of MDEV-32029. Removing that code would fix the regression:

      diff --git a/storage/innobase/buf/buf0flu.cc b/storage/innobase/buf/buf0flu.cc
      index 80b83f6a68f..fd92756cd29 100644
      --- a/storage/innobase/buf/buf0flu.cc
      +++ b/storage/innobase/buf/buf0flu.cc
      @@ -2433,16 +2433,6 @@ static void buf_flush_page_cleaner()
           {
             buf_pool.page_cleaner_set_idle(false);
             buf_pool.n_flush_inc();
      -      /* Remove clean blocks from buf_pool.flush_list before the LRU scan. */
      -      for (buf_page_t *p= UT_LIST_GET_FIRST(buf_pool.flush_list); p; )
      -      {
      -        const lsn_t lsn{p->oldest_modification()};
      -        ut_ad(lsn > 2 || lsn == 1);
      -        buf_page_t *n= UT_LIST_GET_NEXT(list, p);
      -        if (lsn <= 1)
      -          buf_pool.delete_from_flush_list(p);
      -        p= n;
      -      }
             mysql_mutex_unlock(&buf_pool.flush_list_mutex);
             n= srv_max_io_capacity;
             mysql_mutex_lock(&buf_pool.mutex);
      

      This code seems to be unnecessary for the actual MDEV-32029 fix. It was added because server freezes had been observed around the time the MDEV-32029 fix was being tested. Indeed, without this code, buf_flush_LRU_list_batch() could fail to make any progress when all the blocks that it traversed in the buf_pool.LRU list had oldest_modification()==1, that is, when the blocks were actually clean and only needed to be removed from buf_pool.flush_list.

      It looks like the above code removal should have been part of MDEV-32588, which changed buf_flush_LRU_list_batch() so that it will try harder to remove such clean pages.

      After the fix of MDEV-33053, the page cleaner thread would keep working if buf_pool.need_LRU_eviction() holds. This is what made the redundant loop more noticeable.
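
      To illustrate why the removed loop is costly, here is a minimal, self-contained sketch. It is a toy model and not InnoDB code: Page, FlushList, make_flush_list() and purge_clean_eagerly() are hypothetical names, and a std::list stands in for buf_pool.flush_list. The only things taken from the diff above are the convention that oldest_modification()==1 marks a clean block that is still linked into the list, and that the scan runs before flush_list_mutex is released.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <list>

// Hypothetical stand-in for buf_page_t; only the field needed here.
struct Page {
  // 1: clean, but still linked into the flush list;
  // >2: dirty, holding the LSN of the oldest unflushed change.
  std::uint64_t oldest_modification;
};

// Hypothetical stand-in for buf_pool.flush_list.
using FlushList = std::list<Page>;

// Build a toy list: n_dirty dirty pages followed by n_clean clean ones.
FlushList make_flush_list(std::size_t n_dirty, std::size_t n_clean)
{
  FlushList list;
  for (std::size_t i = 0; i < n_dirty; i++)
    list.push_back(Page{static_cast<std::uint64_t>(100 + i)});
  for (std::size_t i = 0; i < n_clean; i++)
    list.push_back(Page{1});
  return list;
}

// Eager purge modelled on the removed loop: every element is visited,
// whether or not it is clean. In the real code this traversal happened
// while flush_list_mutex was held. Returns the number of elements visited.
std::size_t purge_clean_eagerly(FlushList &flush_list)
{
  std::size_t visited = 0;
  for (auto it = flush_list.begin(); it != flush_list.end(); )
  {
    ++visited;
    assert(it->oldest_modification > 2 || it->oldest_modification == 1);
    if (it->oldest_modification == 1)
      it = flush_list.erase(it);  // clean block: unlink it
    else
      ++it;                       // dirty block: nothing to do
  }
  return visited;
}
```

      In this model, calling purge_clean_eagerly() on a list that contains no clean blocks at all still visits every element. The work is proportional to the length of the list rather than to the number of blocks that actually need to be removed, which is why running such a scan on every page cleaner iteration can become expensive.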

            People

              marko Marko Mäkelä