MariaDB Server / MDEV-33508

Performance regression due to frequent scan of full buf_pool.flush_list

Details

    Description

      steve.shaw@intel.com noticed a significant performance regression on a single-CPU-socket server between 10.11.6 and 10.11.7. An initial suspect was MDEV-33053, but it turns out that a piece of code that had been added in the fix of MDEV-32029 is to blame. Removing that code would fix the regression:

      diff --git a/storage/innobase/buf/buf0flu.cc b/storage/innobase/buf/buf0flu.cc
      index 80b83f6a68f..fd92756cd29 100644
      --- a/storage/innobase/buf/buf0flu.cc
      +++ b/storage/innobase/buf/buf0flu.cc
      @@ -2433,16 +2433,6 @@ static void buf_flush_page_cleaner()
           {
             buf_pool.page_cleaner_set_idle(false);
             buf_pool.n_flush_inc();
      -      /* Remove clean blocks from buf_pool.flush_list before the LRU scan. */
      -      for (buf_page_t *p= UT_LIST_GET_FIRST(buf_pool.flush_list); p; )
      -      {
      -        const lsn_t lsn{p->oldest_modification()};
      -        ut_ad(lsn > 2 || lsn == 1);
      -        buf_page_t *n= UT_LIST_GET_NEXT(list, p);
      -        if (lsn <= 1)
      -          buf_pool.delete_from_flush_list(p);
      -        p= n;
      -      }
             mysql_mutex_unlock(&buf_pool.flush_list_mutex);
             n= srv_max_io_capacity;
             mysql_mutex_lock(&buf_pool.mutex);
      

      This code would seem to be unnecessary for the actual MDEV-32029 fix. It was added because server freezes had been observed around the time the MDEV-32029 fix was tested. Indeed, without this code it was possible that buf_flush_LRU_list_batch() would not make any progress when all the blocks that it traversed in the buf_pool.LRU list had oldest_modification()==1, that is, when the blocks were actually clean and only needed to be removed from buf_pool.flush_list.

      It looks like the above code removal should have been part of MDEV-32588, which changed buf_flush_LRU_list_batch() so that it will try harder to remove such clean pages.

      After the fix of MDEV-33053, the page cleaner thread would keep working if buf_pool.need_LRU_eviction() holds. This is what made the redundant loop more noticeable.

      Attachments

        Issue Links

          Activity

            marko Marko Mäkelä added a comment -

            I think that it is correct to claim that this regression is also caused by MDEV-33053, because that change enables buf_flush_page_cleaner() to run continuously when the buffer pool is about to run out.

            debarun Debarun Banerjee added a comment -

            Thanks Marko. I 100% agree and am happy that my review comment was actually pointing to the right issue.

            mleich Matthias Leich added a comment -

            origin/10.6-MDEV-33508 208f3ee34381f7c8a79d114f610c42bb53a5f394 2024-02-21T11:35:06+02:00
            behaved as well as the official 10.6 during RQG testing.

            marko Marko Mäkelä added a comment - edited

            Some prerequisites for hitting this anomaly are:

            • The system is running out of buffer pool
            • buf_pool.flush_list is large

            The buf_pool.flush_list size is reflected by the status variable Innodb_buffer_pool_pages_dirty, and the buf_pool.free size (the number of available pages) by Innodb_buffer_pool_pages_free. The buf_pool.LRU size, that is, the number of data pages in the buffer pool, is reflected by Innodb_buffer_pool_pages_data.

            These can be queried for instance as follows:

            SELECT variable_name,variable_value FROM information_schema.global_status
            WHERE variable_name LIKE 'Innodb_buffer_pool_pages%';
            

            If Innodb_buffer_pool_pages_free is larger than innodb_lru_scan_depth / 2, no traversal of the entire buf_pool.flush_list should take place.
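            One way to check that condition is to compare the status variable to the setting directly. The query below is only a sketch: it approximates the internal threshold by comparing against innodb_lru_scan_depth / 2, and the casts are needed because the values in these tables are strings.

            SELECT CAST(s.variable_value AS UNSIGNED) AS free_pages,
                   CAST(v.variable_value AS UNSIGNED) / 2 AS half_lru_scan_depth,
                   CAST(s.variable_value AS UNSIGNED)
                     > CAST(v.variable_value AS UNSIGNED) / 2 AS free_list_ok
            FROM information_schema.global_status s
            JOIN information_schema.global_variables v
              ON v.variable_name = 'innodb_lru_scan_depth'
            WHERE s.variable_name = 'Innodb_buffer_pool_pages_free';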

            Configuring a larger innodb_buffer_pool_size might work around this problem, if the workload fits in the buffer pool. Normally it should be set to 60 to 80 per cent of the available memory.
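            For example, the buffer pool can be enlarged at runtime; the value below is purely illustrative, so choose a size that fits the workload and the available memory:

            SET GLOBAL innodb_buffer_pool_size = 8589934592;  -- 8GiB, illustrative only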

            If innodb_max_dirty_pages_pct_lwm is set to a nonzero value, or if innodb_max_dirty_pages_pct is set to a lower value than the default 90%, the buf_pool.flush_list should remain short. The reason why we allow up to 90% of the buffer pool to be dirty is that it helps avoid write amplification: the same pages can be modified over and over again in the buffer pool, without having to be written back to the data files every time. Only the write to the write-ahead log (ib_logfile0) is really mandatory for durability.
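            As a sketch (the thresholds below are arbitrary examples, not recommendations), either parameter can be changed at runtime:

            SET GLOBAL innodb_max_dirty_pages_pct_lwm = 10;  -- start background flushing early
            SET GLOBAL innodb_max_dirty_pages_pct = 75;      -- keep dirty pages below the 90% default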


            marko Marko Mäkelä added a comment -

            A good indicator for "running out of buffer pool" is that the Innodb_buffer_pool_wait_free status variable is increasing.
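            For example, sampling the counter twice and comparing the values shows whether it is growing:

            SELECT variable_value FROM information_schema.global_status
            WHERE variable_name = 'Innodb_buffer_pool_wait_free';

            If the value is larger on the second sample, page reads had to wait for free pages to become available.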


            People

              marko Marko Mäkelä
              marko Marko Mäkelä
              Votes: 1
              Watchers: 7

