MariaDB Server / MDEV-33508

Performance regression due to frequent scan of full buf_pool.flush_list

Details

    Description

      steve.shaw@intel.com noticed a significant performance regression on a single-CPU-socket server between 10.11.6 and 10.11.7. An initial suspect was MDEV-33053, but it turns out that a piece of code that had been added in the fix of MDEV-32029 is to blame. Removing that code fixes the regression:

      diff --git a/storage/innobase/buf/buf0flu.cc b/storage/innobase/buf/buf0flu.cc
      index 80b83f6a68f..fd92756cd29 100644
      --- a/storage/innobase/buf/buf0flu.cc
      +++ b/storage/innobase/buf/buf0flu.cc
      @@ -2433,16 +2433,6 @@ static void buf_flush_page_cleaner()
           {
             buf_pool.page_cleaner_set_idle(false);
             buf_pool.n_flush_inc();
      -      /* Remove clean blocks from buf_pool.flush_list before the LRU scan. */
      -      for (buf_page_t *p= UT_LIST_GET_FIRST(buf_pool.flush_list); p; )
      -      {
      -        const lsn_t lsn{p->oldest_modification()};
      -        ut_ad(lsn > 2 || lsn == 1);
      -        buf_page_t *n= UT_LIST_GET_NEXT(list, p);
      -        if (lsn <= 1)
      -          buf_pool.delete_from_flush_list(p);
      -        p= n;
      -      }
             mysql_mutex_unlock(&buf_pool.flush_list_mutex);
             n= srv_max_io_capacity;
             mysql_mutex_lock(&buf_pool.mutex);
      

      This code would seem to be unnecessary for the actual MDEV-32029 fix. It was added because server freezes had been observed around the time the MDEV-32029 fix was tested. Indeed, without this code it was possible that buf_flush_LRU_list_batch() would not make any progress when all the blocks that it traversed in the buf_pool.LRU list had oldest_modification()==1, that is, when the blocks were actually clean and should have been removed from buf_pool.flush_list.

      It looks like the above code removal should have been part of MDEV-32588, which changed buf_flush_LRU_list_batch() so that it will try harder to remove such clean pages.

      After the fix of MDEV-33053, the page cleaner thread would keep working if buf_pool.need_LRU_eviction() holds. This is what made the redundant loop more noticeable.

          Activity

            marko Marko Mäkelä created issue -
            marko Marko Mäkelä made changes -
            Status Open [ 1 ] In Progress [ 3 ]
            marko Marko Mäkelä added a comment -

            I think that it is correct to claim that this regression is also caused by MDEV-33053, because that change enables buf_flush_page_cleaner() to run continuously when the buffer pool is about to run out.

            marko Marko Mäkelä made changes -
            Assignee Marko Mäkelä [ marko ] Debarun Banerjee [ JIRAUSER54513 ]
            Status In Progress [ 3 ] In Review [ 10002 ]

            debarun Debarun Banerjee added a comment -

            Thanks Marko. I 100% agree and am happy that my review comment was actually pointing to the right issue.

            debarun Debarun Banerjee made changes -
            Status In Review [ 10002 ] Stalled [ 10000 ]
            marko Marko Mäkelä made changes -
            Status Stalled [ 10000 ] In Testing [ 10301 ]
            marko Marko Mäkelä made changes -
            Assignee Debarun Banerjee [ JIRAUSER54513 ] Matthias Leich [ mleich ]

            mleich Matthias Leich added a comment -

            origin/10.6-MDEV-33508 208f3ee34381f7c8a79d114f610c42bb53a5f394 2024-02-21T11:35:06+02:00
            behaved as well as the official 10.6 during RQG testing.

            mleich Matthias Leich made changes -
            Assignee Matthias Leich [ mleich ] Marko Mäkelä [ marko ]
            Status In Testing [ 10301 ] Stalled [ 10000 ]
            marko Marko Mäkelä made changes -
            Fix Version/s 10.6.18 [ 29627 ]
            Fix Version/s 10.11.8 [ 29630 ]
            Fix Version/s 11.0.6 [ 29628 ]
            Fix Version/s 11.1.5 [ 29629 ]
            Fix Version/s 11.2.4 [ 29631 ]
            Fix Version/s 11.4.2 [ 29633 ]
            Fix Version/s 10.6 [ 24028 ]
            Fix Version/s 10.11 [ 27614 ]
            Fix Version/s 11.0 [ 28320 ]
            Fix Version/s 11.1 [ 28549 ]
            Fix Version/s 11.2 [ 28603 ]
            Fix Version/s 11.4 [ 29301 ]
            Resolution Fixed [ 1 ]
            Status Stalled [ 10000 ] Closed [ 6 ]
            marko Marko Mäkelä added a comment - edited

            Some prerequisites for hitting this anomaly are:

            • The system is running out of buffer pool
            • buf_pool.flush_list is large

            The buf_pool.flush_list size is reflected by the status variable Innodb_buffer_pool_pages_dirty, and the buf_pool.free size (the number of available pages) is reflected by Innodb_buffer_pool_pages_free. There is also Innodb_buffer_pool_pages_data, which reflects the size of buf_pool.LRU, that is, the number of data pages in the buffer pool.

            These can be queried for instance as follows:

            SELECT variable_name,variable_value FROM information_schema.global_status
            WHERE variable_name LIKE 'Innodb_buffer_pool_pages%';
            

            If Innodb_buffer_pool_pages_free is larger than innodb_lru_scan_depth / 2, no traversal of the entire buf_pool.flush_list should take place.
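
            For example, the current headroom can be checked by comparing the status variable with half of the configured LRU scan depth. This is only a sketch; it reads information_schema.global_status as above, plus information_schema.global_variables for the setting:

            SELECT s.variable_value AS pages_free,
                   v.variable_value / 2 AS lru_scan_depth_half
            FROM information_schema.global_status s,
                 information_schema.global_variables v
            WHERE s.variable_name = 'Innodb_buffer_pool_pages_free'
              AND v.variable_name = 'innodb_lru_scan_depth';

            As long as pages_free stays above lru_scan_depth_half, the full buf_pool.flush_list traversal should not be triggered.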

            Configuring a larger innodb_buffer_pool_size might work around this problem, if the workload fits in the buffer pool. Normally it should be set to 60 to 80 per cent of the available memory.
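
            For example (a sketch only; the 24G figure assumes a dedicated server with roughly 32 GiB of RAM, that is, about 75% of memory, and must be adjusted to the actual system):

            -- in my.cnf / server.cnf, [mysqld] section:
            --   innodb_buffer_pool_size = 24G
            -- or at runtime, because the buffer pool can be resized online:
            SET GLOBAL innodb_buffer_pool_size = 24 * 1024 * 1024 * 1024;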

            If innodb_max_dirty_pages_pct_lwm is set to a nonzero value, or if innodb_max_dirty_pages_pct is set to a lower value than the default 90%, the buf_pool.flush_list should remain short. The reason why we allow up to 90% of the buffer pool to be dirty is that it helps avoid write amplification: the same pages can be modified over and over again in the buffer pool, without having to be written back to the data files every time. Only the write to the write-ahead log (ib_logfile0) is really mandatory for durability.
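
            For example, to trade some additional page write I/O for a shorter buf_pool.flush_list (the percentages are purely illustrative, not recommendations):

            -- start background flushing once 10% of the buffer pool is dirty,
            -- and never allow more than 75% of it to become dirty:
            SET GLOBAL innodb_max_dirty_pages_pct_lwm = 10;
            SET GLOBAL innodb_max_dirty_pages_pct = 75;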


            marko Marko Mäkelä added a comment -

            A good indicator for "running out of buffer pool" is that the Innodb_buffer_pool_wait_free status variable is increasing.
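
            For example, sampling the counter before and after a period of load shows whether it is growing (a sketch using the same information_schema.global_status table as above):

            SELECT variable_value FROM information_schema.global_status
            WHERE variable_name = 'Innodb_buffer_pool_wait_free';
            -- ...let the workload run for a while, then repeat the query;
            -- an increasing value means user threads had to wait for free pages.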


            People

              Assignee: Marko Mäkelä
              Reporter: Marko Mäkelä
              Votes: 1
              Watchers: 7

