MariaDB Server / MDEV-33508

Performance regression due to frequent scan of full buf_pool.flush_list

Details

    Description

      steve.shaw@intel.com noticed a significant performance regression on a single-CPU-socket server between 10.11.6 and 10.11.7. An initial suspect was MDEV-33053, but it turns out that a piece of code that had been added in the fix of MDEV-32029 is to blame. Removing that code would fix the regression:

      diff --git a/storage/innobase/buf/buf0flu.cc b/storage/innobase/buf/buf0flu.cc
      index 80b83f6a68f..fd92756cd29 100644
      --- a/storage/innobase/buf/buf0flu.cc
      +++ b/storage/innobase/buf/buf0flu.cc
      @@ -2433,16 +2433,6 @@ static void buf_flush_page_cleaner()
           {
             buf_pool.page_cleaner_set_idle(false);
             buf_pool.n_flush_inc();
      -      /* Remove clean blocks from buf_pool.flush_list before the LRU scan. */
      -      for (buf_page_t *p= UT_LIST_GET_FIRST(buf_pool.flush_list); p; )
      -      {
      -        const lsn_t lsn{p->oldest_modification()};
      -        ut_ad(lsn > 2 || lsn == 1);
      -        buf_page_t *n= UT_LIST_GET_NEXT(list, p);
      -        if (lsn <= 1)
      -          buf_pool.delete_from_flush_list(p);
      -        p= n;
      -      }
             mysql_mutex_unlock(&buf_pool.flush_list_mutex);
             n= srv_max_io_capacity;
             mysql_mutex_lock(&buf_pool.mutex);
      

      This code would seem to be unnecessary for the actual MDEV-32029 fix. It was added because server freezes had been observed around the time the MDEV-32029 fix was tested. Indeed, without this code it was possible that buf_flush_LRU_list_batch() would not make any progress when all the blocks that it traversed in the buf_pool.LRU list had oldest_modification()==1, that is, when the blocks were actually clean and only needed to be removed from buf_pool.flush_list.

      It looks like the above code removal should have been part of MDEV-32588, which changed buf_flush_LRU_list_batch() so that it will try harder to remove such clean pages.

      After the fix of MDEV-33053, the page cleaner thread would keep working if buf_pool.need_LRU_eviction() holds. This is what made the redundant loop more noticeable.

      Attachments

        Issue Links

          Activity

            marko Marko Mäkelä added a comment -

            I think that it is correct to claim that this regression is also caused by MDEV-33053, because that change enables buf_flush_page_cleaner() to run continuously when the buffer pool is about to run out.

            debarun Debarun Banerjee added a comment -

            Thanks Marko. I 100% agree and am happy that my review comment was actually pointing to the right issue.

            mleich Matthias Leich added a comment -

            origin/10.6-MDEV-33508 208f3ee34381f7c8a79d114f610c42bb53a5f394 2024-02-21T11:35:06+02:00
            behaved as well as the official 10.6 during RQG testing.

            marko Marko Mäkelä added a comment - edited

            Some prerequisites for hitting this anomaly are:

            • The system is running out of buffer pool
            • buf_pool.flush_list is large

            The buf_pool.flush_list size is reflected by the status variable Innodb_buffer_pool_pages_dirty, and the buf_pool.free size (the number of available pages) by Innodb_buffer_pool_pages_free. The buf_pool.LRU size, that is, the number of data pages in the buffer pool, is reflected by Innodb_buffer_pool_pages_data.

            These can be queried for instance as follows:

            SELECT variable_name,variable_value FROM information_schema.global_status
            WHERE variable_name LIKE 'Innodb_buffer_pool_pages%';
            

            If Innodb_buffer_pool_pages_free is larger than innodb_lru_scan_depth / 2, no traversal of the entire buf_pool.flush_list should take place.
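            One way to check that condition is to compare the status variable to the setting directly. The query below is only a sketch: it approximates the internal threshold by comparing against innodb_lru_scan_depth / 2, and the casts are needed because the values in these tables are strings.

            SELECT CAST(s.variable_value AS UNSIGNED) AS free_pages,
                   CAST(v.variable_value AS UNSIGNED) / 2 AS half_lru_scan_depth,
                   CAST(s.variable_value AS UNSIGNED)
                     > CAST(v.variable_value AS UNSIGNED) / 2 AS free_list_ok
            FROM information_schema.global_status s
            JOIN information_schema.global_variables v
              ON v.variable_name = 'innodb_lru_scan_depth'
            WHERE s.variable_name = 'Innodb_buffer_pool_pages_free';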

            Configuring a larger innodb_buffer_pool_size might work around this problem, if the workload fits in the buffer pool. Normally it should be set to 60 to 80 per cent of the available memory.
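            For example, the buffer pool can be enlarged at runtime; the value below is purely illustrative, so choose a size that fits the workload and the available memory:

            SET GLOBAL innodb_buffer_pool_size = 8589934592;  -- 8GiB, illustrative only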

            If innodb_max_dirty_pages_pct_lwm is set to a nonzero value, or if innodb_max_dirty_pages_pct is set to a lower value than the default 90%, the buf_pool.flush_list should remain short. The reason why we allow up to 90% of the buffer pool to be dirty is that it helps avoid write amplification: the same pages can be modified over and over again in the buffer pool, without having to be written back to the data files every time. Only the write to the write-ahead log (ib_logfile0) is really mandatory for durability.
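            As a sketch (the thresholds below are arbitrary examples, not recommendations), either parameter can be changed at runtime:

            SET GLOBAL innodb_max_dirty_pages_pct_lwm = 10;  -- start background flushing early
            SET GLOBAL innodb_max_dirty_pages_pct = 75;      -- keep dirty pages below the 90% default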


            marko Marko Mäkelä added a comment -

            A good indicator for "running out of buffer pool" is that the Innodb_buffer_pool_wait_free status variable is increasing.
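            For example, sampling the counter twice and comparing the values shows whether it is growing:

            SELECT variable_value FROM information_schema.global_status
            WHERE variable_name = 'Innodb_buffer_pool_wait_free';

            If the value is larger on the second sample, page reads had to wait for free pages to become available.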


            People

              marko Marko Mäkelä
              marko Marko Mäkelä
              Votes: 1
              Watchers: 7

