[MDEV-33053] InnoDB LRU flushing does not run before running out of buffer pool Created: 2023-12-18 Updated: 2024-01-26 Resolved: 2024-01-19 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Storage Engine - InnoDB |
| Affects Version/s: | 10.5, 10.6, 10.11, 11.0, 11.1, 11.2, 11.3 |
| Fix Version/s: | 10.6.17, 10.11.7, 11.0.5, 11.1.4, 11.2.3, 11.3.2, 11.4.1 |
| Type: | Bug | Priority: | Major |
| Reporter: | Marko Mäkelä | Assignee: | Marko Mäkelä |
| Resolution: | Fixed | Votes: | 1 |
| Labels: | performance | ||
| Issue Links: |
|
| Description |
|
A customer who uses a remarkably large innodb_buffer_pool_size noted performance stalls when the buffer pool is running out of free pages. In a scenario where background flushing is not enabled (innodb_max_dirty_pages_pct_lwm is not being exceeded in the buffer pool, or it is 0 and innodb_max_dirty_pages_pct is not being exceeded), the buf_flush_page_cleaner() thread could sit idle while the free pages in the buffer pool slowly run out.

It could make sense to wake up the buf_flush_page_cleaner() thread in buf_LRU_get_free_block() when buf_pool.page_cleaner_idle() holds and there are fewer than innodb_lru_scan_depth blocks available. If there are fewer than innodb_lru_scan_depth/2 blocks available and the page cleaner is not idle but is only being invoked once per second, it would make sense to wake the page cleaner from its sleep immediately.

LRU eviction flushing should ignore the innodb_io_capacity throttling, because running out of buffer pool risks blocking all operations in the database. |
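The wake-up policy proposed above can be sketched as a small decision function. This is a hedged, simplified model of the description, not the actual InnoDB source: the name should_wake_page_cleaner and the wake_action enum are hypothetical; the real logic lives inside buf_LRU_get_free_block() and the page cleaner thread.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical sketch of the wake-up heuristic from the description.
enum class wake_action {
  none,       // enough free blocks; leave the page cleaner alone
  wake_idle,  // page cleaner is idle: signal it to start LRU flushing
  wake_sleep  // page cleaner runs once per second: interrupt its sleep now
};

wake_action should_wake_page_cleaner(size_t free_blocks,
                                     size_t lru_scan_depth,
                                     bool page_cleaner_idle)
{
  if (page_cleaner_idle && free_blocks < lru_scan_depth)
    return wake_action::wake_idle;
  if (!page_cleaner_idle && free_blocks < lru_scan_depth / 2)
    return wake_action::wake_sleep;
  return wake_action::none;
}
```

With the default innodb_lru_scan_depth of 1024, an idle page cleaner would be woken as soon as fewer than 1024 free blocks remain, and a once-per-second page cleaner would have its sleep interrupted below 512 free blocks.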
| Comments |
| Comment by Marko Mäkelä [ 2023-12-19 ] |
|
My fix would depend on some changes that were made in |
| Comment by Marko Mäkelä [ 2024-01-09 ] |
|
debarun, you mentioned related to MDEV-28800 that you observed something that I think could match this description. Can you take a look? Can you reproduce the described scenario? My attempt at blindly fixing this did not significantly improve the situation in the customer’s environment. |
| Comment by Debarun Banerjee [ 2024-01-10 ] |
|
marko Can we try to get the following information from the customer? It would help in analysing the scenario better.
1. Select * from innodb_buffer_pool_stats \G
2. Configuration parameters, if different from default
3. The TPS/QPS readings over time around the stall
4. CPU and iostat readings on the disk during the stall
5. DB size on disk |
| Comment by Debarun Banerjee [ 2024-01-12 ] |
|
I could analyze the issue from the support ticket; the detailed information has been very helpful. Along with flushing an appropriate number of pages, the page cleaner thread has the responsibility of maintaining enough free pages. The earlier page cleaner improvements have shifted the balance in such a way that LRU flushing and eviction can be skipped for long periods. The issue could be reproduced locally, and I have created an initial, tentative patch. We would need to have the review discussions and tests. |
| Comment by Marko Mäkelä [ 2024-01-15 ] |
|
debarun, thank you for the ideas and discussion. I revised (or replaced) my initial fix https://github.com/MariaDB/server/pull/2949/ with some ideas from you and some from me. One part that we might still want to revise, or make controllable by a debug parameter similar to innodb_flush_sync, is that when the number of free pages is below innodb_lru_scan_depth, the page cleaner thread will keep running continuously. Normally, innodb_io_capacity and innodb_io_capacity_max specify the number of pages to be written per one-second iteration of the page cleaner. When the page cleaner runs in continuous or ‘furious’ mode, several iterations can run per second. |
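The throttling difference between the normal and continuous modes can be illustrated with a toy quota calculation. This is only an assumption-laden sketch: pages_to_flush and the furious flag are illustrative names, and the real per-iteration accounting in the page cleaner is more involved.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Illustrative only: in normal mode the per-iteration write quota is
// capped by innodb_io_capacity_max; in continuous ("furious") LRU
// eviction mode the requested amount is flushed uncapped, and the
// one-second sleep between iterations is skipped as well.
size_t pages_to_flush(size_t requested, size_t io_capacity_max, bool furious)
{
  return furious ? requested : std::min(requested, io_capacity_max);
}
```

For example, with io_capacity_max = 2000, a request to flush 5000 pages would be capped to 2000 per iteration in normal mode, but flushed in full in furious mode.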
| Comment by Debarun Banerjee [ 2024-01-18 ] |
|
marko Thanks for the changes. Looks good to me. |
| Comment by Matthias Leich [ 2024-01-19 ] |
|
origin/10.6- |