[MDEV-27461] Server hangs when the bufferpool is shrunk to 8M and innodb_page_size=64K Created: 2022-01-10 Updated: 2022-01-20 Resolved: 2022-01-18 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Storage Engine - InnoDB |
| Affects Version/s: | 10.8.0, 10.7.1 |
| Fix Version/s: | 10.5.14, 10.6.6, 10.7.2, 10.8.1 |
| Type: | Bug | Priority: | Major |
| Reporter: | Matthias Leich | Assignee: | Marko Mäkelä |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Description |
|
|
| Comments |
| Comment by Daniel Black [ 2022-01-11 ] | ||||||||||||||||||||||
|
Thanks mleich! Technically reproducible on 10.2 with explicit minimum (1M) innodb-buffer-pool-chunk-size,
I don't quite get the assertion in the timeout thread, but if the above endless loop is prevented it becomes resolved. | ||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2022-01-11 ] | ||||||||||||||||||||||
|
I analyzed the hang yesterday. It was caused by buf_pool.mutex being continuously held by buf_LRU_free_page(), for buf_page_init_for_read(), while buf_pool_t::resize() in another thread was desperately trying to complete. We had innodb_buffer_pool_size=8M and innodb_page_size=64k during that hang. What the page read was issued for should not matter. I think that it was purge or rollback. | ||||||||||||||||||||||
| Comment by Daniel Black [ 2022-01-11 ] | ||||||||||||||||||||||
|
From marko: My preferred fix would be to avoid the hang when trying to allocate a page while resize is running. I did not investigate how easy that would be to do. | ||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2022-01-17 ] | ||||||||||||||||||||||
|
I think that the test failure demonstrates two problems. The main one is that the minimum buffer pool size is not being enforced consistently. I hope that you can fix that, danblack. To me, the more interesting one is that the server is hanging. I will try to determine the root cause of that, and see what it would take to fix that. | ||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2022-01-17 ] | ||||||||||||||||||||||
|
I had a look at the code. The function buf_LRU_get_free_block() already works in a rather reasonable fashion. It is releasing the buf_pool.mutex between iterations. The infamous message InnoDB: Difficult to find free blocks in the buffer pool is being suppressed when buffer pool resizing is active. Because buf_LRU_get_free_block() is not waiting for buffer pool resizing to finish and because buf_pool_t::resize() is only polling the status every 10 seconds, the infinite loop could be unavoidable. In MDEV-27461.txt
I do not think that either thread should be holding many buffer pool page latches. I will have to debug the trace again to see if anything else could be improved, apart from the trivial fix to enforce a reasonable minimum buffer pool size (measured in pages of innodb_page_size, not bytes). | ||||||||||||||||||||||
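To illustrate why a minimum measured in pages rather than bytes matters, here is a minimal sketch of the arithmetic. The helper `pool_pages()` is hypothetical and is not an InnoDB function; it only divides the pool size by the page size.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper for illustration only (not part of InnoDB):
// the number of whole pages that fit into a buffer pool of
// pool_bytes bytes when each page is page_size bytes.
inline unsigned long long pool_pages(unsigned long long pool_bytes,
                                     unsigned long long page_size)
{
  assert(page_size > 0);  // a zero page size would be meaningless
  return pool_bytes / page_size;
}
```

With the values from this report, `pool_pages(8ULL << 20, 64ULL << 10)` gives 128 pages, whereas the same 8M pool with 16K pages would hold 512. A limit expressed in bytes therefore allows far fewer pages at the largest page size.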
| Comment by Marko Mäkelä [ 2022-01-17 ] | ||||||||||||||||||||||
|
I analyzed a new trace of this. In it, we have 229 pages in both buf_pool.LRU and buf_pool.flush_list, and at least 2 of those pages are really waiting to be written out to a data file. (After The page cleaner is in an untimed wait that was introduced in
I believe that a potential hang is possible in 10.5 or later even with larger buffer pool sizes, due to | ||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2022-01-17 ] | ||||||||||||||||||||||
|
For 10.5 and later, I pushed a fix to ensure that the page cleaner will start writing out pages so that some space will become available in the buffer pool. According to mleich, it helps to reduce the problem, but it will not prevent hangs when larger values of innodb_max_dirty_pages_pct are in use. With that fix (fixing a minor regression caused by Root cause: I do not know why the minimum sizes at startup are steeper than the hard limit (5MiB for up to innodb_page_size=16384, and 24MiB for larger page sizes), but it indeed would be consistent to enforce the same limit at startup and in SET GLOBAL. danblack, can you please fix the SET GLOBAL? | ||||||||||||||||||||||
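As a rough sketch of what enforcing one limit in both places could look like, the check below takes the 5MiB and 24MiB figures quoted above as the threshold. The names `min_pool_bytes()` and `pool_size_acceptable()` are hypothetical and do not exist in the server; the actual fix may use different limits or units.

```cpp
#include <cassert>
#include <cstddef>

constexpr unsigned long long MiB = 1024ULL * 1024;

// Hypothetical: lower bound in bytes for the buffer pool, using the
// figures quoted in the comment (5MiB up to innodb_page_size=16384,
// 24MiB for larger page sizes). This is an assumption for illustration.
inline unsigned long long min_pool_bytes(unsigned long page_size)
{
  assert(page_size > 0);
  return page_size <= 16384 ? 5 * MiB : 24 * MiB;
}

// Hypothetical: a single predicate that both the startup path and the
// SET GLOBAL path could call, so the two cannot disagree.
inline bool pool_size_acceptable(unsigned long long requested_bytes,
                                 unsigned long page_size)
{
  return requested_bytes >= min_pool_bytes(page_size);
}
```

Under these assumed thresholds, a request for 8M with innodb_page_size=64K would be rejected, while the same request with a 16K page size would pass.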
| Comment by Matthias Leich [ 2022-01-17 ] | ||||||||||||||||||||||
|
| ||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2022-01-18 ] | ||||||||||||||||||||||
|
Consistent enforcement of the minimum value in SET GLOBAL innodb_buffer_pool_size will be covered by |