[MDEV-27416] InnoDB hang in buf_flush_wait_flushed(), on log checkpoint Created: 2022-01-03 Updated: 2022-01-18 Resolved: 2022-01-04 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Storage Engine - InnoDB |
| Affects Version/s: | 10.8.0, 10.5.9, 10.6.0, 10.7.0, 10.5, 10.6, 10.7, 10.8 |
| Fix Version/s: | 10.5.14, 10.6.6, 10.7.2, 10.8.1 |
| Type: | Bug | Priority: | Major |
| Reporter: | Marko Mäkelä | Assignee: | Marko Mäkelä |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | hang |
| Issue Links: |
|
| Description |
|
On our CI systems, on builders that run on real storage rather than a RAM disk, we see occasional failures of the IMPORT TABLESPACE tests, because a wait for a log checkpoint hangs with a stack trace like this:
Actually, the checkpoint there should be unnecessary, but that is not the main point here.
The test invocation was:
I think that applying the first commit of |
| Comments |
| Comment by Marko Mäkelä [ 2022-01-03 ] |
|
Unfortunately, the code cleanup did not fix this:
The core dump excludes the buffer pool, so I will have to revert | |||||||||||||
| Comment by Marko Mäkelä [ 2022-01-04 ] |
|
The hang is prevented if we invoke buf_pool.page_cleaner_set_idle(false) before waking up the page cleaner. My test campaign of 100 rounds of 100 concurrent instances of the test innodb.innodb-wl5522 on a 104-core server has reached the 93rd round without a hang, and I expect it to complete in a few minutes. Without this fix, one of the 100 concurrent instances used to hang on the 7th round. I suspect that this hang is only possible if the server is otherwise idle during the checkpoint request. If any other thread wrote to persistent InnoDB tables, the two hung threads would be ‘rescued’ when the page cleaner thread was eventually woken up. |
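To illustrate why the order of operations matters, here is a minimal toy model of the pattern. It is not the InnoDB code: ToyPool, toy_page_cleaner, toy_wait_flushed and run_request are hypothetical names, and the sketch assumes a cleaner thread that re-checks an idle flag after every wakeup and goes back to sleep while the flag is still set. Under that assumption, a requester that signals the cleaner without first clearing the flag never sees its request served.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <thread>

// Toy stand-in for the buffer pool state; every name here is hypothetical.
struct ToyPool {
  std::mutex mtx;
  std::condition_variable do_flush;    // signalled to wake the cleaner
  std::condition_variable done_flush;  // signalled when a "flush" completes
  bool idle = true;                    // cleaner believes there is no work
  bool shutdown = false;
  std::uint64_t target_lsn = 0;        // highest LSN requested so far
  std::uint64_t flushed_lsn = 0;       // highest LSN "flushed" so far
};

// Assumed cleaner behaviour: a wakeup is ignored while the idle flag is set.
void toy_page_cleaner(ToyPool &pool) {
  std::unique_lock<std::mutex> lk(pool.mtx);
  for (;;) {
    pool.do_flush.wait(lk, [&] { return pool.shutdown || !pool.idle; });
    if (pool.shutdown) return;
    pool.flushed_lsn = pool.target_lsn;  // pretend to write the dirty pages
    pool.idle = true;
    pool.done_flush.notify_all();
  }
}

// Requests a flush up to lsn and waits for it, giving up after timeout.
// clear_idle selects whether the idle flag is reset before the wakeup.
bool toy_wait_flushed(ToyPool &pool, std::uint64_t lsn, bool clear_idle,
                      std::chrono::milliseconds timeout) {
  std::unique_lock<std::mutex> lk(pool.mtx);
  if (lsn > pool.target_lsn) pool.target_lsn = lsn;
  if (clear_idle) pool.idle = false;  // the step that prevents the lost wakeup
  pool.do_flush.notify_one();
  return pool.done_flush.wait_for(
      lk, timeout, [&] { return pool.flushed_lsn >= lsn; });
}

// Runs one request against a fresh pool; returns true if it was served.
bool run_request(bool clear_idle) {
  ToyPool pool;
  std::thread cleaner([&pool] { toy_page_cleaner(pool); });
  bool served = toy_wait_flushed(pool, 42, clear_idle,
                                 std::chrono::milliseconds(500));
  {
    std::lock_guard<std::mutex> guard(pool.mtx);
    assert(pool.flushed_lsn <= pool.target_lsn);
    pool.shutdown = true;
  }
  pool.do_flush.notify_one();
  cleaner.join();
  return served;
}
```

In this model, run_request(true) should return true promptly, while run_request(false) should time out and return false, which stands in for the hang. The timeout exists only so that the sketch terminates. Building it needs thread support, for example -pthread with GCC or Clang.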
| Comment by Marko Mäkelä [ 2022-01-04 ] |
|
I forgot that the test has two combinations, one for each of two values of innodb_checksum_algorithm. The other combination was also fine:
|