Details
Type: Bug
Status: Closed
Priority: Critical
Resolution: Fixed
Versions: 10.6, 10.11, 11.2(EOL), 11.4
Description
sanja noted that while running the test innodb.innodb-32k-crash, some debug assertions that were added to the function log_sort_flush_list() together with the fix of MDEV-31354 fail rather often, like this:
10.6 753e7d6d7ce7770d3c98beb6fdcb97e0e8d1ec9f
innodb.innodb-32k-crash w18 [ fail ]
Test ended at 2024-10-01 10:31:25
…
2024-10-01 10:31:25 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=1299959,3973825
2024-10-01 10:31:25 0 [Note] InnoDB: 1 transaction(s) which must be rolled back or cleaned up in total 3 row operations to undo
2024-10-01 10:31:25 0 [Note] InnoDB: Trx id counter is 225
2024-10-01 10:31:25 0 [Note] InnoDB: To recover: 658 pages
mariadbd: /home/buildbot/amd64-ubuntu-2204-debug-ps/build/storage/innobase/log/log0recv.cc:3658: log_sort_flush_list()::<lambda(const buf_page_t*, const buf_page_t*)>: Assertion `l > 2' failed.
I was able to reproduce this. In the core dump that I analyzed, all 7 members of buf_pool.flush_list carried oldest_modification()==1, that is, the pages had been written back to the file system.
As noted in MDEV-31354, starting with MDEV-25113 buf_page_t::oldest_modification() can be updated to 1 by a thread that is not holding buf_pool.flush_list_mutex. The debug assertions that require the LSN to be above 2 must be revised accordingly. As a slight optimization, when we copy the sorted list back to buf_pool.flush_list, we can omit the blocks whose oldest_modification() is 1.
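The revised check and the copy-back optimization described above can be sketched as follows. This is a minimal, self-contained illustration and not the actual patch: page_sketch is a hypothetical stand-in for buf_page_t, sort_flush_list_sketch() is a hypothetical name, and the descending sort order is an assumption of the sketch. Only the relaxed assertion (accept 1 in addition to values above 2) and the omission of already-written blocks come from the description.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstdint>
#include <vector>

using lsn_t = std::uint64_t;

// Hypothetical stand-in for buf_page_t; only the field that matters here.
struct page_sketch {
  std::atomic<lsn_t> oldest_modification_;
  explicit page_sketch(lsn_t lsn) : oldest_modification_(lsn) {}
  lsn_t oldest_modification() const {
    return oldest_modification_.load(std::memory_order_relaxed);
  }
};

// Sort the collected pages by descending oldest_modification() (an
// assumption of this sketch), tolerating the value 1 that marks a page
// as already written back, and return only the pages that still need
// to be on the flush list.
std::vector<const page_sketch*>
sort_flush_list_sketch(std::vector<const page_sketch*> pages)
{
  std::sort(pages.begin(), pages.end(),
            [](const page_sketch *lhs, const page_sketch *rhs) {
              const lsn_t l = lhs->oldest_modification();
              const lsn_t r = rhs->oldest_modification();
              // Revised assertions: 1 is acceptable, because another
              // thread may have marked the page as written.
              assert(l > 2 || l == 1);
              assert(r > 2 || r == 1);
              return r < l;
            });

  std::vector<const page_sketch*> flush_list;
  for (const page_sketch *p : pages)
    if (p->oldest_modification() != 1)  // omit already-written pages
      flush_list.push_back(p);
  return flush_list;
}
```

With four pages whose LSNs are 1, 500, 1 and 900, the sketch returns only the pages with 900 and 500, in that order. In the core dump described above, where all 7 pages carried 1, the result would be an empty list.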
The test innodb.innodb-32k-crash also started to fail in another way in 10.6, but not in later versions, due to a bogus debug assertion that was added to recv_recovery_from_checkpoint_start() in MDEV-34830:
ut_ad(log_sys.get_lsn() >= recv_sys.scanned_lsn);
This assertion may fail when the last mini-transaction in the log was not completely written. In that case, the recv_sys.scanned_lsn could be a few 512-byte blocks ahead of recv_sys.recovered_lsn, which is what matters. In MDEV-14425, these fields were replaced by recv_sys.lsn and there is no log block layer anymore; each mini-transaction is a logical log block on its own.
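The gap between the two positions can be illustrated with a small model. This is only a sketch under simplifying assumptions: it ignores log block headers and trailers, and the names recovery_sketch and scan_log_sketch() and all numbers are hypothetical. It treats the scanned position as the end of the log data that reached the file, and the recovered position as the end of the last mini-transaction that is completely present.

```cpp
#include <cstdint>
#include <vector>

using lsn_t = std::uint64_t;

// Hypothetical result type: the two positions discussed above.
struct recovery_sketch {
  lsn_t recovered_lsn;  // end of the last complete mini-transaction
  lsn_t scanned_lsn;    // end of the log data that was read
};

// start_lsn:     where scanning begins (hypothetical value)
// mtr_sizes:     byte lengths of the mini-transactions, in log order
// bytes_on_disk: how much log actually reached the file before the crash
recovery_sketch scan_log_sketch(lsn_t start_lsn,
                                const std::vector<lsn_t> &mtr_sizes,
                                lsn_t bytes_on_disk)
{
  recovery_sketch r{start_lsn, start_lsn + bytes_on_disk};
  lsn_t consumed = 0;
  for (lsn_t size : mtr_sizes) {
    if (consumed + size > bytes_on_disk)
      break;  // the rest of this mini-transaction was never written
    consumed += size;
    r.recovered_lsn = start_lsn + consumed;
  }
  return r;
}
```

For example, with start_lsn 1000, mini-transaction sizes {100, 200, 1500} and 1324 bytes on disk, the third mini-transaction is incomplete: recovered_lsn is 1300 while scanned_lsn is 2324, which is two 512-byte blocks ahead. If the LSN that the assertion compares against scanned_lsn corresponds to the recovered position (an assumption of this sketch), the check fails in exactly this situation.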
Issue Links
relates to
- MDEV-35226 InnoDB occasionally fails to recover a corrupted page from the doublewrite buffer (Stalled)
- MDEV-25113 Reduce effect of parallel background flush on select workload (Closed)
- MDEV-31354 SIGSEGV in log_sort_flush_list() in InnoDB crash recovery (Closed)