[MDEV-32757] innodb_undo_log_truncate=ON is not crash safe Created: 2023-11-10  Updated: 2023-12-19  Resolved: 2023-11-15

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.5, 10.6, 10.11, 11.0, 11.1, 11.2, 11.3
Fix Version/s: 10.5.24, 10.6.17, 10.11.7, 11.0.5, 11.1.4, 11.2.3, 11.3.2

Type: Bug
Priority: Critical
Reporter: Marko Mäkelä
Assignee: Marko Mäkelä
Resolution: Fixed
Votes: 0
Labels: recovery, regression

Issue Links:
Blocks
Problem/Incident
is caused by MDEV-26445 innodb_undo_log_truncate is unnecessa... Closed
Relates
relates to MDEV-33009 Server hangs for a long time with inn... Closed
relates to MDEV-32681 Test case innodb.undo_truncate_recove... Closed

Description

As noted in MDEV-32681, the invocation of buf_page_t::clear_oldest_modification() in trx_purge_truncate_history() is unsafe. The buf_flush_page_cleaner() thread may advance the checkpoint before mtr_t::commit_shrink() has finished executing. If the server is killed at that point, crash recovery may treat a previously committed transaction as uncommitted, and the server could crash or hang while trying to roll back the "uncommitted" transaction. With the data.tar.xz that is attached to MDEV-32681, the rollback would try to free an undo log page multiple times and could end up in an infinite loop.
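
To illustrate the ordering problem, here is a toy model in C++. It is only a sketch and not InnoDB code: the names Page, checkpoint_candidate() and crash_before_shrink_finishes() are hypothetical, the LSN values are arbitrary, and the checkpoint rule (the minimum oldest_modification over dirty pages) is a strong simplification of the real logic. The model assumes that the undo page carries an unflushed change whose redo record must be replayed for the transaction to be seen as committed, and it does not represent the actual fix.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

using lsn_t = std::uint64_t;

// Toy buffer page: oldest_modification == 0 means "clean".
struct Page {
    lsn_t oldest_modification;
};

// Simplified checkpoint rule (an assumption of this sketch): the checkpoint
// may advance to the smallest oldest_modification of any dirty page, or to
// the end of the log if no page is dirty.
lsn_t checkpoint_candidate(const std::vector<Page>& pool, lsn_t log_end) {
    lsn_t candidate = log_end;
    for (const Page& page : pool) {
        if (page.oldest_modification != 0) {
            candidate = std::min(candidate, page.oldest_modification);
        }
    }
    return candidate;
}

struct Outcome {
    lsn_t checkpoint;    // checkpoint LSN at the moment of the crash
    bool redo_replayed;  // would recovery still replay the undo page change?
};

// Simulates a crash in the window before the shrink has finished.
// clear_early:  the page is marked clean before the shrink completes.
// cleaner_runs: the page cleaner gets a chance to advance the checkpoint.
Outcome crash_before_shrink_finishes(bool clear_early, bool cleaner_runs) {
    const lsn_t undo_change_lsn = 100;  // unflushed change to the undo page
    const lsn_t log_end = 200;          // current end of the redo log
    lsn_t checkpoint = 50;              // checkpoint before the truncation

    std::vector<Page> pool{Page{undo_change_lsn}};

    if (clear_early) {
        pool[0].oldest_modification = 0;  // the page now looks clean
    }
    if (cleaner_runs) {
        checkpoint = std::max(checkpoint, checkpoint_candidate(pool, log_end));
    }
    assert(checkpoint <= log_end);

    // Crash here: the page was never written and the shrink never finished.
    // Recovery starts at the checkpoint, so older redo records are skipped.
    const bool redo_replayed = checkpoint <= undo_change_lsn;
    return Outcome{checkpoint, redo_replayed};
}
```

In this model, crash_before_shrink_finishes(true, true) leaves the checkpoint at 200, past the unflushed change at LSN 100, so recovery would never replay it. With clear_early set to false the dirty page holds the checkpoint at 100, and with cleaner_runs set to false the checkpoint stays at 50. Only the combination of both conditions loses the change, which matches the timing-dependent nature of the problem described above.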



Comments
Comment by Matthias Leich [ 2023-11-14 ]

origin/10.6-MDEV-32757 5a8b6ff7d9805403ed8c7bb9025e9e8b85eccfcf 2023-11-10T12:27:02+02:00
behaved well in RQG testing

Comment by Matthias Leich [ 2023-11-15 ]

origin/10.5-MDEV-32757 64601d745903bb3b0487d294cece2958eb312f28 2023-11-10T11:54:20+02:00
behaved well in RQG testing

Comment by Marko Mäkelä [ 2023-12-13 ]

Anyone who hits this issue would likely have to start up the server with innodb_force_recovery=3 and create a logical dump of the database, effectively using the READ UNCOMMITTED isolation level. I do not think there is any easy way to "repair" the InnoDB transactional metadata. MDEV-19229 implements something close to that, but its prerequisite is that the undo logs are logically empty, which would not be the case here.
