Details
- Type: Bug
- Status: Closed
- Priority: Blocker
- Resolution: Fixed
11.0.5, 11.1.4, 11.2.3, 11.3.2, 11.4.1, 11.0.6, 11.1.5, 11.2.4
Description
Starting with MDEV-32042, we assume that the function buf_page_get_gen() is not invoked while the log is being applied during crash recovery. The idea is that any access to buffer pool pages during crash recovery must look up pages via recv_sys_t::recover(). The function trx_undo_mem_create_at_db_start() violated this assumption during startup.
As far as I can tell, there are two possible outcomes of this:
- (the lucky case): InnoDB refuses to start up and reports Data structure corruption.
This causes occasional failures of the test innodb.table_flags and some other tests, including some mariabackup tests.
- (the scary case): InnoDB recovers incorrect state of some transactions, and may cause permanent corruption.
I have not reproduced this; this is my hypothesis.
The fix is twofold:
- Add an assertion to buf_page_get_gen() to catch incorrect calls, and implement a special case for recv_sys.recover().
- Make recv_sys.recover() only buffer-fix pages, instead of acquiring shared latches. This is needed because there will be some recursive access to undo pages, and the shared latches are not recursive. A buffer-fix is sufficient, because at the early phase of startup there are no page modifications, other than by the application of the write-ahead log (ib_logfile0).
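The two parts of the fix can be illustrated with a small standalone C++ model. This is only a sketch and not the InnoDB code: ToyRecovery, ToyPage, Caller, lookup_allowed() and toy_page_get() are hypothetical names invented for this example. The condition is loosely modeled on the assertion quoted in the title of MDEV-34707 and leaves out its log sequence number exception.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the recovery state (not InnoDB's recv_sys_t).
struct ToyRecovery {
  bool recovery_on = false;  // the log is being applied
  bool after_apply = false;  // all buffered log records have been applied
};

// Hypothetical page descriptor that carries only a buffer-fix counter.
struct ToyPage {
  std::atomic<uint32_t> fix_count{0};

  // Buffer-fixing is plain reference counting, so nested calls are harmless.
  void fix() { fix_count.fetch_add(1); }

  void unfix() {
    const uint32_t old = fix_count.fetch_sub(1);
    assert(old > 0);  // every unfix() must pair with an earlier fix()
    (void)old;        // silence unused-variable warnings under NDEBUG
  }
};

// Who is asking for the page: ordinary code or the recovery code path.
enum class Caller { Normal, Recovery };

// Returns true if a page lookup by this caller is legitimate in this state.
inline bool lookup_allowed(const ToyRecovery& r, Caller c) {
  if (c == Caller::Recovery)
    return r.recovery_on;                 // special case for recovery lookups
  return !r.recovery_on || r.after_apply; // normal lookups must wait
}

// Toy lookup: asserts that the call is legitimate, then buffer-fixes the
// page without acquiring any latch.
inline ToyPage* toy_page_get(const ToyRecovery& r, ToyPage& page, Caller c) {
  assert(lookup_allowed(r, c));
  page.fix();
  return &page;
}
```

In this model, two nested toy_page_get() calls on the same page by the recovery caller simply raise fix_count to 2. A non-recursive shared latch taken twice by the same thread would not offer that guarantee, which is the reason the description gives for buffer-fixing only.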
Issue Links
- is caused by
  - MDEV-32042 Special handling of crash recovery in buf_page_get_gen() may cause overhead (Closed)
- relates to
  - MDEV-34426 Debug assertion failure on bootstrap with innodb_undo_tablespaces>107 (Closed)
  - MDEV-34707 Assertion `mode == 9 ? __builtin_expect(recv_sys.recovery_on, (0)) || log_sys.get_lsn() < 120000 : !__builtin_expect(recv_sys.recovery_on, (0)) || recv_sys.after_apply' failed in buf_page_get_gen (Closed)