[MDEV-27610] Unnecessary wait in InnoDB crash recovery Created: 2022-01-25  Updated: 2022-02-01  Resolved: 2022-01-26

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.2, 10.3, 10.4, 10.5, 10.6, 10.7, 10.8
Fix Version/s: 10.5.14, 10.6.6, 10.7.2, 10.8.1

Type: Bug
Priority: Major
Reporter: Marko Mäkelä
Assignee: Marko Mäkelä
Resolution: Fixed
Votes: 0
Labels: performance, recovery

Issue Links:
Relates
relates to MDEV-14481 Execute InnoDB crash recovery in the ... Closed

Description

As noted in MDEV-14481, there is an unnecessary wait and some dead code in the function recv_sys_t::apply(). Before the function buf_page_get_low() can return, it must acquire an exclusive page latch, and the page must have been read. If the page is being read by another thread, that other thread would already have read-fixed and X-latched the page. Furthermore, before buf_page_read_complete() would return, it would have invoked recv_recover_page() to apply any redo log to the page, before releasing the page latch.

It actually suffices to simply call recv_read_in_area() in order to trigger a transition from page_recv_t::RECV_NOT_PROCESSED to page_recv_t::RECV_BEING_READ for the current block and possibly some following page numbers.
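The state transition described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the InnoDB code: `page_state`, `page_map`, `read_in_area()`, `read_complete()` and `apply_all()` are hypothetical stand-ins that borrow the ticket's vocabulary, and the area size of 4 pages is an arbitrary choice for the illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>

// Toy model only: the names mirror the ticket's vocabulary, but none of
// this is the actual InnoDB implementation.
enum class page_state { NOT_PROCESSED, BEING_READ, PROCESSED };

using page_map = std::map<std::uint32_t, page_state>;

// Hypothetical stand-in for recv_read_in_area(): for every page in the
// map whose number falls in [first, first + area_size), move it from
// NOT_PROCESSED to BEING_READ. Returns how many reads were "submitted".
inline std::size_t read_in_area(page_map &pages, std::uint32_t first,
                                std::uint32_t area_size = 4)
{
  assert(area_size > 0);
  std::size_t submitted = 0;
  for (auto it = pages.lower_bound(first);
       it != pages.end() && it->first - first < area_size; ++it)
  {
    if (it->second == page_state::NOT_PROCESSED)
    {
      it->second = page_state::BEING_READ;
      ++submitted;
    }
  }
  return submitted;
}

// Hypothetical stand-in for the read completion step: in this model the
// log is considered applied once the read of the page completes.
inline void read_complete(page_map &pages, std::uint32_t page_no)
{
  auto it = pages.find(page_no);
  if (it != pages.end() && it->second == page_state::BEING_READ)
    it->second = page_state::PROCESSED;
}

// The apply loop only triggers reads for pages that nobody has started
// reading yet; it never waits for an individual page.
// Returns the number of read batches it triggered.
inline std::size_t apply_all(page_map &pages)
{
  std::size_t batches = 0;
  for (auto &p : pages)
    if (p.second == page_state::NOT_PROCESSED)
    {
      read_in_area(pages, p.first);
      ++batches;
    }
  return batches;
}
```

In this model, a page that is already in the `BEING_READ` state is simply skipped by the loop, because whoever submitted the read is responsible for completing it.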

This dead code seems to be present in all InnoDB versions. I think that this is only feasible to fix in 10.5 or later versions, thanks to code simplification that was performed in MDEV-19586, MDEV-21351 and many other tickets.



Comments
Comment by Vladislav Lesin [ 2022-01-26 ]

The code (bb-10.5-MDEV-27610, a79a1744) looks good to me.

The assumption is that recv_recover_page() will be invoked from fil_aio_callback(), so there is no need to invoke it from recv_sys_t::apply() synchronously with the page latching.

The call pages.erase(r) is removed from recv_sys_t::apply(), but pages.clear() is still invoked at the end of recv_sys_t::apply().

I don't see errors in the code.

Comment by Matthias Leich [ 2022-01-26 ]

The tree origin/bb-10.5-MDEV-27610 a79a1744595fa80848338809c7cc67086e8ed5fd 2022-01-25T11:09:49+02:00
behaved well in RQG-based testing of crash recovery.
None of the tests that ran only DML failed in the sequence: kill the DB server, restart with recovery, check the tables with various methods.
Tests running DDL showed various known bad effects, but this was to be expected because DDL is not crash-safe in MariaDB 10.5 and earlier.
