[MDEV-24302] RESET MASTER hangs as Innodb does not report on binlog checkpoint Created: 2020-11-27 Updated: 2021-05-06 Resolved: 2021-03-29 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Replication, Storage Engine - InnoDB |
| Affects Version/s: | 10.5, 10.6 |
| Fix Version/s: | 10.5.10 |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Andrei Elkin | Assignee: | Marko Mäkelä |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | affects-tests | ||
| Issue Links: |
|
||||||||||||||||||||||||||||||||||||
| Description |
|
The failed one in http://buildbot.askmonty.org/buildbot/builders/winx64-debug/builds/22164
As it was never observed before the extra to 10.5 commits A similar failure
exists in http://buildbot.askmonty.org/buildbot/builders/win32-debug/builds/18570/steps/test/logs/stdio Upon more analysis the most probable suspect
is identified. |
| Comments |
| Comment by Vladislav Vaintroub [ 2020-11-27 ] | |||||||||||||||||||||||
|
Also seen in http://buildbot.askmonty.org/buildbot/builders/winx64-debug/builds/22195/steps/test/logs/stdio , | |||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2020-12-01 ] | |||||||||||||||||||||||
|
http://buildbot.askmonty.org/buildbot/builders/win32-debug/builds/18570/steps/test/logs/stdio shows a hang of main.mysqldump-max. | |||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2020-12-01 ] | |||||||||||||||||||||||
|
I think that the InnoDB changes are related, not causing this bug (causation is not correlation). I suspect that the logic is broken from the start of I suspect that because InnoDB is causing much fewer unnecessary wake-ups in 10.5 since recently, other threads have more chances to run into race conditions. The exact mechanism of this race is not known yet, and it has not been verified whether removing the suspicious code would fix this race. After all, the tests are not failing deterministically. | |||||||||||||||||||||||
| Comment by Andrei Elkin [ 2020-12-30 ] | |||||||||||||||||||||||
|
marko: Your guess turns out right. A race has been identified. The original idea worked by virtue of former over-abundant attempts to flush by Innodb. A patch is at testing. | |||||||||||||||||||||||
| Comment by Andrei Elkin [ 2021-01-07 ] | |||||||||||||||||||||||
|
HNY, dear comrade! As you were involved, and the matter of the patch is actually not foreign to you, Andrei | |||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2021-01-07 ] | |||||||||||||||||||||||
|
Elkin, I think that we will need separate 10.2 and 10.5 versions of this fix. Before 10.5, log_sys.lsn and log_sys.flushed_to_disk_lsn were protected by log_sys.mutex. Starting with 10.5, they use std::atomic. In the 10.2 version, I think that you should protect pending_checkpoint_list with log_sys.mutex because it has to be acquired anyway. For 10.5, an alternative mechanism may be chosen. What is your testing plan for this? I think that this will need some stress testing. It should be noted that 10.3 and 10.6 are likely changing the timings compared to 10.2 and 10.5. So, I would like to see stress tests on 10.3 and 10.6 as well. | |||||||||||||||||||||||
| Comment by Andrei Elkin [ 2021-01-07 ] | |||||||||||||||||||||||
|
marko: Thanks for a quick look! Your point of 10.2-10.5 was already found as I had started with the 10.5 to get a patch which was "downgraded" to 10.2 style of atomic interfaces. To the mutex protection suggestion I explained on Github why 'no need' (specifically to this backup of the proposal To the testing plan, indeed a stress testing is a must, I did light ones, and alice has been entasked (through | |||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2021-01-12 ] | |||||||||||||||||||||||
|
My understanding is that the code that was added in Why is innobase_checkpoint_request() consulting the latest log_sys_t::lsn at all? The latest log records might have been written by some internal activity, such as change buffer merge or the purge of transaction history, which would have nothing to do with the durability of a transaction COMMIT or XA PREPARE in the SQL layer. What we really need to keep track of is the mtr_t::commit_lsn() of the trx_write_serialisation_history() that is supposed to be durable. Possibly also the LSN of trx_prepare() is interesting; trx_flush_log_if_needed_low() is able to skip the log write in that code path as well. I think that we must properly propagate the LSN from the PREPARE or COMMIT to this code. The current implementation may involve unnecessary waits, in addition to suffering from the race condition in innobase_mysql_log_notify(), as highlighted in the Description. | |||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2021-02-10 ] | |||||||||||||||||||||||
|
I observed another failure in a 10.6-based branch, which I thought might be related to this one:
As far as I can tell, no thread is executing inside InnoDB, except buf_flush_page_cleaner(), which looks idle. | |||||||||||||||||||||||
| Comment by Vladislav Vaintroub [ 2021-02-10 ] | |||||||||||||||||||||||
|
I suggest to patch 10.5 directly, since it was found broken here. While older versions might have the bug as well, but it is compensated for. Also programming with atomics was rather different in the past. | |||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2021-03-05 ] | |||||||||||||||||||||||
|
One more test joins the club:
| |||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2021-03-15 ] | |||||||||||||||||||||||
|
One more test joins the club:
| |||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2021-03-23 ] | |||||||||||||||||||||||
|
The probability of hang could be improved by running with innodb_force_recovery=2 (to prevent the purge of InnoDB transaction history from running). Any ‘extra’ writes to the redo log (outside transaction commit) could ‘rescue’ the hang. | |||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2021-03-26 ] | |||||||||||||||||||||||
|
I pushed some code cleanup to the branch bb-10.5-MDEV-24302 as well as Elkin’s test case for demonstrating the innodb_force_recovery=2 hang if a workaround is not present. I determined with rr replay that the missing log write of the last transaction commit is related to trx_t::flush_log_later and trx_t::active_commit_ordered, which to my understanding only exist to satisfy some requirement of the binlog. I am starting to think that it is not safe to read log_sys.lsn in innobase_hton->commit_checkpoint_request and expect that something is guaranteed to write everything up to that LSN to the log. If there is no pressure to perform a redo log checkpoint, that write might never occur. Only ‘important’ state transitions (such as transaction commit) are normally persisted to the redo log, and also that can be prevented by overriding the default setting of innodb_flush_log_at_trx_commit=1 or by the said trx_t flags. If some ‘unimportant’ write to the log happened after a transaction commit, we may wait forever for that ‘unimportant’ log_sys.lsn to be persisted. Also, in the cmake -DWITH_PMEM=ON case when the redo log is on persistent memory (MDEV-9905), we are missing a call to innobase_mysql_log_notify() from log_write(). | |||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2021-03-27 ] | |||||||||||||||||||||||
|
I figured out that with innodb_force_recovery=2, the srv_master_callback() or its predecessor srv_master_thread will never be registered. For that special case, we can initiate a log write directly without any buffering, just like our implementation of the interface for MyRocks does. I pushed a revised version to the branch. It includes some cleanup for readability and performance. The commit message identifies a change that potentially prevents the hang. An equivalent change was also present in Elkin’s version. wlad, please review it. I think that we must keep this ticket open for some time after pushing the tentative fix, to see if it really fixes the hangs, because the hangs are occurring rather rarely. | |||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2021-03-30 ] | |||||||||||||||||||||||
|
Elkin pointed out another race condition that I addressed in a follow-up fix. That fix will also change log_sys.lsn and log_sys.flushed_to_disk_lsn to use std::memory_order_acquire loads and std::memory_order_release stores. It should make no difference on instruction set architectures that use Total Store Ordering (IA-32, AMD64, SPARC), but there is some difference on ARM, POWER and RISC-V RVWMO, for example. Before 10.5, those variables were protected by log_sys.mutex. This follow-up fix may correct some race conditions related to page flushing that could have caused recovery failures on such ‘exotic’ platforms. | |||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2021-05-06 ] | |||||||||||||||||||||||
|
Unfortunately, it looks like the InnoDB side fixes were insufficient. I filed MDEV-25611 for another hang in RESET MASTER that was observed in 10.6 in the test versioning.rpl. I have reasons to believe that the failure affects 10.5 as well. |