Details
Description
The test mariabackup.innodb_force_recovery occasionally failed in a 10.6 based development branch that contains a fix of MDEV-26827 and MDEV-26055. We have some rather rare failures of the test occurring on buildbot as well, like this:
10.10 4ce6e78059c1a91765c91afa5330737bd7568d0a |
mariabackup.innodb_force_recovery 'innodb' w13 [ fail ] timeout after 480 seconds
|
Test ended at 2022-07-28 12:27:20
|
|
Test case timeout after 480 seconds
|
|
== /buildbot/amd64-ubuntu-2004-msan/build/mysql-test/var/13/log/innodb_force_recovery.log ==
|
CREATE TABLE t(i INT) ENGINE INNODB;
|
INSERT INTO t VALUES(1);
|
# "innodb_force_recovery=1" should be allowed with "--prepare" only (mariabackup)
|
FOUND 1 /should only be used with "--prepare"/ in backup.log
|
# "innodb_force_recovery=1" should be allowed with "--apply-log" only (innobackupex)
|
FOUND 1 /should only be used with "--apply-log"/ in backup.log
|
# "innodb_force_recovery" should be limited to "SRV_FORCE_IGNORE_CORRUPT" (mariabackup)
|
FOUND 1 /innodb_force_recovery = 1/ in backup.log
|
# "innodb_force_recovery" should be limited to "SRV_FORCE_IGNORE_CORRUPT" (innobackupex)
|
FOUND 1 /innodb_force_recovery = 1/ in backup.log
|
# "innodb_force_recovery" should be read from "backup-my.cnf" (mariabackup)
|
FOUND 1 /innodb_force_recovery = 1/ in backup.log
|
# "innodb_force_recovery=1" should be read from "backup-my.cnf" (innobackupex)
|
FOUND 1 /innodb_force_recovery = 1/ in backup.log
|
# "innodb_force_recovery" from the command line should override "backup-my.cnf" (mariabackup)
|
NOT FOUND /innodb_force_recovery = 1/ in backup.log
|
# "innodb_force_recovery" from the command line should override "backup-my.cnf" (innobackupex)
|
NOT FOUND /innodb_force_recovery = 1/ in backup.log
|
# shutdown server
|
|
== /buildbot/amd64-ubuntu-2004-msan/build/mysql-test/var/13/tmp/analyze-timeout-mysqld.1.err ==
|
mysqltest: Could not open connection 'default' after 500 attempts: 2002 Can't connect to local server through socket '/buildbot/amd64-ubuntu-2004-msan/build/mysql-test/var/tmp/13/mys' (2)
|
This fix worked in the case that I was able to reproduce:
diff --git a/storage/innobase/srv/srv0start.cc b/storage/innobase/srv/srv0start.cc
|
index 5bf085079c8..4a01326047a 100644
|
--- a/storage/innobase/srv/srv0start.cc
|
+++ b/storage/innobase/srv/srv0start.cc
|
@@ -905,9 +905,7 @@ static lsn_t srv_prepare_to_delete_redo_log_file(bool old_exists)
|
{
|
DBUG_ENTER("srv_prepare_to_delete_redo_log_file");
|
|
- /* Disable checkpoints in the page cleaner. */
|
- ut_ad(!recv_sys.recovery_on);
|
- recv_sys.recovery_on= true;
|
+ ut_ad(recv_sys.recovery_on);
|
|
/* Clean the buffer pool. */
|
buf_flush_sync();
|
@@ -1589,10 +1587,10 @@ dberr_t srv_start(bool create_new_db)
|
}
|
}
|
|
- recv_sys.debug_free();
|
-
|
if (srv_operation == SRV_OPERATION_RESTORE
|
|| srv_operation == SRV_OPERATION_RESTORE_EXPORT) {
|
+ buf_flush_sync();
|
+ recv_sys.debug_free();
|
/* After applying the redo log from
|
SRV_OPERATION_BACKUP, flush the changes
|
to the data files and truncate or delete the log.
|
@@ -1684,6 +1682,8 @@ dberr_t srv_start(bool create_new_db)
|
return(srv_init_abort(err));
|
}
|
}
|
+
|
+ recv_sys.debug_free();
|
}
|
|
ut_ad(err == DB_SUCCESS); |
The failure that I observed was caused by the fact that buf_flush_page_cleaner() took a little longer to write out the recovered modified pages, and this thread had already executed recv_sys.debug_free(), which causes the page cleaner to attempt to write checkpoint information. The failures that I observed were caused by the fact that this thread had already closed and deleted the ib_logfile0 when buf_flush_page_cleaner() unexpectedly invoked log_checkpoint_low().
This type of failure should be caused by MDEV-23855, which made log checkpoints the responsibility of the page cleaner thread.
Because I have no stack traces of the test case timeouts, I am unable say if the above patch would fix them.
Attachments
Issue Links
- is caused by
-
MDEV-23855 InnoDB log checkpointing causes regression for write-heavy OLTP
- Closed