|
The assertion is failing in an error-handling code path. I was able to reproduce the crash on the second attempt with the longer sequence of tests, and on the first iteration of
while ./mtr --rr=-h --no-reorder innodb.innodb_bug53046 innodb.innodb_bug56143 innodb.innodb_bug59733 innodb.innodb_bug84958 innodb.innodb_sys_var_valgrind innodb.insert-before-delete innodb.instant_alter_bugs innodb.log_file_name
|
do :; done
|
The crash would occur on the first server restart in the test innodb.log_file_name. A message that the test is expecting was output to the log, along with some error messages:
2023-11-09 8:40:13 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=47819,47819
|
2023-11-09 8:40:13 0 [Note] InnoDB: Ignoring data file './test/t2.ibd' with space ID 83. Another data file called ./test/t1.ibd exists with the same space ID.
|
2023-11-09 8:40:13 0 [ERROR] InnoDB: Attempted to open a previously opened tablespace. Previous tablespace test/t1 at filepath: ./test/t1.ibd uses space ID: 83. Cannot open filepath: ./test/t2.ibd which uses the same space ID.
|
2023-11-09 8:40:13 0 [ERROR] InnoDB: Recovery cannot access file ./test/t2.ibd (tablespace 43)
|
2023-11-09 8:40:13 0 [Note] InnoDB: You may set innodb_force_recovery=1 to ignore this and possibly get a corrupted database.
|
2023-11-09 8:40:13 0 [ERROR] InnoDB: Plugin initialization aborted at srv0start.cc[1480] with error Generic error
|
2023-11-09 8:40:13 0 [Note] InnoDB: Starting shutdown...
|
mariadbd: /mariadb/10.5/storage/innobase/log/log0recv.cc:969: void recv_sys_t::clear(): Assertion `!after_apply || !(blocks).end' failed.
|
I believe that this assertion is too strict and needs to be relaxed for the case that the recovery was aborted due to an error. I will have to analyze the rr replay trace before deciding the details.
|
|
The debug assertion failure would be fixed by the following:
diff --git a/storage/innobase/log/log0recv.cc b/storage/innobase/log/log0recv.cc
|
index 8c10e5277d5..d52a62cec99 100644
|
--- a/storage/innobase/log/log0recv.cc
|
+++ b/storage/innobase/log/log0recv.cc
|
@@ -3203,7 +3203,6 @@ recv_group_scan_log_recs(
|
|
log_sys.log.scanned_lsn = end_lsn = *contiguous_lsn =
|
ut_uint64_align_down(*contiguous_lsn, OS_FILE_LOG_BLOCK_SIZE);
|
- ut_d(recv_sys.after_apply = last_phase);
|
|
do {
|
if (last_phase && store == STORE_NO) {
|
@@ -3226,6 +3225,7 @@ recv_group_scan_log_recs(
|
DBUG_RETURN(false);
|
}
|
|
+ ut_d(recv_sys.after_apply = last_phase);
|
DBUG_PRINT("ib_log", ("%s " LSN_PF " completed",
|
last_phase ? "rescan" : "scan",
|
log_sys.log.scanned_lsn));
|
The flag recv_sys.after_apply is only being read by the debug assertion. However, something else needs to be fixed as well, because on the second iteration of the tests I got the following:
innodb.innodb_bug53046 'innodb' [ pass ] 42
|
innodb.innodb_bug56143 'innodb' [ pass ] 278
|
innodb.innodb_bug59307 'innodb' [ pass ] 16
|
innodb.innodb_bug59733 'innodb' [ pass ] 354
|
innodb.innodb_bug84958 'innodb' [ pass ] 168
|
innodb.innodb_stats_rename_table 'innodb' [ pass ] 15
|
innodb.innodb_sys_var_valgrind 'innodb' [ pass ] 89
|
innodb.insert-before-delete 'innodb' [ pass ] 17
|
innodb.instant_alter_bugs 'innodb' [ pass ] 3204
|
innodb.lock_delete_updated 'innodb' [ pass ] 14
|
innodb.log_file_name 'innodb' [ fail ] Found warnings/errors in server log file!
|
Test ended at 2023-11-09 09:14:18
|
line
|
2023-11-09 9:13:58 0 [ERROR] InnoDB: Attempted to open a previously opened tablespace. Previous tablespace test/t1 at filepath: ./test/t1.ibd uses space ID: 87. Cannot open filepath: ./test/t2.ibd which uses the same space ID.
|
2023-11-09 9:13:58 0 [ERROR] InnoDB: Recovery cannot access file ./test/t2.ibd (tablespace 46)
|
I will try to generate an rr replay trace for that as well. The usual outcome of this test step is the following:
2023-11-09 9:18:53 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=47831,47831
|
2023-11-09 9:18:53 0 [Note] InnoDB: Ignoring data file './test/t2.ibd' with space ID 5. Another data file called ./test/t1.ibd exists with the same space ID.
|
2023-11-09 9:18:53 0 [Note] InnoDB: Starting final batch to recover 14 pages from redo log.
|
2023-11-09 9:18:53 0 [ERROR] InnoDB: Cannot replay rename of tablespace 5 from './test/t1.ibd' to './test/t2.ibd' because the target file exists
|
2023-11-09 9:18:53 0 [ERROR] InnoDB: Plugin initialization aborted at srv0start.cc[1520] with error Data structure corruption
|
2023-11-09 9:18:53 0 [Note] InnoDB: Starting shutdown...
|
|
|
The function fil_ibd_load() can detect two different errors for the following fault:
RENAME TABLE t1 TO t2;
|
-- …
|
--echo # Fault 1: Two dirty files with the same space_id.
|
--copy_file $MYSQLD_DATADIR/test/t2.ibd $MYSQLD_DATADIR/test/t1.ibd
|
Sometimes, an attempt to load the file t2.ibd would report two ERROR messages because the file name t1.ibd was successfully loaded. The messages are output by the following:
#0 sql_print_error (format=format@entry=0x55cb83b7e4a6 "InnoDB: %s") at /mariadb/10.5/sql/log.cc:9259
|
#1 0x000055cb84a424bb in ib::error::~error (this=this@entry=0x7fff3378cb60, __in_chrg=<optimized out>) at /usr/include/c++/13/bits/basic_string.h:222
|
#2 0x000055cb84809ffb in Datafile::validate_first_page (this=this@entry=0x7fff3378ced0, flush_lsn=flush_lsn@entry=0x0) at /mariadb/10.5/storage/innobase/fsp/fsp0file.cc:604
|
#3 0x000055cb8480b038 in Datafile::validate_for_recovery (this=this@entry=0x7fff3378ced0) at /mariadb/10.5/storage/innobase/fsp/fsp0file.cc:459
|
#4 0x000055cb84815f3f in fil_ibd_load (space_id=43, filename=<optimized out>, space=@0x7fff3378d1a8: 0x0) at /mariadb/10.5/storage/innobase/fil/fil0fil.cc:3108
|
#5 0x000055cb84928dfe in fil_name_process (name=<optimized out>, name@entry=0x7fff32ce66a1 "./test/t2.ibd", len=len@entry=13, space_id=<optimized out>, deleted=deleted@entry=false)
|
at /mariadb/10.5/storage/innobase/log/log0recv.cc:830
|
and
#0 sql_print_error (format=format@entry=0x55cb83c67190 "InnoDB: Recovery cannot access file %s (tablespace %zu)") at /mariadb/10.5/sql/log.cc:9259
|
#1 0x000055cb84928e7f in fil_name_process (name=<optimized out>, name@entry=0x7fff32ce66a1 "./test/t2.ibd", len=len@entry=13, space_id=<optimized out>, deleted=deleted@entry=false)
|
at /mariadb/10.5/storage/innobase/log/log0recv.cc:884
|
In the more usual outcome, the ERROR message is reported while trying to apply a RENAME operation:
#0 sql_print_error (format=format@entry=0x5635477324a6 "InnoDB: %s") at /mariadb/10.5/sql/log.cc:9259
|
#1 0x00005635485f64bb in ib::error::~error (this=this@entry=0x7ffcd21964e0, __in_chrg=<optimized out>) at /usr/include/c++/13/bits/basic_string.h:222
|
#2 0x00005635484dc79e in recv_sys_t::apply (this=0x56354886b5a0 <recv_sys>, last_batch=last_batch@entry=true) at /mariadb/10.5/storage/innobase/log/log0recv.cc:2828
|
#3 0x00005635485bec10 in srv_start (create_new_db=<optimized out>) at /mariadb/10.5/storage/innobase/srv/srv0start.cc:1516
|
#4 0x0000563548353516 in innodb_init (p=<optimized out>) at /mariadb/10.5/storage/innobase/handler/ha_innodb.cc:4060
|
#5 0x0000563547ed3130 in ha_initialize_handlerton (plugin=0x56354925a9c0) at /mariadb/10.5/sql/handler.cc:651
|
This would be preceded by a Note in fil_ibd_load() because the tablespace had already been successfully loaded as t1.ibd:
#0 sql_print_information (format=format@entry=0x5635477324a6 "InnoDB: %s") at /mariadb/10.5/sql/log.cc:9285
|
#1 0x00005635485f63e7 in ib::info::~info (this=this@entry=0x7ffcd2196060, __in_chrg=<optimized out>) at /usr/include/c++/13/bits/basic_string.h:222
|
#2 0x00005635483c9eb1 in fil_ibd_load (space_id=5, filename=<optimized out>, space=@0x7ffcd2196238: 0x5635493a0170) at /mariadb/10.5/storage/innobase/fil/fil0fil.cc:3069
|
#3 0x00005635484dcdfe in fil_name_process (name=<optimized out>, name@entry=0x7f3660de5ac9 "./test/t2.ibd", len=len@entry=13, space_id=<optimized out>, deleted=deleted@entry=false)
|
at /mariadb/10.5/storage/innobase/log/log0recv.cc:830
|
Possible improvements (to be implemented in MDEV-25909):
- Ensure that fil_ibd_load() issues an error message before returning FIL_LOAD_INVALID, so that the caller fil_name_process() can avoid doing that.
- In the "Ignoring data file" message, mention the expected space_id that was passed to fil_ibd_load(), in addition to mentioning the unexpected space->id.
Here, I would only adjust the test so that it will ignore the occasionally issued additional messages.
|