The assertion is failing in an error-handling code path. I was able to reproduce the crash on the second attempt with the longer sequence of tests, and on the first iteration of
The crash would occur on the first server restart in the test innodb.log_file_name. A message that the test is expecting was output to the log, along with some error messages:
2023-11-09 8:40:13 0 [Note] InnoDB: Ignoring data file './test/t2.ibd' with space ID 83. Another data file called ./test/t1.ibd exists with the same space ID.
2023-11-09 8:40:13 0 [ERROR] InnoDB: Attempted to open a previously opened tablespace. Previous tablespace test/t1 at filepath: ./test/t1.ibd uses space ID: 83. Cannot open filepath: ./test/t2.ibd which uses the same space ID.
I believe that this assertion is too strict and needs to be relaxed for the case that the recovery was aborted due to an error. I will have to analyze the rr replay trace before deciding the details.
Marko Mäkelä
added a comment - The assertion is failing in an error-handling code path. I was able to reproduce the crash on the second attempt with the longer sequence of tests, and on the first iteration of
while ./mtr --rr=-h --no-reorder innodb.innodb_bug53046 innodb.innodb_bug56143 innodb.innodb_bug59733 innodb.innodb_bug84958 innodb.innodb_sys_var_valgrind innodb.insert-before-delete innodb.instant_alter_bugs innodb.log_file_name
do :; done
The crash would occur on the first server restart in the test innodb.log_file_name . A message that the test is expecting was output to the log, along with some error messages:
2023-11-09 8:40:13 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=47819,47819
2023-11-09 8:40:13 0 [Note] InnoDB: Ignoring data file './test/t2.ibd' with space ID 83. Another data file called ./test/t1.ibd exists with the same space ID.
2023-11-09 8:40:13 0 [ERROR] InnoDB: Attempted to open a previously opened tablespace. Previous tablespace test/t1 at filepath: ./test/t1.ibd uses space ID: 83. Cannot open filepath: ./test/t2.ibd which uses the same space ID.
2023-11-09 8:40:13 0 [ERROR] InnoDB: Recovery cannot access file ./test/t2.ibd (tablespace 43)
2023-11-09 8:40:13 0 [Note] InnoDB: You may set innodb_force_recovery=1 to ignore this and possibly get a corrupted database.
2023-11-09 8:40:13 0 [ERROR] InnoDB: Plugin initialization aborted at srv0start.cc[1480] with error Generic error
2023-11-09 8:40:13 0 [Note] InnoDB: Starting shutdown...
mariadbd: /mariadb/10.5/storage/innobase/log/log0recv.cc:969: void recv_sys_t::clear(): Assertion `!after_apply || !(blocks).end' failed.
I believe that this assertion is too strict and needs to be relaxed for the case that the recovery was aborted due to an error. I will have to analyze the rr replay trace before deciding the details.
The flag recv_sys.after_apply is only being read by the debug assertion. However, something else needs to be fixed as well, because on the second iteration of the tests I got the following:
innodb.log_file_name 'innodb' [ fail ] Found warnings/errors in server log file!
Test ended at 2023-11-09 09:14:18
line
2023-11-09 9:13:58 0 [ERROR] InnoDB: Attempted to open a previously opened tablespace. Previous tablespace test/t1 at filepath: ./test/t1.ibd uses space ID: 87. Cannot open filepath: ./test/t2.ibd which uses the same space ID.
2023-11-09 9:18:53 0 [Note] InnoDB: Ignoring data file './test/t2.ibd' with space ID 5. Another data file called ./test/t1.ibd exists with the same space ID.
2023-11-09 9:18:53 0 [Note] InnoDB: Starting final batch to recover 14 pages from redo log.
2023-11-09 9:18:53 0 [ERROR] InnoDB: Cannot replay rename of tablespace 5 from './test/t1.ibd' to './test/t2.ibd' because the target file exists
2023-11-09 9:18:53 0 [ERROR] InnoDB: Plugin initialization aborted at srv0start.cc[1520] with error Data structure corruption
Marko Mäkelä
added a comment - The debug assertion failure would be fixed by the following:
diff --git a/storage/innobase/log/log0recv.cc b/storage/innobase/log/log0recv.cc
index 8c10e5277d5..d52a62cec99 100644
--- a/storage/innobase/log/log0recv.cc
+++ b/storage/innobase/log/log0recv.cc
@@ -3203,7 +3203,6 @@ recv_group_scan_log_recs(
log_sys.log.scanned_lsn = end_lsn = *contiguous_lsn =
ut_uint64_align_down(*contiguous_lsn, OS_FILE_LOG_BLOCK_SIZE);
- ut_d(recv_sys.after_apply = last_phase);
do {
if (last_phase && store == STORE_NO) {
@@ -3226,6 +3225,7 @@ recv_group_scan_log_recs(
DBUG_RETURN(false);
}
+ ut_d(recv_sys.after_apply = last_phase);
DBUG_PRINT("ib_log", ("%s " LSN_PF " completed",
last_phase ? "rescan" : "scan",
log_sys.log.scanned_lsn));
The flag recv_sys.after_apply is only being read by the debug assertion. However, something else needs to be fixed as well, because on the second iteration of the tests I got the following:
innodb.innodb_bug53046 'innodb' [ pass ] 42
innodb.innodb_bug56143 'innodb' [ pass ] 278
innodb.innodb_bug59307 'innodb' [ pass ] 16
innodb.innodb_bug59733 'innodb' [ pass ] 354
innodb.innodb_bug84958 'innodb' [ pass ] 168
innodb.innodb_stats_rename_table 'innodb' [ pass ] 15
innodb.innodb_sys_var_valgrind 'innodb' [ pass ] 89
innodb.insert-before-delete 'innodb' [ pass ] 17
innodb.instant_alter_bugs 'innodb' [ pass ] 3204
innodb.lock_delete_updated 'innodb' [ pass ] 14
innodb.log_file_name 'innodb' [ fail ] Found warnings/errors in server log file!
Test ended at 2023-11-09 09:14:18
line
2023-11-09 9:13:58 0 [ERROR] InnoDB: Attempted to open a previously opened tablespace. Previous tablespace test/t1 at filepath: ./test/t1.ibd uses space ID: 87. Cannot open filepath: ./test/t2.ibd which uses the same space ID.
2023-11-09 9:13:58 0 [ERROR] InnoDB: Recovery cannot access file ./test/t2.ibd (tablespace 46)
I will try to generate an rr replay trace for that as well. The usual outcome of this test step is the following:
2023-11-09 9:18:53 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=47831,47831
2023-11-09 9:18:53 0 [Note] InnoDB: Ignoring data file './test/t2.ibd' with space ID 5. Another data file called ./test/t1.ibd exists with the same space ID.
2023-11-09 9:18:53 0 [Note] InnoDB: Starting final batch to recover 14 pages from redo log.
2023-11-09 9:18:53 0 [ERROR] InnoDB: Cannot replay rename of tablespace 5 from './test/t1.ibd' to './test/t2.ibd' because the target file exists
2023-11-09 9:18:53 0 [ERROR] InnoDB: Plugin initialization aborted at srv0start.cc[1520] with error Data structure corruption
2023-11-09 9:18:53 0 [Note] InnoDB: Starting shutdown...
Sometimes, an attempt to load the file t2.ibd would report two ERROR messages because the file name t1.ibd was successfully loaded. The messages are output by the following:
#0 sql_print_error (format=format@entry=0x55cb83b7e4a6 "InnoDB: %s") at /mariadb/10.5/sql/log.cc:9259
#1 0x000055cb84a424bb in ib::error::~error (this=this@entry=0x7fff3378cb60, __in_chrg=<optimized out>) at /usr/include/c++/13/bits/basic_string.h:222
#2 0x000055cb84809ffb in Datafile::validate_first_page (this=this@entry=0x7fff3378ced0, flush_lsn=flush_lsn@entry=0x0) at /mariadb/10.5/storage/innobase/fsp/fsp0file.cc:604
#3 0x000055cb8480b038 in Datafile::validate_for_recovery (this=this@entry=0x7fff3378ced0) at /mariadb/10.5/storage/innobase/fsp/fsp0file.cc:459
#4 0x000055cb84815f3f in fil_ibd_load (space_id=43, filename=<optimized out>, space=@0x7fff3378d1a8: 0x0) at /mariadb/10.5/storage/innobase/fil/fil0fil.cc:3108
at /mariadb/10.5/storage/innobase/log/log0recv.cc:884
In the more usual outcome, the ERROR message is reported while trying to apply a RENAME operation:
#0 sql_print_error (format=format@entry=0x5635477324a6 "InnoDB: %s") at /mariadb/10.5/sql/log.cc:9259
#1 0x00005635485f64bb in ib::error::~error (this=this@entry=0x7ffcd21964e0, __in_chrg=<optimized out>) at /usr/include/c++/13/bits/basic_string.h:222
#2 0x00005635484dc79e in recv_sys_t::apply (this=0x56354886b5a0 <recv_sys>, last_batch=last_batch@entry=true) at /mariadb/10.5/storage/innobase/log/log0recv.cc:2828
#3 0x00005635485bec10 in srv_start (create_new_db=<optimized out>) at /mariadb/10.5/storage/innobase/srv/srv0start.cc:1516
#4 0x0000563548353516 in innodb_init (p=<optimized out>) at /mariadb/10.5/storage/innobase/handler/ha_innodb.cc:4060
#5 0x0000563547ed3130 in ha_initialize_handlerton (plugin=0x56354925a9c0) at /mariadb/10.5/sql/handler.cc:651
This would be preceded by a Note in fil_ibd_load() because the tablespace had already been successfully loaded as t1.ibd:
#0 sql_print_information (format=format@entry=0x5635477324a6 "InnoDB: %s") at /mariadb/10.5/sql/log.cc:9285
#1 0x00005635485f63e7 in ib::info::~info (this=this@entry=0x7ffcd2196060, __in_chrg=<optimized out>) at /usr/include/c++/13/bits/basic_string.h:222
#2 0x00005635483c9eb1 in fil_ibd_load (space_id=5, filename=<optimized out>, space=@0x7ffcd2196238: 0x5635493a0170) at /mariadb/10.5/storage/innobase/fil/fil0fil.cc:3069
at /mariadb/10.5/storage/innobase/log/log0recv.cc:830
Possible improvements (to be implemented in MDEV-25909):
Ensure that fil_ibd_load() issues an error message before returning FIL_LOAD_INVALID, so that the caller fil_name_process() can avoid doing that.
In the "Ignoring data file" message, mention the expected space_id that was passed to fil_ibd_load(), in addition to mentioning the unexpected space->id.
Here, I would only adjust the test so that it will ignore the occasionally issued additional messages.
Marko Mäkelä
added a comment - The function fil_ibd_load() can detect two different errors for the following fault:
RENAME TABLE t1 TO t2;
-- …
--echo # Fault 1: Two dirty files with the same space_id.
--copy_file $MYSQLD_DATADIR/test/t2.ibd $MYSQLD_DATADIR/test/t1.ibd
Sometimes, an attempt to load the file t2.ibd would report two ERROR messages because the file name t1.ibd was successfully loaded. The messages are output by the following:
#0 sql_print_error (format=format@entry=0x55cb83b7e4a6 "InnoDB: %s") at /mariadb/10.5/sql/log.cc:9259
#1 0x000055cb84a424bb in ib::error::~error (this=this@entry=0x7fff3378cb60, __in_chrg=<optimized out>) at /usr/include/c++/13/bits/basic_string.h:222
#2 0x000055cb84809ffb in Datafile::validate_first_page (this=this@entry=0x7fff3378ced0, flush_lsn=flush_lsn@entry=0x0) at /mariadb/10.5/storage/innobase/fsp/fsp0file.cc:604
#3 0x000055cb8480b038 in Datafile::validate_for_recovery (this=this@entry=0x7fff3378ced0) at /mariadb/10.5/storage/innobase/fsp/fsp0file.cc:459
#4 0x000055cb84815f3f in fil_ibd_load (space_id=43, filename=<optimized out>, space=@0x7fff3378d1a8: 0x0) at /mariadb/10.5/storage/innobase/fil/fil0fil.cc:3108
#5 0x000055cb84928dfe in fil_name_process (name=<optimized out>, name@entry=0x7fff32ce66a1 "./test/t2.ibd", len=len@entry=13, space_id=<optimized out>, deleted=deleted@entry=false)
at /mariadb/10.5/storage/innobase/log/log0recv.cc:830
and
#0 sql_print_error (format=format@entry=0x55cb83c67190 "InnoDB: Recovery cannot access file %s (tablespace %zu)") at /mariadb/10.5/sql/log.cc:9259
#1 0x000055cb84928e7f in fil_name_process (name=<optimized out>, name@entry=0x7fff32ce66a1 "./test/t2.ibd", len=len@entry=13, space_id=<optimized out>, deleted=deleted@entry=false)
at /mariadb/10.5/storage/innobase/log/log0recv.cc:884
In the more usual outcome, the ERROR message is reported while trying to apply a RENAME operation:
#0 sql_print_error (format=format@entry=0x5635477324a6 "InnoDB: %s") at /mariadb/10.5/sql/log.cc:9259
#1 0x00005635485f64bb in ib::error::~error (this=this@entry=0x7ffcd21964e0, __in_chrg=<optimized out>) at /usr/include/c++/13/bits/basic_string.h:222
#2 0x00005635484dc79e in recv_sys_t::apply (this=0x56354886b5a0 <recv_sys>, last_batch=last_batch@entry=true) at /mariadb/10.5/storage/innobase/log/log0recv.cc:2828
#3 0x00005635485bec10 in srv_start (create_new_db=<optimized out>) at /mariadb/10.5/storage/innobase/srv/srv0start.cc:1516
#4 0x0000563548353516 in innodb_init (p=<optimized out>) at /mariadb/10.5/storage/innobase/handler/ha_innodb.cc:4060
#5 0x0000563547ed3130 in ha_initialize_handlerton (plugin=0x56354925a9c0) at /mariadb/10.5/sql/handler.cc:651
This would be preceded by a Note in fil_ibd_load() because the tablespace had already been successfully loaded as t1.ibd :
#0 sql_print_information (format=format@entry=0x5635477324a6 "InnoDB: %s") at /mariadb/10.5/sql/log.cc:9285
#1 0x00005635485f63e7 in ib::info::~info (this=this@entry=0x7ffcd2196060, __in_chrg=<optimized out>) at /usr/include/c++/13/bits/basic_string.h:222
#2 0x00005635483c9eb1 in fil_ibd_load (space_id=5, filename=<optimized out>, space=@0x7ffcd2196238: 0x5635493a0170) at /mariadb/10.5/storage/innobase/fil/fil0fil.cc:3069
#3 0x00005635484dcdfe in fil_name_process (name=<optimized out>, name@entry=0x7f3660de5ac9 "./test/t2.ibd", len=len@entry=13, space_id=<optimized out>, deleted=deleted@entry=false)
at /mariadb/10.5/storage/innobase/log/log0recv.cc:830
Possible improvements (to be implemented in MDEV-25909 ):
Ensure that fil_ibd_load() issues an error message before returning FIL_LOAD_INVALID , so that the caller fil_name_process() can avoid doing that.
In the "Ignoring data file" message, mention the expected space_id that was passed to fil_ibd_load() , in addition to mentioning the unexpected space->id .
Here, I would only adjust the test so that it will ignore the occasionally issued additional messages.
The assertion is failing in an error-handling code path. I was able to reproduce the crash on the second attempt with the longer sequence of tests, and on the first iteration of
while ./mtr --rr=-h --no-reorder innodb.innodb_bug53046 innodb.innodb_bug56143 innodb.innodb_bug59733 innodb.innodb_bug84958 innodb.innodb_sys_var_valgrind innodb.insert-before-delete innodb.instant_alter_bugs innodb.log_file_name
do :; done
The crash would occur on the first server restart in the test innodb.log_file_name. A message that the test is expecting was output to the log, along with some error messages:
2023-11-09 8:40:13 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=47819,47819
2023-11-09 8:40:13 0 [Note] InnoDB: Ignoring data file './test/t2.ibd' with space ID 83. Another data file called ./test/t1.ibd exists with the same space ID.
2023-11-09 8:40:13 0 [ERROR] InnoDB: Attempted to open a previously opened tablespace. Previous tablespace test/t1 at filepath: ./test/t1.ibd uses space ID: 83. Cannot open filepath: ./test/t2.ibd which uses the same space ID.
2023-11-09 8:40:13 0 [ERROR] InnoDB: Recovery cannot access file ./test/t2.ibd (tablespace 43)
2023-11-09 8:40:13 0 [Note] InnoDB: You may set innodb_force_recovery=1 to ignore this and possibly get a corrupted database.
2023-11-09 8:40:13 0 [ERROR] InnoDB: Plugin initialization aborted at srv0start.cc[1480] with error Generic error
2023-11-09 8:40:13 0 [Note] InnoDB: Starting shutdown...
mariadbd: /mariadb/10.5/storage/innobase/log/log0recv.cc:969: void recv_sys_t::clear(): Assertion `!after_apply || !(blocks).end' failed.
I believe that this assertion is too strict and needs to be relaxed for the case that the recovery was aborted due to an error. I will have to analyze the rr replay trace before deciding the details.