[MDEV-32737] innodb.log_file_name fails on Assertion `!after_apply || !(blocks).end' in recv_sys_t::clear Created: 2023-11-09  Updated: 2023-11-10  Resolved: 2023-11-09

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.5
Fix Version/s: 10.5.24, 10.6.17, 10.11.7, 11.0.5, 11.1.4, 11.2.3, 11.3.2

Type: Bug Priority: Major
Reporter: Elena Stepanova Assignee: Marko Mäkelä
Resolution: Fixed Votes: 0
Labels: None


 Description   

The failure happens fairly well after a sequence of tests, e.g.

perl ./mtr --mem --noreorder innodb.innodb_bug53046 innodb.innodb_bug56143 innodb.innodb_bug59307 innodb.innodb_bug59733 innodb.innodb_bug84958 innodb.innodb_stats_rename_table innodb.innodb_sys_var_valgrind innodb.insert-before-delete innodb.instant_alter_bugs innodb.lock_delete_updated innodb.log_file_name

The sequence can be reduced further, but the probability of the failure seems to be decreasing. The shortest I could get it with was

innodb.innodb_bug53046 innodb.innodb_bug56143 innodb.innodb_bug59733 innodb.innodb_bug84958 innodb.innodb_sys_var_valgrind innodb.insert-before-delete innodb.instant_alter_bugs innodb.log_file_name

10.5 ea0b1ccd

mariadbd: /data/src/10.5/storage/innobase/log/log0recv.cc:969: void recv_sys_t::clear(): Assertion `!after_apply || !(blocks).end' failed.
231109  1:59:59 [ERROR] mysqld got signal 6 ;
 
#9  0x00007eff45053e32 in __GI___assert_fail (assertion=0x5635e41bdb60 "!after_apply || !(blocks).end", file=0x5635e41bc100 "/data/src/10.5/storage/innobase/log/log0recv.cc", line=969, function=0x5635e41bdb20 "void recv_sys_t::clear()") at ./assert/assert.c:101
#10 0x00005635e2b61c5a in recv_sys_t::clear (this=0x5635e546d920 <recv_sys>) at /data/src/10.5/storage/innobase/log/log0recv.cc:969
#11 0x00005635e2b3af56 in recv_sys_t::close (this=0x5635e546d920 <recv_sys>) at /data/src/10.5/storage/innobase/log/log0recv.cc:913
#12 0x00005635e2b32f90 in log_t::close (this=0x5635e62e55c0 <log_sys>) at /data/src/10.5/storage/innobase/log/log0log.cc:1350
#13 0x00005635e2df6ae0 in innodb_shutdown () at /data/src/10.5/storage/innobase/srv/srv0start.cc:2083
#14 0x00005635e2942f37 in innodb_init (p=0x615000002d98) at /data/src/10.5/storage/innobase/handler/ha_innodb.cc:4063
#15 0x00005635e1ee5c2c in ha_initialize_handlerton (plugin=0x62100002e010) at /data/src/10.5/sql/handler.cc:651
#16 0x00005635e1710726 in plugin_do_initialize (plugin=0x62100002e010, state=@0x7ffe981afdd0: 4) at /data/src/10.5/sql/sql_plugin.cc:1450
#17 0x00005635e1711069 in plugin_initialize (tmp_root=0x7ffe981b0530, plugin=0x62100002e010, argc=0x5635e59dae00 <remaining_argc>, argv=0x61b000001cf8, options_only=false) at /data/src/10.5/sql/sql_plugin.cc:1503
#18 0x00005635e1712b0a in plugin_init (argc=0x5635e59dae00 <remaining_argc>, argv=0x61b000001cf8, flags=0) at /data/src/10.5/sql/sql_plugin.cc:1761
#19 0x00005635e13cbad5 in init_server_components () at /data/src/10.5/sql/mysqld.cc:4939
#20 0x00005635e13cdc0b in mysqld_main (argc=164, argv=0x61b000001cf8) at /data/src/10.5/sql/mysqld.cc:5533
#21 0x00005635e13b7979 in main (argc=23, argv=0x7ffe981b2298) at /data/src/10.5/sql/main.cc:25



 Comments   
Comment by Marko Mäkelä [ 2023-11-09 ]

The assertion is failing in an error-handling code path. I was able to reproduce the crash on the second attempt with the longer sequence of tests, and on the first iteration of

while ./mtr --rr=-h --no-reorder innodb.innodb_bug53046 innodb.innodb_bug56143 innodb.innodb_bug59733 innodb.innodb_bug84958 innodb.innodb_sys_var_valgrind innodb.insert-before-delete innodb.instant_alter_bugs innodb.log_file_name
do :; done

The crash would occur on the first server restart in the test innodb.log_file_name. A message that the test is expecting was output to the log, along with some error messages:

2023-11-09  8:40:13 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=47819,47819
2023-11-09  8:40:13 0 [Note] InnoDB: Ignoring data file './test/t2.ibd' with space ID 83. Another data file called ./test/t1.ibd exists with the same space ID.
2023-11-09  8:40:13 0 [ERROR] InnoDB: Attempted to open a previously opened tablespace. Previous tablespace test/t1 at filepath: ./test/t1.ibd uses space ID: 83. Cannot open filepath: ./test/t2.ibd which uses the same space ID.
2023-11-09  8:40:13 0 [ERROR] InnoDB: Recovery cannot access file ./test/t2.ibd (tablespace 43)
2023-11-09  8:40:13 0 [Note] InnoDB: You may set innodb_force_recovery=1 to ignore this and possibly get a corrupted database.
2023-11-09  8:40:13 0 [ERROR] InnoDB: Plugin initialization aborted at srv0start.cc[1480] with error Generic error
2023-11-09  8:40:13 0 [Note] InnoDB: Starting shutdown...
mariadbd: /mariadb/10.5/storage/innobase/log/log0recv.cc:969: void recv_sys_t::clear(): Assertion `!after_apply || !(blocks).end' failed.

I believe that this assertion is too strict and needs to be relaxed for the case that the recovery was aborted due to an error. I will have to analyze the rr replay trace before deciding the details.

Comment by Marko Mäkelä [ 2023-11-09 ]

The debug assertion failure would be fixed by the following:

diff --git a/storage/innobase/log/log0recv.cc b/storage/innobase/log/log0recv.cc
index 8c10e5277d5..d52a62cec99 100644
--- a/storage/innobase/log/log0recv.cc
+++ b/storage/innobase/log/log0recv.cc
@@ -3203,7 +3203,6 @@ recv_group_scan_log_recs(
 
 	log_sys.log.scanned_lsn = end_lsn = *contiguous_lsn =
 		ut_uint64_align_down(*contiguous_lsn, OS_FILE_LOG_BLOCK_SIZE);
-	ut_d(recv_sys.after_apply = last_phase);
 
 	do {
 		if (last_phase && store == STORE_NO) {
@@ -3226,6 +3225,7 @@ recv_group_scan_log_recs(
 		DBUG_RETURN(false);
 	}
 
+	ut_d(recv_sys.after_apply = last_phase);
 	DBUG_PRINT("ib_log", ("%s " LSN_PF " completed",
 			      last_phase ? "rescan" : "scan",
 			      log_sys.log.scanned_lsn));

The flag recv_sys.after_apply is only being read by the debug assertion. However, something else needs to be fixed as well, because on the second iteration of the tests I got the following:

innodb.innodb_bug53046 'innodb'          [ pass ]     42
innodb.innodb_bug56143 'innodb'          [ pass ]    278
innodb.innodb_bug59307 'innodb'          [ pass ]     16
innodb.innodb_bug59733 'innodb'          [ pass ]    354
innodb.innodb_bug84958 'innodb'          [ pass ]    168
innodb.innodb_stats_rename_table 'innodb' [ pass ]     15
innodb.innodb_sys_var_valgrind 'innodb'  [ pass ]     89
innodb.insert-before-delete 'innodb'     [ pass ]     17
innodb.instant_alter_bugs 'innodb'       [ pass ]   3204
innodb.lock_delete_updated 'innodb'      [ pass ]     14
innodb.log_file_name 'innodb'            [ fail ]  Found warnings/errors in server log file!
        Test ended at 2023-11-09 09:14:18
line
2023-11-09  9:13:58 0 [ERROR] InnoDB: Attempted to open a previously opened tablespace. Previous tablespace test/t1 at filepath: ./test/t1.ibd uses space ID: 87. Cannot open filepath: ./test/t2.ibd which uses the same space ID.
2023-11-09  9:13:58 0 [ERROR] InnoDB: Recovery cannot access file ./test/t2.ibd (tablespace 46)

I will try to generate an rr replay trace for that as well. The usual outcome of this test step is the following:

2023-11-09  9:18:53 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=47831,47831
2023-11-09  9:18:53 0 [Note] InnoDB: Ignoring data file './test/t2.ibd' with space ID 5. Another data file called ./test/t1.ibd exists with the same space ID.
2023-11-09  9:18:53 0 [Note] InnoDB: Starting final batch to recover 14 pages from redo log.
2023-11-09  9:18:53 0 [ERROR] InnoDB: Cannot replay rename of tablespace 5 from './test/t1.ibd' to './test/t2.ibd' because the target file exists
2023-11-09  9:18:53 0 [ERROR] InnoDB: Plugin initialization aborted at srv0start.cc[1520] with error Data structure corruption
2023-11-09  9:18:53 0 [Note] InnoDB: Starting shutdown...

Comment by Marko Mäkelä [ 2023-11-09 ]

The function fil_ibd_load() can detect two different errors for the following fault:

RENAME TABLE t1 TO t2;
-- …
--echo # Fault 1: Two dirty files with the same space_id.
--copy_file $MYSQLD_DATADIR/test/t2.ibd $MYSQLD_DATADIR/test/t1.ibd

Sometimes, an attempt to load the file t2.ibd would report two ERROR messages because the file name t1.ibd was successfully loaded. The messages are output by the following:

#0  sql_print_error (format=format@entry=0x55cb83b7e4a6 "InnoDB: %s") at /mariadb/10.5/sql/log.cc:9259
#1  0x000055cb84a424bb in ib::error::~error (this=this@entry=0x7fff3378cb60, __in_chrg=<optimized out>) at /usr/include/c++/13/bits/basic_string.h:222
#2  0x000055cb84809ffb in Datafile::validate_first_page (this=this@entry=0x7fff3378ced0, flush_lsn=flush_lsn@entry=0x0) at /mariadb/10.5/storage/innobase/fsp/fsp0file.cc:604
#3  0x000055cb8480b038 in Datafile::validate_for_recovery (this=this@entry=0x7fff3378ced0) at /mariadb/10.5/storage/innobase/fsp/fsp0file.cc:459
#4  0x000055cb84815f3f in fil_ibd_load (space_id=43, filename=<optimized out>, space=@0x7fff3378d1a8: 0x0) at /mariadb/10.5/storage/innobase/fil/fil0fil.cc:3108
#5  0x000055cb84928dfe in fil_name_process (name=<optimized out>, name@entry=0x7fff32ce66a1 "./test/t2.ibd", len=len@entry=13, space_id=<optimized out>, deleted=deleted@entry=false)
    at /mariadb/10.5/storage/innobase/log/log0recv.cc:830

and

#0  sql_print_error (format=format@entry=0x55cb83c67190 "InnoDB: Recovery cannot access file %s (tablespace %zu)") at /mariadb/10.5/sql/log.cc:9259
#1  0x000055cb84928e7f in fil_name_process (name=<optimized out>, name@entry=0x7fff32ce66a1 "./test/t2.ibd", len=len@entry=13, space_id=<optimized out>, deleted=deleted@entry=false)
    at /mariadb/10.5/storage/innobase/log/log0recv.cc:884

In the more usual outcome, the ERROR message is reported while trying to apply a RENAME operation:

#0  sql_print_error (format=format@entry=0x5635477324a6 "InnoDB: %s") at /mariadb/10.5/sql/log.cc:9259
#1  0x00005635485f64bb in ib::error::~error (this=this@entry=0x7ffcd21964e0, __in_chrg=<optimized out>) at /usr/include/c++/13/bits/basic_string.h:222
#2  0x00005635484dc79e in recv_sys_t::apply (this=0x56354886b5a0 <recv_sys>, last_batch=last_batch@entry=true) at /mariadb/10.5/storage/innobase/log/log0recv.cc:2828
#3  0x00005635485bec10 in srv_start (create_new_db=<optimized out>) at /mariadb/10.5/storage/innobase/srv/srv0start.cc:1516
#4  0x0000563548353516 in innodb_init (p=<optimized out>) at /mariadb/10.5/storage/innobase/handler/ha_innodb.cc:4060
#5  0x0000563547ed3130 in ha_initialize_handlerton (plugin=0x56354925a9c0) at /mariadb/10.5/sql/handler.cc:651

This would be preceded by a Note in fil_ibd_load() because the tablespace had already been successfully loaded as t1.ibd:

#0  sql_print_information (format=format@entry=0x5635477324a6 "InnoDB: %s") at /mariadb/10.5/sql/log.cc:9285
#1  0x00005635485f63e7 in ib::info::~info (this=this@entry=0x7ffcd2196060, __in_chrg=<optimized out>) at /usr/include/c++/13/bits/basic_string.h:222
#2  0x00005635483c9eb1 in fil_ibd_load (space_id=5, filename=<optimized out>, space=@0x7ffcd2196238: 0x5635493a0170) at /mariadb/10.5/storage/innobase/fil/fil0fil.cc:3069
#3  0x00005635484dcdfe in fil_name_process (name=<optimized out>, name@entry=0x7f3660de5ac9 "./test/t2.ibd", len=len@entry=13, space_id=<optimized out>, deleted=deleted@entry=false)
    at /mariadb/10.5/storage/innobase/log/log0recv.cc:830

Possible improvements (to be implemented in MDEV-25909):

  • Ensure that fil_ibd_load() issues an error message before returning FIL_LOAD_INVALID, so that the caller fil_name_process() can avoid doing that.
  • In the "Ignoring data file" message, mention the expected space_id that was passed to fil_ibd_load(), in addition to mentioning the unexpected space->id.

Here, I would only adjust the test so that it will ignore the occasionally issued additional messages.

Generated at Thu Feb 08 10:33:37 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.