Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-32737

innodb.log_file_name fails on Assertion `!after_apply || !(blocks).end' in recv_sys_t::clear

Details

    Description

      The failure happens fairly well after a sequence of tests, e.g.

      perl ./mtr --mem --noreorder innodb.innodb_bug53046 innodb.innodb_bug56143 innodb.innodb_bug59307 innodb.innodb_bug59733 innodb.innodb_bug84958 innodb.innodb_stats_rename_table innodb.innodb_sys_var_valgrind innodb.insert-before-delete innodb.instant_alter_bugs innodb.lock_delete_updated innodb.log_file_name
      

      The sequence can be reduced further, but the probability of the failure seems to be decreasing. The shortest I could get it with was

      innodb.innodb_bug53046 innodb.innodb_bug56143 innodb.innodb_bug59733 innodb.innodb_bug84958 innodb.innodb_sys_var_valgrind innodb.insert-before-delete innodb.instant_alter_bugs innodb.log_file_name
      

      10.5 ea0b1ccd

      mariadbd: /data/src/10.5/storage/innobase/log/log0recv.cc:969: void recv_sys_t::clear(): Assertion `!after_apply || !(blocks).end' failed.
      231109  1:59:59 [ERROR] mysqld got signal 6 ;
       
      #9  0x00007eff45053e32 in __GI___assert_fail (assertion=0x5635e41bdb60 "!after_apply || !(blocks).end", file=0x5635e41bc100 "/data/src/10.5/storage/innobase/log/log0recv.cc", line=969, function=0x5635e41bdb20 "void recv_sys_t::clear()") at ./assert/assert.c:101
      #10 0x00005635e2b61c5a in recv_sys_t::clear (this=0x5635e546d920 <recv_sys>) at /data/src/10.5/storage/innobase/log/log0recv.cc:969
      #11 0x00005635e2b3af56 in recv_sys_t::close (this=0x5635e546d920 <recv_sys>) at /data/src/10.5/storage/innobase/log/log0recv.cc:913
      #12 0x00005635e2b32f90 in log_t::close (this=0x5635e62e55c0 <log_sys>) at /data/src/10.5/storage/innobase/log/log0log.cc:1350
      #13 0x00005635e2df6ae0 in innodb_shutdown () at /data/src/10.5/storage/innobase/srv/srv0start.cc:2083
      #14 0x00005635e2942f37 in innodb_init (p=0x615000002d98) at /data/src/10.5/storage/innobase/handler/ha_innodb.cc:4063
      #15 0x00005635e1ee5c2c in ha_initialize_handlerton (plugin=0x62100002e010) at /data/src/10.5/sql/handler.cc:651
      #16 0x00005635e1710726 in plugin_do_initialize (plugin=0x62100002e010, state=@0x7ffe981afdd0: 4) at /data/src/10.5/sql/sql_plugin.cc:1450
      #17 0x00005635e1711069 in plugin_initialize (tmp_root=0x7ffe981b0530, plugin=0x62100002e010, argc=0x5635e59dae00 <remaining_argc>, argv=0x61b000001cf8, options_only=false) at /data/src/10.5/sql/sql_plugin.cc:1503
      #18 0x00005635e1712b0a in plugin_init (argc=0x5635e59dae00 <remaining_argc>, argv=0x61b000001cf8, flags=0) at /data/src/10.5/sql/sql_plugin.cc:1761
      #19 0x00005635e13cbad5 in init_server_components () at /data/src/10.5/sql/mysqld.cc:4939
      #20 0x00005635e13cdc0b in mysqld_main (argc=164, argv=0x61b000001cf8) at /data/src/10.5/sql/mysqld.cc:5533
      #21 0x00005635e13b7979 in main (argc=23, argv=0x7ffe981b2298) at /data/src/10.5/sql/main.cc:25
      

      Attachments

        Activity

          The assertion is failing in an error-handling code path. I was able to reproduce the crash on the second attempt with the longer sequence of tests, and on the first iteration of

          while ./mtr --rr=-h --no-reorder innodb.innodb_bug53046 innodb.innodb_bug56143 innodb.innodb_bug59733 innodb.innodb_bug84958 innodb.innodb_sys_var_valgrind innodb.insert-before-delete innodb.instant_alter_bugs innodb.log_file_name
          do :; done
          

          The crash would occur on the first server restart in the test innodb.log_file_name. A message that the test is expecting was output to the log, along with some error messages:

          2023-11-09  8:40:13 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=47819,47819
          2023-11-09  8:40:13 0 [Note] InnoDB: Ignoring data file './test/t2.ibd' with space ID 83. Another data file called ./test/t1.ibd exists with the same space ID.
          2023-11-09  8:40:13 0 [ERROR] InnoDB: Attempted to open a previously opened tablespace. Previous tablespace test/t1 at filepath: ./test/t1.ibd uses space ID: 83. Cannot open filepath: ./test/t2.ibd which uses the same space ID.
          2023-11-09  8:40:13 0 [ERROR] InnoDB: Recovery cannot access file ./test/t2.ibd (tablespace 43)
          2023-11-09  8:40:13 0 [Note] InnoDB: You may set innodb_force_recovery=1 to ignore this and possibly get a corrupted database.
          2023-11-09  8:40:13 0 [ERROR] InnoDB: Plugin initialization aborted at srv0start.cc[1480] with error Generic error
          2023-11-09  8:40:13 0 [Note] InnoDB: Starting shutdown...
          mariadbd: /mariadb/10.5/storage/innobase/log/log0recv.cc:969: void recv_sys_t::clear(): Assertion `!after_apply || !(blocks).end' failed.
          

          I believe that this assertion is too strict and needs to be relaxed for the case that the recovery was aborted due to an error. I will have to analyze the rr replay trace before deciding the details.

          marko Marko Mäkelä added a comment - The assertion is failing in an error-handling code path. I was able to reproduce the crash on the second attempt with the longer sequence of tests, and on the first iteration of while ./mtr --rr=-h --no-reorder innodb.innodb_bug53046 innodb.innodb_bug56143 innodb.innodb_bug59733 innodb.innodb_bug84958 innodb.innodb_sys_var_valgrind innodb.insert-before-delete innodb.instant_alter_bugs innodb.log_file_name do :; done The crash would occur on the first server restart in the test innodb.log_file_name . A message that the test is expecting was output to the log, along with some error messages: 2023-11-09 8:40:13 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=47819,47819 2023-11-09 8:40:13 0 [Note] InnoDB: Ignoring data file './test/t2.ibd' with space ID 83. Another data file called ./test/t1.ibd exists with the same space ID. 2023-11-09 8:40:13 0 [ERROR] InnoDB: Attempted to open a previously opened tablespace. Previous tablespace test/t1 at filepath: ./test/t1.ibd uses space ID: 83. Cannot open filepath: ./test/t2.ibd which uses the same space ID. 2023-11-09 8:40:13 0 [ERROR] InnoDB: Recovery cannot access file ./test/t2.ibd (tablespace 43) 2023-11-09 8:40:13 0 [Note] InnoDB: You may set innodb_force_recovery=1 to ignore this and possibly get a corrupted database. 2023-11-09 8:40:13 0 [ERROR] InnoDB: Plugin initialization aborted at srv0start.cc[1480] with error Generic error 2023-11-09 8:40:13 0 [Note] InnoDB: Starting shutdown... mariadbd: /mariadb/10.5/storage/innobase/log/log0recv.cc:969: void recv_sys_t::clear(): Assertion `!after_apply || !(blocks).end' failed. I believe that this assertion is too strict and needs to be relaxed for the case that the recovery was aborted due to an error. I will have to analyze the rr replay trace before deciding the details.

          The debug assertion failure would be fixed by the following:

          diff --git a/storage/innobase/log/log0recv.cc b/storage/innobase/log/log0recv.cc
          index 8c10e5277d5..d52a62cec99 100644
          --- a/storage/innobase/log/log0recv.cc
          +++ b/storage/innobase/log/log0recv.cc
          @@ -3203,7 +3203,6 @@ recv_group_scan_log_recs(
           
           	log_sys.log.scanned_lsn = end_lsn = *contiguous_lsn =
           		ut_uint64_align_down(*contiguous_lsn, OS_FILE_LOG_BLOCK_SIZE);
          -	ut_d(recv_sys.after_apply = last_phase);
           
           	do {
           		if (last_phase && store == STORE_NO) {
          @@ -3226,6 +3225,7 @@ recv_group_scan_log_recs(
           		DBUG_RETURN(false);
           	}
           
          +	ut_d(recv_sys.after_apply = last_phase);
           	DBUG_PRINT("ib_log", ("%s " LSN_PF " completed",
           			      last_phase ? "rescan" : "scan",
           			      log_sys.log.scanned_lsn));
          

          The flag recv_sys.after_apply is only being read by the debug assertion. However, something else needs to be fixed as well, because on the second iteration of the tests I got the following:

          innodb.innodb_bug53046 'innodb'          [ pass ]     42
          innodb.innodb_bug56143 'innodb'          [ pass ]    278
          innodb.innodb_bug59307 'innodb'          [ pass ]     16
          innodb.innodb_bug59733 'innodb'          [ pass ]    354
          innodb.innodb_bug84958 'innodb'          [ pass ]    168
          innodb.innodb_stats_rename_table 'innodb' [ pass ]     15
          innodb.innodb_sys_var_valgrind 'innodb'  [ pass ]     89
          innodb.insert-before-delete 'innodb'     [ pass ]     17
          innodb.instant_alter_bugs 'innodb'       [ pass ]   3204
          innodb.lock_delete_updated 'innodb'      [ pass ]     14
          innodb.log_file_name 'innodb'            [ fail ]  Found warnings/errors in server log file!
                  Test ended at 2023-11-09 09:14:18
          line
          2023-11-09  9:13:58 0 [ERROR] InnoDB: Attempted to open a previously opened tablespace. Previous tablespace test/t1 at filepath: ./test/t1.ibd uses space ID: 87. Cannot open filepath: ./test/t2.ibd which uses the same space ID.
          2023-11-09  9:13:58 0 [ERROR] InnoDB: Recovery cannot access file ./test/t2.ibd (tablespace 46)
          

          I will try to generate an rr replay trace for that as well. The usual outcome of this test step is the following:

          2023-11-09  9:18:53 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=47831,47831
          2023-11-09  9:18:53 0 [Note] InnoDB: Ignoring data file './test/t2.ibd' with space ID 5. Another data file called ./test/t1.ibd exists with the same space ID.
          2023-11-09  9:18:53 0 [Note] InnoDB: Starting final batch to recover 14 pages from redo log.
          2023-11-09  9:18:53 0 [ERROR] InnoDB: Cannot replay rename of tablespace 5 from './test/t1.ibd' to './test/t2.ibd' because the target file exists
          2023-11-09  9:18:53 0 [ERROR] InnoDB: Plugin initialization aborted at srv0start.cc[1520] with error Data structure corruption
          2023-11-09  9:18:53 0 [Note] InnoDB: Starting shutdown...
          

          marko Marko Mäkelä added a comment - The debug assertion failure would be fixed by the following: diff --git a/storage/innobase/log/log0recv.cc b/storage/innobase/log/log0recv.cc index 8c10e5277d5..d52a62cec99 100644 --- a/storage/innobase/log/log0recv.cc +++ b/storage/innobase/log/log0recv.cc @@ -3203,7 +3203,6 @@ recv_group_scan_log_recs( log_sys.log.scanned_lsn = end_lsn = *contiguous_lsn = ut_uint64_align_down(*contiguous_lsn, OS_FILE_LOG_BLOCK_SIZE); - ut_d(recv_sys.after_apply = last_phase); do { if (last_phase && store == STORE_NO) { @@ -3226,6 +3225,7 @@ recv_group_scan_log_recs( DBUG_RETURN(false); } + ut_d(recv_sys.after_apply = last_phase); DBUG_PRINT("ib_log", ("%s " LSN_PF " completed", last_phase ? "rescan" : "scan", log_sys.log.scanned_lsn)); The flag recv_sys.after_apply is only being read by the debug assertion. However, something else needs to be fixed as well, because on the second iteration of the tests I got the following: innodb.innodb_bug53046 'innodb' [ pass ] 42 innodb.innodb_bug56143 'innodb' [ pass ] 278 innodb.innodb_bug59307 'innodb' [ pass ] 16 innodb.innodb_bug59733 'innodb' [ pass ] 354 innodb.innodb_bug84958 'innodb' [ pass ] 168 innodb.innodb_stats_rename_table 'innodb' [ pass ] 15 innodb.innodb_sys_var_valgrind 'innodb' [ pass ] 89 innodb.insert-before-delete 'innodb' [ pass ] 17 innodb.instant_alter_bugs 'innodb' [ pass ] 3204 innodb.lock_delete_updated 'innodb' [ pass ] 14 innodb.log_file_name 'innodb' [ fail ] Found warnings/errors in server log file! Test ended at 2023-11-09 09:14:18 line 2023-11-09 9:13:58 0 [ERROR] InnoDB: Attempted to open a previously opened tablespace. Previous tablespace test/t1 at filepath: ./test/t1.ibd uses space ID: 87. Cannot open filepath: ./test/t2.ibd which uses the same space ID. 2023-11-09 9:13:58 0 [ERROR] InnoDB: Recovery cannot access file ./test/t2.ibd (tablespace 46) I will try to generate an rr replay trace for that as well. The usual outcome of this test step is the following: 2023-11-09 9:18:53 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=47831,47831 2023-11-09 9:18:53 0 [Note] InnoDB: Ignoring data file './test/t2.ibd' with space ID 5. Another data file called ./test/t1.ibd exists with the same space ID. 2023-11-09 9:18:53 0 [Note] InnoDB: Starting final batch to recover 14 pages from redo log. 2023-11-09 9:18:53 0 [ERROR] InnoDB: Cannot replay rename of tablespace 5 from './test/t1.ibd' to './test/t2.ibd' because the target file exists 2023-11-09 9:18:53 0 [ERROR] InnoDB: Plugin initialization aborted at srv0start.cc[1520] with error Data structure corruption 2023-11-09 9:18:53 0 [Note] InnoDB: Starting shutdown...

          The function fil_ibd_load() can detect two different errors for the following fault:

          RENAME TABLE t1 TO t2;
          -- …
          --echo # Fault 1: Two dirty files with the same space_id.
          --copy_file $MYSQLD_DATADIR/test/t2.ibd $MYSQLD_DATADIR/test/t1.ibd
          

          Sometimes, an attempt to load the file t2.ibd would report two ERROR messages because the file name t1.ibd was successfully loaded. The messages are output by the following:

          #0  sql_print_error (format=format@entry=0x55cb83b7e4a6 "InnoDB: %s") at /mariadb/10.5/sql/log.cc:9259
          #1  0x000055cb84a424bb in ib::error::~error (this=this@entry=0x7fff3378cb60, __in_chrg=<optimized out>) at /usr/include/c++/13/bits/basic_string.h:222
          #2  0x000055cb84809ffb in Datafile::validate_first_page (this=this@entry=0x7fff3378ced0, flush_lsn=flush_lsn@entry=0x0) at /mariadb/10.5/storage/innobase/fsp/fsp0file.cc:604
          #3  0x000055cb8480b038 in Datafile::validate_for_recovery (this=this@entry=0x7fff3378ced0) at /mariadb/10.5/storage/innobase/fsp/fsp0file.cc:459
          #4  0x000055cb84815f3f in fil_ibd_load (space_id=43, filename=<optimized out>, space=@0x7fff3378d1a8: 0x0) at /mariadb/10.5/storage/innobase/fil/fil0fil.cc:3108
          #5  0x000055cb84928dfe in fil_name_process (name=<optimized out>, name@entry=0x7fff32ce66a1 "./test/t2.ibd", len=len@entry=13, space_id=<optimized out>, deleted=deleted@entry=false)
              at /mariadb/10.5/storage/innobase/log/log0recv.cc:830
          

          and

          #0  sql_print_error (format=format@entry=0x55cb83c67190 "InnoDB: Recovery cannot access file %s (tablespace %zu)") at /mariadb/10.5/sql/log.cc:9259
          #1  0x000055cb84928e7f in fil_name_process (name=<optimized out>, name@entry=0x7fff32ce66a1 "./test/t2.ibd", len=len@entry=13, space_id=<optimized out>, deleted=deleted@entry=false)
              at /mariadb/10.5/storage/innobase/log/log0recv.cc:884
          

          In the more usual outcome, the ERROR message is reported while trying to apply a RENAME operation:

          #0  sql_print_error (format=format@entry=0x5635477324a6 "InnoDB: %s") at /mariadb/10.5/sql/log.cc:9259
          #1  0x00005635485f64bb in ib::error::~error (this=this@entry=0x7ffcd21964e0, __in_chrg=<optimized out>) at /usr/include/c++/13/bits/basic_string.h:222
          #2  0x00005635484dc79e in recv_sys_t::apply (this=0x56354886b5a0 <recv_sys>, last_batch=last_batch@entry=true) at /mariadb/10.5/storage/innobase/log/log0recv.cc:2828
          #3  0x00005635485bec10 in srv_start (create_new_db=<optimized out>) at /mariadb/10.5/storage/innobase/srv/srv0start.cc:1516
          #4  0x0000563548353516 in innodb_init (p=<optimized out>) at /mariadb/10.5/storage/innobase/handler/ha_innodb.cc:4060
          #5  0x0000563547ed3130 in ha_initialize_handlerton (plugin=0x56354925a9c0) at /mariadb/10.5/sql/handler.cc:651
          

          This would be preceded by a Note in fil_ibd_load() because the tablespace had already been successfully loaded as t1.ibd:

          #0  sql_print_information (format=format@entry=0x5635477324a6 "InnoDB: %s") at /mariadb/10.5/sql/log.cc:9285
          #1  0x00005635485f63e7 in ib::info::~info (this=this@entry=0x7ffcd2196060, __in_chrg=<optimized out>) at /usr/include/c++/13/bits/basic_string.h:222
          #2  0x00005635483c9eb1 in fil_ibd_load (space_id=5, filename=<optimized out>, space=@0x7ffcd2196238: 0x5635493a0170) at /mariadb/10.5/storage/innobase/fil/fil0fil.cc:3069
          #3  0x00005635484dcdfe in fil_name_process (name=<optimized out>, name@entry=0x7f3660de5ac9 "./test/t2.ibd", len=len@entry=13, space_id=<optimized out>, deleted=deleted@entry=false)
              at /mariadb/10.5/storage/innobase/log/log0recv.cc:830
          

          Possible improvements (to be implemented in MDEV-25909):

          • Ensure that fil_ibd_load() issues an error message before returning FIL_LOAD_INVALID, so that the caller fil_name_process() can avoid doing that.
          • In the "Ignoring data file" message, mention the expected space_id that was passed to fil_ibd_load(), in addition to mentioning the unexpected space->id.

          Here, I would only adjust the test so that it will ignore the occasionally issued additional messages.

          marko Marko Mäkelä added a comment - The function fil_ibd_load() can detect two different errors for the following fault: RENAME TABLE t1 TO t2; -- … --echo # Fault 1: Two dirty files with the same space_id. --copy_file $MYSQLD_DATADIR/test/t2.ibd $MYSQLD_DATADIR/test/t1.ibd Sometimes, an attempt to load the file t2.ibd would report two ERROR messages because the file name t1.ibd was successfully loaded. The messages are output by the following: #0 sql_print_error (format=format@entry=0x55cb83b7e4a6 "InnoDB: %s") at /mariadb/10.5/sql/log.cc:9259 #1 0x000055cb84a424bb in ib::error::~error (this=this@entry=0x7fff3378cb60, __in_chrg=<optimized out>) at /usr/include/c++/13/bits/basic_string.h:222 #2 0x000055cb84809ffb in Datafile::validate_first_page (this=this@entry=0x7fff3378ced0, flush_lsn=flush_lsn@entry=0x0) at /mariadb/10.5/storage/innobase/fsp/fsp0file.cc:604 #3 0x000055cb8480b038 in Datafile::validate_for_recovery (this=this@entry=0x7fff3378ced0) at /mariadb/10.5/storage/innobase/fsp/fsp0file.cc:459 #4 0x000055cb84815f3f in fil_ibd_load (space_id=43, filename=<optimized out>, space=@0x7fff3378d1a8: 0x0) at /mariadb/10.5/storage/innobase/fil/fil0fil.cc:3108 #5 0x000055cb84928dfe in fil_name_process (name=<optimized out>, name@entry=0x7fff32ce66a1 "./test/t2.ibd", len=len@entry=13, space_id=<optimized out>, deleted=deleted@entry=false) at /mariadb/10.5/storage/innobase/log/log0recv.cc:830 and #0 sql_print_error (format=format@entry=0x55cb83c67190 "InnoDB: Recovery cannot access file %s (tablespace %zu)") at /mariadb/10.5/sql/log.cc:9259 #1 0x000055cb84928e7f in fil_name_process (name=<optimized out>, name@entry=0x7fff32ce66a1 "./test/t2.ibd", len=len@entry=13, space_id=<optimized out>, deleted=deleted@entry=false) at /mariadb/10.5/storage/innobase/log/log0recv.cc:884 In the more usual outcome, the ERROR message is reported while trying to apply a RENAME operation: #0 sql_print_error (format=format@entry=0x5635477324a6 "InnoDB: %s") at /mariadb/10.5/sql/log.cc:9259 #1 0x00005635485f64bb in ib::error::~error (this=this@entry=0x7ffcd21964e0, __in_chrg=<optimized out>) at /usr/include/c++/13/bits/basic_string.h:222 #2 0x00005635484dc79e in recv_sys_t::apply (this=0x56354886b5a0 <recv_sys>, last_batch=last_batch@entry=true) at /mariadb/10.5/storage/innobase/log/log0recv.cc:2828 #3 0x00005635485bec10 in srv_start (create_new_db=<optimized out>) at /mariadb/10.5/storage/innobase/srv/srv0start.cc:1516 #4 0x0000563548353516 in innodb_init (p=<optimized out>) at /mariadb/10.5/storage/innobase/handler/ha_innodb.cc:4060 #5 0x0000563547ed3130 in ha_initialize_handlerton (plugin=0x56354925a9c0) at /mariadb/10.5/sql/handler.cc:651 This would be preceded by a Note in fil_ibd_load() because the tablespace had already been successfully loaded as t1.ibd : #0 sql_print_information (format=format@entry=0x5635477324a6 "InnoDB: %s") at /mariadb/10.5/sql/log.cc:9285 #1 0x00005635485f63e7 in ib::info::~info (this=this@entry=0x7ffcd2196060, __in_chrg=<optimized out>) at /usr/include/c++/13/bits/basic_string.h:222 #2 0x00005635483c9eb1 in fil_ibd_load (space_id=5, filename=<optimized out>, space=@0x7ffcd2196238: 0x5635493a0170) at /mariadb/10.5/storage/innobase/fil/fil0fil.cc:3069 #3 0x00005635484dcdfe in fil_name_process (name=<optimized out>, name@entry=0x7f3660de5ac9 "./test/t2.ibd", len=len@entry=13, space_id=<optimized out>, deleted=deleted@entry=false) at /mariadb/10.5/storage/innobase/log/log0recv.cc:830 Possible improvements (to be implemented in MDEV-25909 ): Ensure that fil_ibd_load() issues an error message before returning FIL_LOAD_INVALID , so that the caller fil_name_process() can avoid doing that. In the "Ignoring data file" message, mention the expected space_id that was passed to fil_ibd_load() , in addition to mentioning the unexpected space->id . Here, I would only adjust the test so that it will ignore the occasionally issued additional messages.

          People

            marko Marko Mäkelä
            elenst Elena Stepanova
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.