Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-32103

InnoDB ALTER TABLE is not crash safe

Details

    Description

      origin/bb-10.6-MDEV-32068 a26c51a190030443063362eb3f21c510fcb3c734 2023-09-05T16:30:53+03:00
      but observed on other trees in history too.
       
      Scenario:
      1. Start server, generate some initial data
      2. Several sessions run concurrent a DDL/DML mix
      3. Intentional crash, restart with success
      4. Check the data
      SELECT `col2` FROM `test`.`t3` FORCE INDEX (`Marvão_uidx1`)  WHERE `col2` IS NULL OR `col2` IS NOT NULL harvests 1030: Got error 1 "Operation not permitted" from storage engine InnoDB.
      The server error log contains for the time of the SELECT ... FORCE ...
      [rr 3698310 16732]2023-09-05  7:42:03 6 [ERROR] Cannot find index Marvão_uidx1 in InnoDB index dictionary.
      [rr 3698310 16736]2023-09-05  7:42:03 6 [ERROR] InnoDB indexes are inconsistent with what defined in .frm for table ./test/t3
      [rr 3698310 16740]2023-09-05  7:42:03 6 [ERROR] InnoDB: Table test/t3 contains 2 indexes inside InnoDB, which is different from the number of indexes 3 defined in the .frm file. See https://mariadb.com/kb/en/innodb-troubleshooting/
      The table t3 is ROW_FORMAT = Compact.
       
      sdp:/data1/results/1693922571/TBR-1311--CATCH-0001$ _RR_TRACE_DIR=./1/rr rr replay --mark-stdio
      sdp:/data1/results/1693922571/TBR-1311--CATCH-0001/1/fbackup/ contains a file backup
      made after the crash before the restart attempt.
       
      RQG
      ===
      # git clone https://github.com/mleich1/rqg --branch <pick the right branch> RQG
      #
      # GIT_SHOW: HEAD -> master, origin/master, origin/HEAD d5889a108c00974367f47657cabe4804fe7df566 2023-09-04T13:53:19+02:00
      # rqg.pl  : Version 4.4.0 (2023-08)
      #
      # $RQG_HOME/rqg.pl \
      # --gendata=conf/mariadb/table_stress.zz \
      # --gendata_sql=conf/mariadb/table_stress.sql \
      # --grammar=conf/mariadb/table_stress_innodb.yy \
      # --reporters=CrashRecovery \
      # --mysqld=--loose-innodb_lock_schedule_algorithm=fcfs \
      # --mysqld=--loose-idle_write_transaction_timeout=0 \
      # --mysqld=--loose-idle_transaction_timeout=0 \
      # --mysqld=--loose-idle_readonly_transaction_timeout=0 \
      # --mysqld=--connect_timeout=60 \
      # --mysqld=--interactive_timeout=28800 \
      # --mysqld=--slave_net_timeout=60 \
      # --mysqld=--net_read_timeout=30 \
      # --mysqld=--net_write_timeout=60 \
      # --mysqld=--loose-table_lock_wait_timeout=50 \
      # --mysqld=--wait_timeout=28800 \
      # --no_mask \
      # --queries=10000000 \
      # --seed=random \
      # --reporters=None \
      # --reporters=ErrorLog \
      # --reporters=Deadlock \
      # --validators=None \
      # --mysqld=--log_output=none \
      # --mysqld=--log_bin_trust_function_creators=1 \
      # --mysqld=--loose-debug_assert_on_not_freed_memory=0 \
      # --engine=InnoDB \
      # --restart_timeout=240 \
      # --mysqld=--plugin-load-add=file_key_management.so \
      # --mysqld=--loose-file-key-management-filename=$RQG_HOME/conf/mariadb/encryption_keys.txt \
      # --mysqld=--plugin-load-add=provider_lzo.so \
      # --mysqld=--plugin-load-add=provider_bzip2.so \
      # --mysqld=--plugin-load-add=provider_lzma.so \
      # --mysqld=--plugin-load-add=provider_snappy.so \
      # --mysqld=--plugin-load-add=provider_lz4.so \
      # --mysqld=--loose-innodb_compression_level=1 \
      # --duration=300 \
      # --mysqld=--loose-innodb_fatal_semaphore_wait_threshold=300 \
      # --mysqld=--lock-wait-timeout=15 \
      # --mysqld=--innodb-lock-wait-timeout=10 \
      # --mysqld=--loose-innodb_fast_shutdown=0 \
      # --mysqld=--sql_mode=STRICT_TRANS_TABLES,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION \
      # --mysqld=--innodb_file_per_table=1 \
      # --mysqld=--loose-innodb_read_only_compressed=OFF \
      # --mysqld=--innodb_stats_persistent=on \
      # --mysqld=--innodb_adaptive_hash_index=off \
      # --mysqld=--innodb_sort_buffer_size=65536 \
      # --mysqld=--innodb_random_read_ahead=OFF \
      # --mysqld=--log-bin \
      # --mysqld=--sync-binlog=1 \
      # --mysqld=--loose-innodb_evict_tables_on_commit_debug=off \
      # --mysqld=--loose-max-statement-time=30 \
      # --threads=33 \
      # --mysqld=--innodb-use-native-aio=0 \
      # --mysqld=--loose-gdb \
      # --mysqld=--loose-debug-gdb \
      # --rr=Extended \
      # --rr_options=--chaos --wait \
      # --mysqld=--innodb_undo_log_truncate=OFF \
      # --mysqld=--innodb_rollback_on_timeout=OFF \
      # --mysqld=--innodb_page_size=16K \
      # --mysqld=--innodb-buffer-pool-size=256M \
      #  <local settings>
      

      Attachments

        Issue Links

          Activity

            The first server instance after bootstrap executed an ALTER TABLE slightly before it was killed:

            Thread 54 hit Breakpoint 2, ha_innobase::commit_inplace_alter_table (this=0xc826406c8b0, altered_table=0x7f2b2f087d90, ha_alter_info=0x7f2b2f087cd0, commit=true) at /data/Server/bb-10.6-MDEV-32068/storage/innobase/handler/handler0alter.cc:11093
            11093	{
            1: m_prebuilt.table.name = {m_name = 0x7aac48017d70 "test/t3"}
            3: m_user_thd.query_string = {string = {str = 0xc8264028220 "ALTER TABLE t3 ADD FULLTEXT INDEX IF NOT EXISTS `Marvão_ftidx2` ( col_text ), ADD UNIQUE KEY IF NOT EXISTS `Marvão_uidx1` ( col2 /* 100800 ASC */, col1 /* 100800 DESC */ )  /* E_R Thread2 QNO 332 CO"..., length = 210}, cs = 0x55f97b094520 <my_charset_latin1>}
            Current event: 433927
             
            Thread 47 received signal SIGKILL, Killed.
            [Switching to Thread 3460236.3474134]
            0x0000000070000002 in syscall_traced ()
            (rr) when
            Current event: 434122
            

            It looks like the last persistent write (update of log_sys.flushed_to_disk_lsn) was some time earlier in this thread:

            #6  0x000055f97a6a09e4 in trx_t::commit_persist (this=this@entry=0x1ac73ffcea80) at /data/Server/bb-10.6-MDEV-32068/storage/innobase/trx/trx0trx.cc:1583
            #7  0x000055f97a700611 in trx_t::commit (this=this@entry=0x1ac73ffcea80, deleted=std::vector of length 0, capacity 0) at /data/Server/bb-10.6-MDEV-32068/storage/innobase/dict/drop.cc:238
            #8  0x000055f97a5e50b1 in commit_unlock_and_unlink (trx=0x1ac73ffcea80) at /data/Server/bb-10.6-MDEV-32068/storage/innobase/handler/handler0alter.cc:4339
            #9  0x000055f97a5ec0cc in prepare_inplace_alter_table_dict (ha_alter_info=<optimized out>, altered_table=<optimized out>, old_table=<optimized out>, table_name=<optimized out>, flags=<optimized out>, 
                flags2=<optimized out>, fts_doc_id_col=<optimized out>, add_fts_doc_id=<optimized out>, add_fts_doc_id_idx=<optimized out>) at /data/Server/bb-10.6-MDEV-32068/storage/innobase/handler/handler0alter.cc:7352
            #10 0x000055f97a5ee8e2 in ha_innobase::prepare_inplace_alter_table (this=<optimized out>, altered_table=<optimized out>, ha_alter_info=<optimized out>)
                at /data/Server/bb-10.6-MDEV-32068/storage/innobase/handler/ha_innodb.h:695
            #11 0x000055f97a1d802d in mysql_inplace_alter_table (target_mdl_request=0x7f2b2f0885e0, alter_ctx=0x7f2b2f089740, trigger_param=0x7f2b2f088190, ddl_log_state=0x7f2b2f087c70, ha_alter_info=0x7f2b2f087cd0, 
                altered_table=0x7f2b2f087d90, table=0xc826408d4c8, table_list=0xc8264028470, thd=0xc826401dad8) at /data/Server/bb-10.6-MDEV-32068/sql/sql_table.cc:7361
            

            The later call to ha_innobase::commit_inplace_alter_table() returned false (indicating success), while log_sys.lsn=19710260 and log_sys.flushed_to_disk_lsn=19691339 (at the value of the above write). This is definitely wrong. These values remained until the SIGKILL. Because there was no call to innodb_check_version on the subsequent server startup, I suppose that the SQL layer had replaced the .frm file and cleared the ddl_recovery.log. That subsequent recovery reported the following:

            2023-09-05  7:41:47 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=42486,42486
            2023-09-05  7:41:50 0 [Note] InnoDB: 10.6.16 started; log sequence number 19709789; transaction id 6814
            2023-09-05  7:42:03 6 [ERROR] Cannot find index Marvão_uidx1 in InnoDB index dictionary.
            2023-09-05  7:42:03 6 [ERROR] InnoDB indexes are inconsistent with what defined in .frm for table ./test/t3
            2023-09-05  7:42:03 6 [ERROR] InnoDB: Table test/t3 contains 2 indexes inside InnoDB, which is different from the number of indexes 3 defined in the .frm file. See https://mariadb.com/kb/en/innodb-troubleshooting/
            

            I believe that this is a regression due to MDEV-30100. We must durably write the log for committing a DDL transaction.

            marko Marko Mäkelä added a comment - The first server instance after bootstrap executed an ALTER TABLE slightly before it was killed: Thread 54 hit Breakpoint 2, ha_innobase::commit_inplace_alter_table (this=0xc826406c8b0, altered_table=0x7f2b2f087d90, ha_alter_info=0x7f2b2f087cd0, commit=true) at /data/Server/bb-10.6-MDEV-32068/storage/innobase/handler/handler0alter.cc:11093 11093 { 1: m_prebuilt.table.name = {m_name = 0x7aac48017d70 "test/t3"} 3: m_user_thd.query_string = {string = {str = 0xc8264028220 "ALTER TABLE t3 ADD FULLTEXT INDEX IF NOT EXISTS `Marvão_ftidx2` ( col_text ), ADD UNIQUE KEY IF NOT EXISTS `Marvão_uidx1` ( col2 /* 100800 ASC */, col1 /* 100800 DESC */ ) /* E_R Thread2 QNO 332 CO"..., length = 210}, cs = 0x55f97b094520 <my_charset_latin1>} Current event: 433927   Thread 47 received signal SIGKILL, Killed. [Switching to Thread 3460236.3474134] 0x0000000070000002 in syscall_traced () (rr) when Current event: 434122 It looks like the last persistent write (update of log_sys.flushed_to_disk_lsn ) was some time earlier in this thread: #6 0x000055f97a6a09e4 in trx_t::commit_persist (this=this@entry=0x1ac73ffcea80) at /data/Server/bb-10.6-MDEV-32068/storage/innobase/trx/trx0trx.cc:1583 #7 0x000055f97a700611 in trx_t::commit (this=this@entry=0x1ac73ffcea80, deleted=std::vector of length 0, capacity 0) at /data/Server/bb-10.6-MDEV-32068/storage/innobase/dict/drop.cc:238 #8 0x000055f97a5e50b1 in commit_unlock_and_unlink (trx=0x1ac73ffcea80) at /data/Server/bb-10.6-MDEV-32068/storage/innobase/handler/handler0alter.cc:4339 #9 0x000055f97a5ec0cc in prepare_inplace_alter_table_dict (ha_alter_info=<optimized out>, altered_table=<optimized out>, old_table=<optimized out>, table_name=<optimized out>, flags=<optimized out>, flags2=<optimized out>, fts_doc_id_col=<optimized out>, add_fts_doc_id=<optimized out>, add_fts_doc_id_idx=<optimized out>) at /data/Server/bb-10.6-MDEV-32068/storage/innobase/handler/handler0alter.cc:7352 #10 0x000055f97a5ee8e2 in ha_innobase::prepare_inplace_alter_table (this=<optimized out>, altered_table=<optimized out>, ha_alter_info=<optimized out>) at /data/Server/bb-10.6-MDEV-32068/storage/innobase/handler/ha_innodb.h:695 #11 0x000055f97a1d802d in mysql_inplace_alter_table (target_mdl_request=0x7f2b2f0885e0, alter_ctx=0x7f2b2f089740, trigger_param=0x7f2b2f088190, ddl_log_state=0x7f2b2f087c70, ha_alter_info=0x7f2b2f087cd0, altered_table=0x7f2b2f087d90, table=0xc826408d4c8, table_list=0xc8264028470, thd=0xc826401dad8) at /data/Server/bb-10.6-MDEV-32068/sql/sql_table.cc:7361 The later call to ha_innobase::commit_inplace_alter_table() returned false (indicating success), while log_sys.lsn=19710260 and log_sys.flushed_to_disk_lsn=19691339 (at the value of the above write). This is definitely wrong. These values remained until the SIGKILL. Because there was no call to innodb_check_version on the subsequent server startup, I suppose that the SQL layer had replaced the .frm file and cleared the ddl_recovery.log . That subsequent recovery reported the following: 2023-09-05 7:41:47 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=42486,42486 … 2023-09-05 7:41:50 0 [Note] InnoDB: 10.6.16 started; log sequence number 19709789; transaction id 6814 … 2023-09-05 7:42:03 6 [ERROR] Cannot find index Marvão_uidx1 in InnoDB index dictionary. 2023-09-05 7:42:03 6 [ERROR] InnoDB indexes are inconsistent with what defined in .frm for table ./test/t3 2023-09-05 7:42:03 6 [ERROR] InnoDB: Table test/t3 contains 2 indexes inside InnoDB, which is different from the number of indexes 3 defined in the .frm file. See https://mariadb.com/kb/en/innodb-troubleshooting/ I believe that this is a regression due to MDEV-30100 . We must durably write the log for committing a DDL transaction.

            The atomic DDL commit that fails to be persisted is here:

            (rr) backtrace
            #0  trx_t::commit_in_memory (mtr=0x7f2b2f087610, this=0x1ac73ffcea80) at /data/Server/bb-10.6-MDEV-32068/storage/innobase/trx/trx0trx.cc:1366
            #1  trx_t::commit_low (this=0x1ac73ffcea80, mtr=0x7f2b2f087610) at /data/Server/bb-10.6-MDEV-32068/storage/innobase/trx/trx0trx.cc:1569
            #2  0x000055f97a6a09e4 in trx_t::commit_persist (this=this@entry=0x1ac73ffcea80) at /data/Server/bb-10.6-MDEV-32068/storage/innobase/trx/trx0trx.cc:1583
            #3  0x000055f97a700611 in trx_t::commit (this=this@entry=0x1ac73ffcea80, deleted=std::vector of length 0, capacity 0) at /data/Server/bb-10.6-MDEV-32068/storage/innobase/dict/drop.cc:238
            #4  0x000055f97a5f1ae6 in ha_innobase::commit_inplace_alter_table (this=<optimized out>, altered_table=<optimized out>, ha_alter_info=<optimized out>, commit=<optimized out>)
                at /data/Server/bb-10.6-MDEV-32068/storage/innobase/handler/handler0alter.cc:11549
            (rr) print mtr->m_commit_lsn
            $1 = 19710260
            

            marko Marko Mäkelä added a comment - The atomic DDL commit that fails to be persisted is here: (rr) backtrace #0 trx_t::commit_in_memory (mtr=0x7f2b2f087610, this=0x1ac73ffcea80) at /data/Server/bb-10.6-MDEV-32068/storage/innobase/trx/trx0trx.cc:1366 #1 trx_t::commit_low (this=0x1ac73ffcea80, mtr=0x7f2b2f087610) at /data/Server/bb-10.6-MDEV-32068/storage/innobase/trx/trx0trx.cc:1569 #2 0x000055f97a6a09e4 in trx_t::commit_persist (this=this@entry=0x1ac73ffcea80) at /data/Server/bb-10.6-MDEV-32068/storage/innobase/trx/trx0trx.cc:1583 #3 0x000055f97a700611 in trx_t::commit (this=this@entry=0x1ac73ffcea80, deleted=std::vector of length 0, capacity 0) at /data/Server/bb-10.6-MDEV-32068/storage/innobase/dict/drop.cc:238 #4 0x000055f97a5f1ae6 in ha_innobase::commit_inplace_alter_table (this=<optimized out>, altered_table=<optimized out>, ha_alter_info=<optimized out>, commit=<optimized out>) at /data/Server/bb-10.6-MDEV-32068/storage/innobase/handler/handler0alter.cc:11549 … (rr) print mtr->m_commit_lsn $1 = 19710260

            The reason why the DDL commit was not written is related to MDEV-24341. trx_flush_log_if_needed() is invoking an asynchronous write, and we fail to wait for that to complete before returning from the DDL operation:

              if ((cb.m_param= thd_increment_pending_ops(trx->mysql_thd)))
              {
                cb.m_callback = (void (*)(void *)) thd_decrement_pending_ops;
                log_write_up_to(lsn, flush, false, &cb);
              }
            

            marko Marko Mäkelä added a comment - The reason why the DDL commit was not written is related to MDEV-24341 . trx_flush_log_if_needed() is invoking an asynchronous write, and we fail to wait for that to complete before returning from the DDL operation: if ((cb.m_param= thd_increment_pending_ops(trx->mysql_thd))) { cb.m_callback = ( void (*)( void *)) thd_decrement_pending_ops; log_write_up_to(lsn, flush, false , &cb); }

            It looks like this must have been broken for a longer time. MDEV-24341 was implemented already in MariaDB Server 10.6.0. The last fundamental InnoDB change related to atomic ALTER TABLE (MDEV-25180) was the third part of MDEV-25506 in MariaDB Server 10.6.2.

            wlad confirmed to me that we would either need a call to thd->async_state.wait_for_pending_ops(); after the DDL operation returns from InnoDB, or a synchronous call of log_write_up_to() (without a completion callback) is needed, to ensure the correct durability on DDL.

            marko Marko Mäkelä added a comment - It looks like this must have been broken for a longer time. MDEV-24341 was implemented already in MariaDB Server 10.6.0. The last fundamental InnoDB change related to atomic ALTER TABLE ( MDEV-25180 ) was the third part of MDEV-25506 in MariaDB Server 10.6.2. wlad confirmed to me that we would either need a call to thd->async_state.wait_for_pending_ops(); after the DDL operation returns from InnoDB, or a synchronous call of log_write_up_to() (without a completion callback) is needed, to ensure the correct durability on DDL.

            After all, this does look like a regression due to this preparatory change of MDEV-30100, which would cause the LSN to be 0 for the calls log_write_up_to(commit_lsn, true) in DDL operations.

            In fact, mleich did catch this type of failure while testing MDEV-30100, and that preparatory change was revised accordingly (but insufficiently). I think that we must revert that change:

            diff --git a/storage/innobase/trx/trx0trx.cc b/storage/innobase/trx/trx0trx.cc
            index 751d7567881..e0eb723426f 100644
            --- a/storage/innobase/trx/trx0trx.cc
            +++ b/storage/innobase/trx/trx0trx.cc
            @@ -1238,7 +1238,7 @@ static void trx_flush_log_if_needed(lsn_t lsn, trx_t *trx)
               if (log_sys.get_flushed_lsn() > lsn)
                 return;
             
            -  const bool flush= trx->dict_operation ||
            +  const bool flush=
                 (srv_file_flush_method != SRV_NOSYNC &&
                  (srv_flush_log_at_trx_commit & 1));
             
            

            We can make use of the field trx_t::flush_log_later in all DDL operations. The only DDL operation that does not invoke trx_t::commit(std::vector<pfs_os_file_t> &) is ha_innobase::rename_table().

            marko Marko Mäkelä added a comment - After all, this does look like a regression due to this preparatory change of MDEV-30100 , which would cause the LSN to be 0 for the calls log_write_up_to(commit_lsn, true) in DDL operations. In fact, mleich did catch this type of failure while testing MDEV-30100 , and that preparatory change was revised accordingly (but insufficiently). I think that we must revert that change: diff --git a/storage/innobase/trx/trx0trx.cc b/storage/innobase/trx/trx0trx.cc index 751d7567881..e0eb723426f 100644 --- a/storage/innobase/trx/trx0trx.cc +++ b/storage/innobase/trx/trx0trx.cc @@ -1238,7 +1238,7 @@ static void trx_flush_log_if_needed(lsn_t lsn, trx_t *trx) if (log_sys.get_flushed_lsn() > lsn) return; - const bool flush= trx->dict_operation || + const bool flush= (srv_file_flush_method != SRV_NOSYNC && (srv_flush_log_at_trx_commit & 1)); We can make use of the field trx_t::flush_log_later in all DDL operations. The only DDL operation that does not invoke trx_t::commit(std::vector<pfs_os_file_t> &) is ha_innobase::rename_table() .

            People

              marko Marko Mäkelä
              mleich Matthias Leich
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.