Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-13820

trx_id_check() fails during row_log_table_apply()

Details

    Description

      10.3 3722372ae56b7ea90

      mysqld: /data/src/10.3/storage/innobase/include/trx0sys.h:186: bool trx_id_check(const void*, trx_id_t): Assertion `id == 0 || id > trx_id' failed.
      171024 20:59:53 [ERROR] mysqld got signal 6 ;
       
      #7  0x00007f00bc64bee2 in __assert_fail () from /lib/x86_64-linux-gnu/libc.so.6
      #8  0x00005639db364f32 in trx_id_check (db_trx_id=0x7f00b6258082, trx_id=1287) at /data/src/10.3/storage/innobase/include/trx0sys.h:186
      #9  0x00005639db384c68 in row_log_table_apply_delete (thr=0x7f006c14b160, trx_id_col=1, mrec=0x7f00b40df01f "\200", moffsets=0x7f006c17d230, offsets_heap=0x7f006c0645d0, heap=0x7f006c060540, log=0x7f006c040400, save_ext=0x0, ext_size=0) at /data/src/10.3/storage/innobase/row/row0log.cc:2065
      #10 0x00005639db3869b9 in row_log_table_apply_op (thr=0x7f006c14b160, new_trx_id_col=1, dup=0x7f00b426d560, error=0x7f00b426d30c, offsets_heap=0x7f006c0645d0, heap=0x7f006c060540, mrec=0x7f00b40df01f "\200", mrec_end=0x7f00b40df030 "", offsets=0x7f006c17d230) at /data/src/10.3/storage/innobase/row/row0log.cc:2624
      #11 0x00005639db388ca8 in row_log_table_apply_ops (thr=0x7f006c14b160, dup=0x7f00b426d560, stage=0x7f006c17b350) at /data/src/10.3/storage/innobase/row/row0log.cc:3177
      #12 0x00005639db389267 in row_log_table_apply (thr=0x7f006c14b160, old_table=0x7f006c093d88, table=0x7f006c152bd0, stage=0x7f006c17b350) at /data/src/10.3/storage/innobase/row/row0log.cc:3276
      #13 0x00005639db259e2f in commit_try_rebuild (ha_alter_info=0x7f00b426e520, ctx=0x7f006c015fc8, altered_table=0x7f006c152bd0, old_table=0x7f006c14fbd0, trx=0x7f00b65780d8, table_name=0x7f006c14e20d "t2") at /data/src/10.3/storage/innobase/handler/handler0alter.cc:8301
      #14 0x00005639db254a90 in ha_innobase::commit_inplace_alter_table (this=0x7f006c1507f8, altered_table=0x7f006c152bd0, ha_alter_info=0x7f00b426e520, commit=true) at /data/src/10.3/storage/innobase/handler/handler0alter.cc:9137
      #15 0x00005639daef0c76 in handler::ha_commit_inplace_alter_table (this=0x7f006c1507f8, altered_table=0x7f006c152bd0, ha_alter_info=0x7f00b426e520, commit=true) at /data/src/10.3/sql/handler.cc:4251
      #16 0x00005639dacf7f58 in mysql_inplace_alter_table (thd=0x7f006c000b00, table_list=0x7f006c014ae8, table=0x7f006c14fbd0, altered_table=0x7f006c152bd0, ha_alter_info=0x7f00b426e520, inplace_supported=HA_ALTER_INPLACE_NO_LOCK_AFTER_PREPARE, target_mdl_request=0x7f00b426e590, alter_ctx=0x7f00b426f150) at /data/src/10.3/sql/sql_table.cc:7229
      #17 0x00005639dacfd23c in mysql_alter_table (thd=0x7f006c000b00, new_db=0x7f006c0150f8 "test", new_name=0x0, create_info=0x7f00b426fd60, table_list=0x7f006c014ae8, alter_info=0x7f00b426fcb0, order_num=0, order=0x0, ignore=false) at /data/src/10.3/sql/sql_table.cc:9205
      #18 0x00005639dad7be7b in Sql_cmd_alter_table::execute (this=0x7f006c015100, thd=0x7f006c000b00) at /data/src/10.3/sql/sql_alter.cc:331
      #19 0x00005639dac2f172 in mysql_execute_command (thd=0x7f006c000b00) at /data/src/10.3/sql/sql_parse.cc:6236
      #20 0x00005639dac33a1c in mysql_parse (thd=0x7f006c000b00, rawbuf=0x7f006c0149f8 "ALTER TABLE t2 ROW_FORMAT=DYNAMIC", length=33, parser_state=0x7f00b4271610, is_com_multi=false, is_next_command=false) at /data/src/10.3/sql/sql_parse.cc:7921
      #21 0x00005639dac211bd in dispatch_command (command=COM_QUERY, thd=0x7f006c000b00, packet=0x7f006c08fce1 "ALTER TABLE t2 ROW_FORMAT=DYNAMIC", packet_length=33, is_com_multi=false, is_next_command=false) at /data/src/10.3/sql/sql_parse.cc:1819
      #22 0x00005639dac1fc1b in do_command (thd=0x7f006c000b00) at /data/src/10.3/sql/sql_parse.cc:1370
      #23 0x00005639dad769f4 in do_handle_one_connection (connect=0x5639de539bc0) at /data/src/10.3/sql/sql_connect.cc:1418
      #24 0x00005639dad76781 in handle_one_connection (arg=0x5639de539bc0) at /data/src/10.3/sql/sql_connect.cc:1324
      #25 0x00005639db1f0f4c in pfs_spawn_thread (arg=0x5639de5567b0) at /data/src/10.3/storage/perfschema/pfs.cc:1862
      #26 0x00007f00be590494 in start_thread (arg=0x7f00b4272700) at pthread_create.c:333
      #27 0x00007f00bc70893f in clone () from /lib/x86_64-linux-gnu/libc.so.6
      

      The test case is not deterministic. If it doesn't fail right away, try running with --repeat.

      Also, running with innodb_flush_log_at_trx_commit=2 significantly increased the probability of the race condition in original stress tests while running on disk – once in a few minutes vs once in many hours; and further, running in shm increased the probability even more, to once in a few seconds. Currently the MTR test below fails pretty much every time for me, even on disk and without innodb_flush_log_at_trx_commit=2, so I cannot say whether those options have any effect on it.

      --source include/have_innodb.inc
       
      CREATE TABLE t1 (pk INT PRIMARY KEY, i INT) ENGINE=InnoDB;
      CREATE TABLE t2 (pk INT PRIMARY KEY, i INT) ENGINE=InnoDB;
       
      --connect (con1,localhost,root,,test)
      BEGIN;
      --send
        INSERT INTO t1 VALUES (1,10);
       
      --connection default
      --send
        ALTER TABLE t2 ROW_FORMAT=DYNAMIC;
       
      --connection con1
      --reap
      INSERT INTO t2 VALUES (2,20);
      ROLLBACK;
       
      --connection default
      --reap
       
      # Cleanup
      --disconnect con1
      --connection default
      DROP TABLE t1, t2;
      

      Does not fail on 10.2.
      No visible effect on 10.3 non-debug build.
      The problem rather badly affects upgrade tests on 10.3 (the laste step, when DDL/DML is running on the upgraded server), causes a lot of failures unrelated to upgrade as such.

      Attachments

        Issue Links

          Activity

            This assertion only exists in 10.3.
            The assertion fails on the record that was inserted by

            INSERT INTO t2 VALUES (2,20);
            

            There is a race condition in the test, but in my case this transaction ID was assigned right before the transaction for ALTER TABLE t2 started.

            It looks like the ROLLBACK completed execution. In t2.ibd (and in the buffer pool copy of its root page 5:3) there is no user record. As part of MDEV-11369 we would empty the root page when the table becomes empty. So, there is not even PAGE_FREE garbage for the (2,20).

            The assertion fails on the intermediate copy that is being created for t2.ibd (page 6:3). That record must have been copied during the initial copy phase. At that point, the transaction was active.

            The proper fix would seem to be to relax this assertion. After all, we are applying the delete operation related to rolling back the INSERT.

            Currently row_merge_read_clustered_index() only resets the DB_TRX_ID for records that belonged to transactions that were committed before the ALTER TABLE started:

            		if (old_table == new_table) {
            			/* Do not bother touching DB_TRX_ID,DB_ROLL_PTR
            			because they are not going to be written into
            			secondary indexes. */
            		} else if (rec_trx_id < trx->id) {
            			/* Reset the DB_TRX_ID,DB_ROLL_PTR of old rows
            			for which history is not going to be
            			available after the rebuild operation.
            			This essentially mimics row_purge_reset_trx_id(). */
            			row->fields[new_trx_id_col].data
            				= const_cast<byte*>(reset_trx_id);
            			row->fields[new_trx_id_col + 1].data
            				= const_cast<byte*>(reset_trx_id
            						    + DATA_TRX_ID_LEN);
            		}
            

            marko Marko Mäkelä added a comment - This assertion only exists in 10.3. The assertion fails on the record that was inserted by INSERT INTO t2 VALUES (2,20); There is a race condition in the test, but in my case this transaction ID was assigned right before the transaction for ALTER TABLE t2 started. It looks like the ROLLBACK completed execution. In t2.ibd (and in the buffer pool copy of its root page 5:3) there is no user record. As part of MDEV-11369 we would empty the root page when the table becomes empty. So, there is not even PAGE_FREE garbage for the (2,20). The assertion fails on the intermediate copy that is being created for t2.ibd (page 6:3). That record must have been copied during the initial copy phase. At that point, the transaction was active. The proper fix would seem to be to relax this assertion. After all, we are applying the delete operation related to rolling back the INSERT. Currently row_merge_read_clustered_index() only resets the DB_TRX_ID for records that belonged to transactions that were committed before the ALTER TABLE started: if (old_table == new_table) { /* Do not bother touching DB_TRX_ID,DB_ROLL_PTR because they are not going to be written into secondary indexes. */ } else if (rec_trx_id < trx->id) { /* Reset the DB_TRX_ID,DB_ROLL_PTR of old rows for which history is not going to be available after the rebuild operation. This essentially mimics row_purge_reset_trx_id(). */ row->fields[new_trx_id_col].data = const_cast <byte*>(reset_trx_id); row->fields[new_trx_id_col + 1].data = const_cast <byte*>(reset_trx_id + DATA_TRX_ID_LEN); }

            We cannot simply relax the assertion.
            mrec_trx_id is 0 in this case, and we are comparing it to the rolled-back rec_trx_id. This comparison would fail, causing the delete operation to be skipped:

            		if (memcmp(mrec_trx_id, rec_trx_id,
            			   DATA_TRX_ID_LEN + DATA_ROLL_PTR_LEN)) {
            			/* The ROW_T_DELETE was logged for a different
            			PRIMARY KEY,DB_TRX_ID,DB_ROLL_PTR.
            			This is possible if a ROW_T_INSERT was skipped
            			or a ROW_T_UPDATE was interpreted as ROW_T_DELETE
            			because some BLOBs were missing due to
            			(1) rolling back the initial insert, or
            			(2) purging the BLOB for a later ROW_T_DELETE
            			(3) purging 'old values' for a later ROW_T_UPDATE
            			or ROW_T_DELETE. */
            			ut_ad(!log->same_pk);
            			goto all_done;
            		}
            

            Special consideration is needed when the operation involves DROP PRIMARY KEY, ADD PRIMARY KEY.
            This could affect the correctness of table-rebuilding online ALTER in non-debug builds.

            marko Marko Mäkelä added a comment - We cannot simply relax the assertion. mrec_trx_id is 0 in this case, and we are comparing it to the rolled-back rec_trx_id. This comparison would fail, causing the delete operation to be skipped: if ( memcmp (mrec_trx_id, rec_trx_id, DATA_TRX_ID_LEN + DATA_ROLL_PTR_LEN)) { /* The ROW_T_DELETE was logged for a different PRIMARY KEY,DB_TRX_ID,DB_ROLL_PTR. This is possible if a ROW_T_INSERT was skipped or a ROW_T_UPDATE was interpreted as ROW_T_DELETE because some BLOBs were missing due to (1) rolling back the initial insert, or (2) purging the BLOB for a later ROW_T_DELETE (3) purging 'old values' for a later ROW_T_UPDATE or ROW_T_DELETE. */ ut_ad(! log ->same_pk); goto all_done; } Special consideration is needed when the operation involves DROP PRIMARY KEY, ADD PRIMARY KEY. This could affect the correctness of table-rebuilding online ALTER in non-debug builds.

            Here is a more deterministic version of the test. It is important that the transaction that did performed concurrent modifications was started (and a transaction ID assigned) before the ALTER TABLE was started:

            --source include/have_innodb.inc
            --source include/have_debug.inc
            --source include/have_debug_sync.inc
             
            CREATE TABLE t1 (pk INT PRIMARY KEY) ENGINE=InnoDB;
            CREATE TABLE t2 (pk INT PRIMARY KEY) ENGINE=InnoDB;
             
            BEGIN;
            INSERT INTO t2 SET pk=1;
             
            --connect (con1,localhost,root,,test)
            SET DEBUG_SYNC='alter_table_inplace_after_lock_downgrade SIGNAL prepared WAIT_FOR logged';
            send ALTER TABLE t1 FORCE;
             
            --connection default
            SET DEBUG_SYNC='now WAIT_FOR prepared';
            INSERT INTO t1 VALUES (1);
            SET DEBUG_SYNC='now SIGNAL logged';
            ROLLBACK;
             
            --connection con1
            reap;
            --disconnect con1
            --connection default
            SET DEBUG_SYNC='RESET';
            DROP TABLE t1;
            

            This certainly highlights a thinking mistake that I made when developing the following fix:

            MDEV-13654 Various crashes due to DB_TRX_ID mismatch in table-rebuilding ALTER TABLE…LOCK=NONE

            The assertion trx_id_check() was introduced in the above fix. It is valid by design in row_merge_read_clustered_index(), because there we are performing REPEATABLE READ, so we will not see any records that were written after the ALTER TABLE. Also, thanks to the exclusive lock during ha_innobase::prepare_inplace_alter_table(), there cannot exist any older transactions that have already modified the table at the time the ALTER TABLE started the copying. So, at that point, REPEATABLE READ cannot see rows from active but earlier-started transactions.
            In row_log_table_apply() it is a different matter. We can indeed see changes by older-started transactions, and ideally we should reset these DB_TRX_ID to 0, because purge would not be run on the rebuilt table. This may be challenging at least in the case of ADD PRIMARY KEY, because there records in the rebuilt table would be identified by both PRIMARY KEY and DB_TRX_ID.

            marko Marko Mäkelä added a comment - Here is a more deterministic version of the test. It is important that the transaction that did performed concurrent modifications was started (and a transaction ID assigned) before the ALTER TABLE was started: --source include/have_innodb.inc --source include/have_debug.inc --source include/have_debug_sync.inc   CREATE TABLE t1 (pk INT PRIMARY KEY ) ENGINE=InnoDB; CREATE TABLE t2 (pk INT PRIMARY KEY ) ENGINE=InnoDB;   BEGIN ; INSERT INTO t2 SET pk=1;   --connect (con1,localhost,root,,test) SET DEBUG_SYNC= 'alter_table_inplace_after_lock_downgrade SIGNAL prepared WAIT_FOR logged' ; send ALTER TABLE t1 FORCE ;   --connection default SET DEBUG_SYNC= 'now WAIT_FOR prepared' ; INSERT INTO t1 VALUES (1); SET DEBUG_SYNC= 'now SIGNAL logged' ; ROLLBACK ;   --connection con1 reap; --disconnect con1 --connection default SET DEBUG_SYNC= 'RESET' ; DROP TABLE t1; This certainly highlights a thinking mistake that I made when developing the following fix: MDEV-13654 Various crashes due to DB_TRX_ID mismatch in table-rebuilding ALTER TABLE…LOCK=NONE The assertion trx_id_check() was introduced in the above fix. It is valid by design in row_merge_read_clustered_index(), because there we are performing REPEATABLE READ, so we will not see any records that were written after the ALTER TABLE. Also, thanks to the exclusive lock during ha_innobase::prepare_inplace_alter_table(), there cannot exist any older transactions that have already modified the table at the time the ALTER TABLE started the copying. So, at that point, REPEATABLE READ cannot see rows from active but earlier-started transactions. In row_log_table_apply() it is a different matter. We can indeed see changes by older-started transactions, and ideally we should reset these DB_TRX_ID to 0, because purge would not be run on the rebuilt table. This may be challenging at least in the case of ADD PRIMARY KEY, because there records in the rebuilt table would be identified by both PRIMARY KEY and DB_TRX_ID.

            When logging ROW_T_INSERT or ROW_T_UPDATE records, we did not normalize the DB_TRX_ID of the current transaction into 0 if the current transaction had started (modifying other tables) before the ALTER TABLE started.
            Starting with MDEV-13654, this normalization was already performed for ROW_T_DELETE, and for all operations with ADD PRIMARY KEY, in row_log_table_get_pk().

            marko Marko Mäkelä added a comment - When logging ROW_T_INSERT or ROW_T_UPDATE records, we did not normalize the DB_TRX_ID of the current transaction into 0 if the current transaction had started (modifying other tables) before the ALTER TABLE started. Starting with MDEV-13654 , this normalization was already performed for ROW_T_DELETE, and for all operations with ADD PRIMARY KEY, in row_log_table_get_pk().

            People

              marko Marko Mäkelä
              elenst Elena Stepanova
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.