Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-13654

Various crashes due to DB_TRX_ID mismatch in table-rebuilding ALTER TABLE…LOCK=NONE

Details

    Description

      10.3 578b2b05b8f734

      mysqld: /data/src/10.3/storage/innobase/row/row0log.cc:1961: dberr_t row_log_table_apply_delete(que_thr_t*, ulint, const mrec_t*, const ulint*, mem_heap_t*, mem_heap_t*, const row_log_t*, const row_ext_t*, ulint): Assertion `!log->same_pk' failed.
      170826  3:29:10 [ERROR] mysqld got signal 6 ;
       
      #7  0x00007f9af1281ee2 in __assert_fail () from /lib/x86_64-linux-gnu/libc.so.6
      #8  0x000055a1b98d56b6 in row_log_table_apply_delete (thr=0x7f9a9402d290, trx_id_col=1, mrec=0x7f9adc1aa021 "", moffsets=0x7f9a94030300, offsets_heap=0x7f9a94055ae0, heap=0x7f9a94031590, log=0x7f9a940359c0, save_ext=0x0, ext_size=0) at /data/src/10.3/storage/innobase/row/row0log.cc:1961
      #9  0x000055a1b98d6efc in row_log_table_apply_op (thr=0x7f9a9402d290, new_trx_id_col=1, dup=0x7f9adc5ef210, error=0x7f9adc5eefc4, offsets_heap=0x7f9a94055ae0, heap=0x7f9a94031590, mrec=0x7f9adc1aa021 "", mrec_end=0x7f9adc1aa034 "", offsets=0x7f9a94030300) at /data/src/10.3/storage/innobase/row/row0log.cc:2489
      #10 0x000055a1b98d8aa9 in row_log_table_apply_ops (thr=0x7f9a9402d290, dup=0x7f9adc5ef210, stage=0x7f9a94030240) at /data/src/10.3/storage/innobase/row/row0log.cc:3015
      #11 0x000055a1b98d8fd7 in row_log_table_apply (thr=0x7f9a9402d290, old_table=0x7f9a9002d1f8, table=0x7f9a94028d50, stage=0x7f9a94030240) at /data/src/10.3/storage/innobase/row/row0log.cc:3114
      #12 0x000055a1b97b62d2 in ha_innobase::inplace_alter_table (this=0x7f9a9401b2a8, altered_table=0x7f9a94028d50, ha_alter_info=0x7f9adc5ef540) at /data/src/10.3/storage/innobase/handler/handler0alter.cc:6491
      #13 0x000055a1b928c7e3 in handler::ha_inplace_alter_table (this=0x7f9a9401b2a8, altered_table=0x7f9a94028d50, ha_alter_info=0x7f9adc5ef540) at /data/src/10.3/sql/handler.h:3830
      #14 0x000055a1b9283e35 in mysql_inplace_alter_table (thd=0x7f9a94000b00, table_list=0x7f9a94013648, table=0x7f9a9401a680, altered_table=0x7f9a94028d50, ha_alter_info=0x7f9adc5ef540, inplace_supported=HA_ALTER_INPLACE_NO_LOCK_AFTER_PREPARE, target_mdl_request=0x7f9adc5ef5b0, alter_ctx=0x7f9adc5f0170) at /data/src/10.3/sql/sql_table.cc:7179
      #15 0x000055a1b9288d48 in mysql_alter_table (thd=0x7f9a94000b00, new_db=0x7f9a94013c58 "test", new_name=0x0, create_info=0x7f9adc5f0d80, table_list=0x7f9a94013648, alter_info=0x7f9adc5f0cd0, order_num=0, order=0x0, ignore=false) at /data/src/10.3/sql/sql_table.cc:9119
      #16 0x000055a1b9304bd6 in Sql_cmd_alter_table::execute (this=0x7f9a94013c88, thd=0x7f9a94000b00) at /data/src/10.3/sql/sql_alter.cc:331
      #17 0x000055a1b91bdbd5 in mysql_execute_command (thd=0x7f9a94000b00) at /data/src/10.3/sql/sql_parse.cc:6240
      #18 0x000055a1b91c2360 in mysql_parse (thd=0x7f9a94000b00, rawbuf=0x7f9a94013558 "ALTER TABLE t1 DROP COLUMN extra", length=32, parser_state=0x7f9adc5f2610, is_com_multi=false, is_next_command=false) at /data/src/10.3/sql/sql_parse.cc:7921
      #19 0x000055a1b91b05a7 in dispatch_command (command=COM_QUERY, thd=0x7f9a94000b00, packet=0x7f9a9400ae21 "ALTER TABLE t1 DROP COLUMN extra", packet_length=32, is_com_multi=false, is_next_command=false) at /data/src/10.3/sql/sql_parse.cc:1821
      #20 0x000055a1b91af049 in do_command (thd=0x7f9a94000b00) at /data/src/10.3/sql/sql_parse.cc:1372
      #21 0x000055a1b92ff863 in do_handle_one_connection (connect=0x55a1bc22c550) at /data/src/10.3/sql/sql_connect.cc:1418
      #22 0x000055a1b92ff5f0 in handle_one_connection (arg=0x55a1bc22c550) at /data/src/10.3/sql/sql_connect.cc:1324
      #23 0x000055a1b975e099 in pfs_spawn_thread (arg=0x55a1bc2355f0) at /data/src/10.3/storage/perfschema/pfs.cc:1862
      #24 0x00007f9af31c6494 in start_thread (arg=0x7f9adc5f3700) at pthread_create.c:333
      #25 0x00007f9af133e93f in clone () from /lib/x86_64-linux-gnu/libc.so.6
      

      NOTE: This test case is for reproducing only, don't put it into the regression suite!

      --source include/have_innodb.inc
       
      --let $run= 100
       
      CREATE TABLE t1 (col1 INT, col2 TEXT) ENGINE = InnoDB;
       
      --connect (con1,localhost,root,,test)
      --connect (con2,localhost,root,,test)
       
      while ($run)
      {
        --connection con1
        ALTER TABLE t1 ADD COLUMN extra INT;
        INSERT INTO t1 (col1,col2) VALUES (1,'foo'), (2,'bar');
        --send
          UPDATE t1 SET extra = col1;
          
        --connection con2
        --send
          ALTER TABLE t1 DROP COLUMN extra;
       
        --connection con1
        --reap
          DELETE FROM t1;
        
        --connection con2
        --reap
        
        --dec $run
      }
       
      # Cleanup
      --disconnect con1
      --disconnect con2
       
      --connection default
      DROP TABLE t1;
      

      Attachments

        Issue Links

          Activity

            This is definitely caused by MDEV-12288 or MDEV-13536, and the ALTER TABLE…LOCK=NONE logic needs to be revised.

            There are a few options:

            1. Suspend the purge during online ALTER TABLE (and remove row_log_table_blob_t which would become dead code). Drawback: buildup of lots of purge lag during a massive online ALTER TABLE
            2. Introduce something similar to row_log_table_blob_t to keep track of (PRIMARY KEY,DB_TRX_ID) that have been reset during purge.
            3. Carefully review the logic, and treat the value DB_TRX_ID=0 in the record as something that matches a no-longer-existing DB_TRX_ID in the online_log. Or, simply map no-longer-existing DB_TRX_ID to 0 in row_merge_read_clustered_index() and when writing the online_log.

            elenst, can you observe any other failures in row0log.cc when there are no concurrent DELETE operations?

            marko Marko Mäkelä added a comment - This is definitely caused by MDEV-12288 or MDEV-13536 , and the ALTER TABLE…LOCK=NONE logic needs to be revised. There are a few options: Suspend the purge during online ALTER TABLE (and remove row_log_table_blob_t which would become dead code). Drawback: buildup of lots of purge lag during a massive online ALTER TABLE Introduce something similar to row_log_table_blob_t to keep track of (PRIMARY KEY,DB_TRX_ID) that have been reset during purge. Carefully review the logic, and treat the value DB_TRX_ID=0 in the record as something that matches a no-longer-existing DB_TRX_ID in the online_log. Or, simply map no-longer-existing DB_TRX_ID to 0 in row_merge_read_clustered_index() and when writing the online_log. elenst , can you observe any other failures in row0log.cc when there are no concurrent DELETE operations?
            elenst Elena Stepanova added a comment - - edited

            marko,

            > Elena Stepanova, can you observe any other failures in row0log.cc when there are no concurrent DELETE operations?

            I haven't yet, but there hasn't been enough testing yet to say for sure there are none. I'll keep watching.

            A side note (unrelated to MDEV-12288 and MDEV-13536, but related to ha_innobase::inplace_alter_table) – testing around this area regularly hits my everlasting draft MDEV-10706 , which remains draft because so far I couldn't make up even a marginally reliable test case, although I hit the failure on average once out of 20-30 times in concurrent tests. If you encounter it too, I can confirm with certainty that it's not a new issue.

            elenst Elena Stepanova added a comment - - edited marko , > Elena Stepanova, can you observe any other failures in row0log.cc when there are no concurrent DELETE operations? I haven't yet, but there hasn't been enough testing yet to say for sure there are none. I'll keep watching. A side note (unrelated to MDEV-12288 and MDEV-13536 , but related to ha_innobase::inplace_alter_table ) – testing around this area regularly hits my everlasting draft MDEV-10706 , which remains draft because so far I couldn't make up even a marginally reliable test case, although I hit the failure on average once out of 20-30 times in concurrent tests. If you encounter it too, I can confirm with certainty that it's not a new issue.

            The innodb.innodb-table-online test that I imported into 10.0 (and merged up to 10.2 so far) as part of MDEV-13625 is repeating the same problem, for applying an update instead of an insert:

            			/* The ROW_T_UPDATE was logged for a different
            			DB_TRX_ID,DB_ROLL_PTR. This is possible if an
            			earlier ROW_T_INSERT or ROW_T_UPDATE was diverted
            			because some BLOBs were missing due to rolling
            			back the initial insert or due to purging
            			the old BLOB values of an update. */
            			ut_ad(log->blobs);
            

            CURRENT_TEST: innodb.innodb-table-online
            mysqltest: At line 274: query 'reap' failed: 2013: Lost connection to MySQL server during query
            

            marko Marko Mäkelä added a comment - The innodb.innodb-table-online test that I imported into 10.0 (and merged up to 10.2 so far) as part of MDEV-13625 is repeating the same problem, for applying an update instead of an insert: /* The ROW_T_UPDATE was logged for a different DB_TRX_ID,DB_ROLL_PTR. This is possible if an earlier ROW_T_INSERT or ROW_T_UPDATE was diverted because some BLOBs were missing due to rolling back the initial insert or due to purging the old BLOB values of an update. */ ut_ad(log->blobs); CURRENT_TEST: innodb.innodb-table-online mysqltest: At line 274: query 'reap' failed: 2013: Lost connection to MySQL server during query

            I made a simple change that makes row_merge_read_clustered_index() reset DB_TRX_ID,DB_ROLL_PTR in the row->fields[] when the history is not needed (when the DB_TRX_ID refers to a no-longer-active transaction).
            With this change, innodb.innodb-table-online passes and the provided test case does not crash.

            The remaining question is: Do the callers of row_log_table_open() need any adjustment?
            That is, can purge reset DB_TRX_ID of transactions that were active when the ALTER TABLE transaction passed ha_innobase::prepare_inplace_alter_table()?
            The answer is a firm NO, because purge is not allowed to discard any history that is visible to currently active transactions.

            A further observation is that ha_innobase::prepare_inplace_alter_table() acts as a barrier. During the execution of this function, both MDL_EXCLUSIVE lock on the table name and a LOCK_X on the dict_table_t will be held for some time. These locks cannot be granted until any concurrent transactions that accessed these tables (the MDL_EXCLUSIVE blocks even non-locking MVCC reads). So, any active transaction that was observed during ha_innobase::inplace_alter_table() in row_merge_read_clustered_index() or thereafter, would necessarily have been started after ha_innobase::prepare_inplace_alter_table() returned and the exclusive lock was downgraded.

            Note: After a table-rebuilding ALTER TABLE…LOCK=NONE, it could be unavoidable to have some nonzero DB_TRX_ID columns in the table. Especially with DROP PRIMARY KEY, ADD PRIMARY KEY, the apply logic partly identifies the rows by the combination of PRIMARY KEY and DB_TRX_ID. These nonzero DB_TRX_ID would necessarily refer to concurrent DML operations that were started during ha_innobase::inplace_alter_table().

            marko Marko Mäkelä added a comment - I made a simple change that makes row_merge_read_clustered_index() reset DB_TRX_ID,DB_ROLL_PTR in the row->fields[] when the history is not needed (when the DB_TRX_ID refers to a no-longer-active transaction). With this change, innodb.innodb-table-online passes and the provided test case does not crash. The remaining question is: Do the callers of row_log_table_open() need any adjustment? That is, can purge reset DB_TRX_ID of transactions that were active when the ALTER TABLE transaction passed ha_innobase::prepare_inplace_alter_table()? The answer is a firm NO, because purge is not allowed to discard any history that is visible to currently active transactions. A further observation is that ha_innobase::prepare_inplace_alter_table() acts as a barrier. During the execution of this function, both MDL_EXCLUSIVE lock on the table name and a LOCK_X on the dict_table_t will be held for some time. These locks cannot be granted until any concurrent transactions that accessed these tables (the MDL_EXCLUSIVE blocks even non-locking MVCC reads). So, any active transaction that was observed during ha_innobase::inplace_alter_table() in row_merge_read_clustered_index() or thereafter, would necessarily have been started after ha_innobase::prepare_inplace_alter_table() returned and the exclusive lock was downgraded. Note: After a table-rebuilding ALTER TABLE…LOCK=NONE, it could be unavoidable to have some nonzero DB_TRX_ID columns in the table. Especially with DROP PRIMARY KEY, ADD PRIMARY KEY, the apply logic partly identifies the rows by the combination of PRIMARY KEY and DB_TRX_ID. These nonzero DB_TRX_ID would necessarily refer to concurrent DML operations that were started during ha_innobase::inplace_alter_table().

            One more consideration is what happens in an online table-rebuilding ALTER TABLE after upgrading from MariaDB 10.3.0 or earlier.
            In the old data file, we would necessarily have some clustered index records where a non-zero DB_TRX_ID column refers to a purged transaction.

            With the above mentioned change to row_merge_read_clustered_index(), the initial rebuild of the table would reset DB_TRX_ID,DB_ROLL_PTR in the table.
            But, what would happen on row_log_table_apply()?
            The functions row_log_table_update() and row_log_table_delete() would be writing these non-canonical column values to the online_log, causing a mismatch when applying the log.

            For row_log_table_delete(), the fix seems simple: treat the parameter sys=NULL as a request to reset the DB_TRX_ID,DB_ROLL_PTR in the log record. The only caller with non-NULL value is the ROLLBACK of an UPDATE, which needs to reset the original value. (As noted above, such rollback can only be a rollback of a transaction that was started after ha_innobase::prepare_inplace_alter_table() returned.) If the update that is now being rolled back was operating on a record whose DB_TRX_ID referred to a purged transaction (from before the upgrade), then we should reset the fields in the log record.

            For row_log_table_update(), there is the parameter old_pk=row_log_table_get_pk() that is used during ADD PRIMARY KEY. The DB_TRX_ID in the old_pk parameter could refer to a purged transaction. If that is the case, we should reset the DB_TRX_ID,DB_ROLL_PTR in old_pk.

            With these changes to the logging, row_log_table_apply() should always find a match between the rebuilt table (which would contain DB_TRX_ID=0 most of the time) and the online_log.

            Again, the problem only affects a table-rebuilding ALTER TABLE…LOCK=NONE. There is also row_log_apply() for ADD INDEX and ADD UNIQUE INDEX when the table is not being rebuilt, and that logging is unaffected by this, because secondary indexes do not contain DB_TRX_ID columns (only the PAGE_MAX_TRX_ID in leaf pages).

            marko Marko Mäkelä added a comment - One more consideration is what happens in an online table-rebuilding ALTER TABLE after upgrading from MariaDB 10.3.0 or earlier. In the old data file, we would necessarily have some clustered index records where a non-zero DB_TRX_ID column refers to a purged transaction. With the above mentioned change to row_merge_read_clustered_index(), the initial rebuild of the table would reset DB_TRX_ID,DB_ROLL_PTR in the table. But, what would happen on row_log_table_apply()? The functions row_log_table_update() and row_log_table_delete() would be writing these non-canonical column values to the online_log, causing a mismatch when applying the log. For row_log_table_delete(), the fix seems simple: treat the parameter sys=NULL as a request to reset the DB_TRX_ID,DB_ROLL_PTR in the log record. The only caller with non-NULL value is the ROLLBACK of an UPDATE, which needs to reset the original value. (As noted above, such rollback can only be a rollback of a transaction that was started after ha_innobase::prepare_inplace_alter_table() returned.) If the update that is now being rolled back was operating on a record whose DB_TRX_ID referred to a purged transaction (from before the upgrade), then we should reset the fields in the log record. For row_log_table_update(), there is the parameter old_pk=row_log_table_get_pk() that is used during ADD PRIMARY KEY. The DB_TRX_ID in the old_pk parameter could refer to a purged transaction. If that is the case, we should reset the DB_TRX_ID,DB_ROLL_PTR in old_pk. With these changes to the logging, row_log_table_apply() should always find a match between the rebuilt table (which would contain DB_TRX_ID=0 most of the time) and the online_log. Again, the problem only affects a table-rebuilding ALTER TABLE…LOCK=NONE. There is also row_log_apply() for ADD INDEX and ADD UNIQUE INDEX when the table is not being rebuilt, and that logging is unaffected by this, because secondary indexes do not contain DB_TRX_ID columns (only the PAGE_MAX_TRX_ID in leaf pages).
            marko Marko Mäkelä added a comment - bb-10.3-marko

            ok to push.

            jplindst Jan Lindström (Inactive) added a comment - ok to push.

            People

              marko Marko Mäkelä
              elenst Elena Stepanova
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.