Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-22076

InnoDB: Failing assertion: !cursor->index->is_committed() or Assertion `0' in row_purge_remove_sec_if_poss_leaf after DML/DDL with XA

Details

    Description

      Notes:

      • Given the history of false-positives in attempts to reproduce MDEV-14643 / MDEV-9663, I file this one separately. If we are lucky this time and turns out to be the problem that we have been chasing for within MDEV-14643, feel free to close as a duplicate.
      • The test case is highly non-deterministic, run with high --repeat=N. It currently fails for me on average within ~100 attempts, but it can vary on different machines and builds. Also, it is not minimal, it can certainly be reduced further, but it seems to decrease the probability of hitting the failure, so I'm keeping it at is. Hopefully it will be analyzable via an rr profile.
      • I am not sure whether XA is a guilty part in this scenario, but apparently it allows to create the kind of locking situation that's necessary. Maybe it can be eliminated after the development analysis.
      • Please note that !cursor->index->is_committed() only happens on non-debug builds. Debug builds fail with a different assertion. Both are quoted below.

      --source include/have_innodb.inc
       
      CREATE TABLE t1 (
        pk integer auto_increment,
        col_int int,
        col_int_key int,
        primary key (pk)
      ) ENGINE=InnoDB;
       
      INSERT INTO t1 VALUES (NULL,0,2),(NULL,0, NULL);
       
      CREATE TABLE t2 (t TIMESTAMP NULL) ENGINE=InnoDB;
      XA  START 'xid';
      DELETE FROM t1;
      --error ER_XAER_RMFAIL
      ALTER TABLE x;
      XA END 'xid';
       
      --connect (con1,localhost,root,,test)
      --send
      ALTER TABLE t1 ADD INDEX (pk);
       
      --connection default
      --error ER_XAER_RMFAIL
      UPDATE x SET a = 1;
      --error ER_XAER_RMFAIL
      UPDATE x SET a = 1 WHERE b = ( SELECT a FROM xx WHERE b = 1 );
      XA PREPARE 'xid';
      --error ER_XAER_RMFAIL
      DELETE FROM x WHERE a IN ( SELECT a FROM xx WHERE b = 1 );
      --error ER_XAER_RMFAIL
      REPLACE INTO x (a,b) VALUES (1,1),(1,1);
      XA COMMIT 'xid';
       
      --connection con1
      --reap
      SELECT * FROM t1;
      REPLACE INTO t1 VALUES (1,1,1),(2,3,3);
       
      # Cleanup
      --disconnect con1
      --connection default
      DROP TABLE t1, t2;
      

      10.4 RelWithDebInfo ba679ae52f

      2020-03-29 21:24:18 1 [ERROR] InnoDB: tried to purge non-delete-marked record in index `pk` of table `test`.`t1`: tuple: TUPLE (info_bits=0, 1 fields): {[4]    (0x80000002)}, record: COMPACT RECORD(info_bits=0, 1 fields): {[4]    (0x80000002)}
      2020-03-29 21:24:18 4 [ERROR] InnoDB: tried to purge non-delete-marked record in index `pk` of table `test`.`t1`: tuple: TUPLE (info_bits=0, 1 fields): {[4]    (0x80000001)}, record: COMPACT RECORD(info_bits=0, 1 fields): {[4]    (0x80000001)}
      2020-03-29 21:24:18 0x7f4b3e254700  InnoDB: Assertion failure in file /data/src/10.4/storage/innobase/row/row0ins.cc line 263
      InnoDB: Failing assertion: !cursor->index->is_committed()
       
      #6  0x00005603908facb7 in ut_dbg_assertion_failed (expr=expr@entry=0x56039127a188 "!cursor->index->is_committed()", file=file@entry=0x560391279c30 "/data/src/10.4/storage/innobase/row/row0ins.cc", line=line@entry=263) at /data/src/10.4/storage/innobase/ut/ut0dbg.cc:60
      #7  0x00005603908ea580 in row_ins_sec_index_entry_by_modify (mtr=0x7f4b3e24f9e0, thr=0x7f4af401f558, entry=<optimized out>, heap=0x7f4af401c7d0, offsets_heap=<optimized out>, offsets=0x7f4b3e24eac8, cursor=0x7f4b3e24ebd0, mode=2, flags=0) at /data/src/10.4/storage/innobase/row/row0ins.cc:263
      #8  row_ins_sec_index_entry_low (flags=0, mode=2, index=0x7f4af401b9e8, offsets_heap=<optimized out>, heap=0x7f4af401c7d0, entry=<optimized out>, trx_id=<optimized out>, thr=<optimized out>) at /data/src/10.4/storage/innobase/row/row0ins.cc:3099
      #9  0x0000560390df47c8 in row_ins_sec_index_entry (index=0x7f4af401b9e8, entry=0x7f4af012c2a0, thr=0x7f4af401f558, check_foreign=<optimized out>) at /data/src/10.4/storage/innobase/row/row0ins.cc:3314
      #10 0x0000560390df4e2a in row_ins_index_entry (thr=0x7f4af401f558, entry=<optimized out>, index=<optimized out>) at /data/src/10.4/storage/innobase/row/row0ins.cc:3361
      #11 row_ins_index_entry_step (thr=0x7f4af401f558, node=<optimized out>) at /data/src/10.4/storage/innobase/row/row0ins.cc:3528
      #12 row_ins (thr=0x7f4af401f558, node=<optimized out>) at /data/src/10.4/storage/innobase/row/row0ins.cc:3664
      #13 row_ins_step (thr=thr@entry=0x7f4af401f558) at /data/src/10.4/storage/innobase/row/row0ins.cc:3807
      #14 0x0000560390e03718 in row_insert_for_mysql (mysql_rec=mysql_rec@entry=0x7f4af012d3a8 "\371\001", prebuilt=<optimized out>, ins_mode=<optimized out>) at /data/src/10.4/storage/innobase/row/row0mysql.cc:1463
      #15 0x0000560390d5f42d in ha_innobase::write_row (this=0x7f4af401e340, record=0x7f4af012d3a8 "\371\001") at /data/src/10.4/storage/innobase/handler/ha_innodb.cc:8035
      #16 0x0000560390be50e5 in handler::ha_write_row (this=0x7f4af401e340, buf=0x7f4af012d3a8 "\371\001") at /data/src/10.4/sql/handler.cc:6677
      #17 0x00005603909ba44d in write_record (thd=thd@entry=0x7f4af4000c08, table=table@entry=0x7f4af401d548, info=info@entry=0x7f4b3e2508b0) at /data/src/10.4/sql/sql_insert.cc:1732
      #18 0x00005603909c0896 in mysql_insert (thd=thd@entry=0x7f4af4000c08, table_list=<optimized out>, fields=..., values_list=..., update_fields=..., update_values=..., duplic=<optimized out>, ignore=<optimized out>) at /data/src/10.4/sql/sql_insert.cc:1078
      #19 0x00005603909ec35a in mysql_execute_command (thd=0x7f4af4000c08) at /data/src/10.4/sql/sql_parse.cc:4528
      #20 0x00005603909f2fd8 in mysql_parse (thd=thd@entry=0x7f4af4000c08, rawbuf=<optimized out>, length=38, parser_state=parser_state@entry=0x7f4b3e253190, is_com_multi=is_com_multi@entry=false, is_next_command=is_next_command@entry=false) at /data/src/10.4/sql/sql_parse.cc:7900
      #21 0x00005603909f52af in dispatch_command (command=COM_QUERY, thd=0x7f4af4000c08, packet=<optimized out>, packet_length=<optimized out>, is_com_multi=<optimized out>, is_next_command=<optimized out>) at /data/src/10.4/sql/sql_class.h:1168
      #22 0x00005603909f69df in do_command (thd=0x7f4af4000c08) at /data/src/10.4/sql/sql_parse.cc:1359
      #23 0x0000560390aca8f4 in do_handle_one_connection (connect=connect@entry=0x560392bdba18) at /data/src/10.4/sql/sql_connect.cc:1412
      #24 0x0000560390aca9e4 in handle_one_connection (arg=arg@entry=0x560392bdba18) at /data/src/10.4/sql/sql_connect.cc:1316
      #25 0x00005603910242bf in pfs_spawn_thread (arg=0x560392b64228) at /data/src/10.4/storage/perfschema/pfs.cc:1869
      #26 0x00007f4b4998bfa3 in start_thread (arg=<optimized out>) at pthread_create.c:486
      #27 0x00007f4b491da4cf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
      

      10.4 debug 9eae063e

      2020-03-29 21:25:45 1 [ERROR] InnoDB: tried to purge non-delete-marked record in index `pk` of table `test`.`t1`: tuple: TUPLE (info_bits=0, 1 fields): {[4]    (0x80000002)}, record: COMPACT RECORD(info_bits=0, 1 fields): {[4]    (0x80000002)}
      mysqld: /data/src/10.4/storage/innobase/row/row0purge.cc:600: bool row_purge_remove_sec_if_poss_leaf(purge_node_t*, dict_index_t*, const dtuple_t*): Assertion `0' failed.
      200329 21:25:45 [ERROR] mysqld got signal 6 ;
       
      #7  0x00007f7b857a9102 in __GI___assert_fail (assertion=0x559b1abf11aa "0", file=0x559b1abf1ac0 "/data/src/10.4/storage/innobase/row/row0purge.cc", line=600, function=0x559b1abf2da0 <row_purge_remove_sec_if_poss_leaf(purge_node_t*, dict_index_t*, dtuple_t const*)::__PRETTY_FUNCTION__> "bool row_purge_remove_sec_if_poss_leaf(purge_node_t*, dict_index_t*, const dtuple_t*)") at assert.c:101
      #8  0x0000559b1a3d0fd6 in row_purge_remove_sec_if_poss_leaf (node=0x559b1dbda670, index=0x7f7b2c03a550, entry=0x7f7b58016280) at /data/src/10.4/storage/innobase/row/row0purge.cc:600
      #9  0x0000559b1a3d1352 in row_purge_remove_sec_if_poss (node=0x559b1dbda670, index=0x7f7b2c03a550, entry=0x7f7b58016280) at /data/src/10.4/storage/innobase/row/row0purge.cc:696
      #10 0x0000559b1a3d1577 in row_purge_del_mark (node=0x559b1dbda670) at /data/src/10.4/storage/innobase/row/row0purge.cc:770
      #11 0x0000559b1a3d318c in row_purge_record_func (node=0x559b1dbda670, undo_rec=0x559b1dbdb120 "", thr=0x559b1dbda5b0, updated_extern=false) at /data/src/10.4/storage/innobase/row/row0purge.cc:1196
      #12 0x0000559b1a3d349a in row_purge (node=0x559b1dbda670, undo_rec=0x559b1dbdb120 "", thr=0x559b1dbda5b0) at /data/src/10.4/storage/innobase/row/row0purge.cc:1263
      #13 0x0000559b1a3d3845 in row_purge_step (thr=0x559b1dbda5b0) at /data/src/10.4/storage/innobase/row/row0purge.cc:1342
      #14 0x0000559b1a33f73a in que_thr_step (thr=0x559b1dbda5b0) at /data/src/10.4/storage/innobase/que/que0que.cc:1037
      #15 0x0000559b1a33f9d1 in que_run_threads_low (thr=0x559b1dbda5b0) at /data/src/10.4/storage/innobase/que/que0que.cc:1099
      #16 0x0000559b1a33fc1f in que_run_threads (thr=0x559b1dbda5b0) at /data/src/10.4/storage/innobase/que/que0que.cc:1139
      #17 0x0000559b1a44ab57 in trx_purge (n_purge_threads=4, truncate=false, slot=0x559b1b22bdd0 <srv_sys+592>) at /data/src/10.4/storage/innobase/trx/trx0purge.cc:1322
      #18 0x0000559b1a41c64c in srv_do_purge (n_total_purged=0x7f7b62ffceb8, slot=0x559b1b22bdd0 <srv_sys+592>) at /data/src/10.4/storage/innobase/srv/srv0srv.cc:2612
      #19 0x0000559b1a41cb95 in srv_purge_coordinator_thread (arg=0x0) at /data/src/10.4/storage/innobase/srv/srv0srv.cc:2751
      #20 0x00007f7b86023fa3 in start_thread (arg=<optimized out>) at pthread_create.c:486
      #21 0x00007f7b858724cf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
      

      Reproducible on 10.1-10.5.

      Attachments

        Issue Links

          Activity

            The table appears to contain 2 records, as expected. In the primary key index (page 3), we have the delete-marked records
            (pk,DB_TRX_ID,DB_ROLL_PTR,col_int,col_int_key)=(1,0x407,(update),0,2),(2,0x407,(update),0,NULL).
            Both records were apparently delete-marked by transaction 0x407.

            In the secondary key index pk (page 4), we have the records (pk)=(1),(2) and PAGE_MAX_TRX_ID=0x408.
            The problem seems to be that neither record in the secondary index is delete-marked.

            The trace includes multiple calls to row_merge_read_clustered_index(). The function buf_page_create() is invoked for the clustered index page 49:3 at when 297390. There will be only one call to row_merge_read_clustered_index() after that, at when 298612, and the assertion fails at when 299852. Let us define a watchpoint on the delete-mark flag of the first record, to catch when it was written:

            break row_merge_read_clustered_index
            disable 1
            break buf_page_create
            cond 1 page_id.m_space==49 && page_id.m_page_no>2
            continue
            finish
            watch -l $0.frame[0x79]
            enable
            continue
            

            The record will be delete-marked at when 298256 as part of executing DELETE FROM t1.

            At when 298491, we will get another buf_page_create() call to create the secondary index page 49:4. At when 298627, there will be yet another call to create the page 49:5 as part of executing the statement ALTER TABLE t1 ADD INDEX (pk), deep inside in row_merge_insert_index_tuples(), invoked by row_merge_read_clustered_index(), in Thread 32. Because we allocated a second page for the secondary index at some point, there must have been quite a few delete-marked records on the table.

            Let us rewind a little bit to see when the ALTER TABLE t1 ADD INDEX (pk) started executing and when any conflicting transactions were committed:

            run 298256
            break innobase_commit
            continue
            when
            

            At when 298551 we will enter innobase_commit:

            10.4 9eae063e79376fd71586e1106e750a366467a984

            (rr) p trx->lock.table_locks
            $31 = std::vector of length 1, capacity 1 = {0x7f6cdf49ee58}
            (rr) p *((lock_t*)0x7f6cdf49ee58)
            $32 = {trx = 0x7f6cdf49e268, trx_locks = {prev = 0x0, next = 0x7f6cdf49e398}, index = 0x0, hash = 0x0, requested_time = 0, wait_time = 0, un_member = {tab_lock = {table = 0x7f6c582e8900, locks = {prev = 0x0, next = 0x0}}, rec_lock = {space = 1479444736, page_no = 32620, n_bits = 0}}, type_mode = 17}
            (rr) p trx
            $33 = (trx_t *) 0x7f6cdf49e268
            (rr) p ((lock_t*)0x7f6cdf49ee58)->un_member.tab_lock.table.name
            $34 = {m_name = 0x7f6c583267a0 "test/t1", static part_suffix = "#P#"}
            

            This proves that the transaction has modified the InnoDB table t1. We also know that ha_innobase::prepare_inplace_alter_table() should be holding MDL_EXCLUSIVE on the table name. But the ALTER TABLE is executing deeply inside InnoDB:

            10.4 9eae063e79376fd71586e1106e750a366467a984

            (rr) thread 32
            (rr) bt
            …
            #22 0x00005643d9bd8cf9 in prepare_inplace_alter_table_dict (ha_alter_info=0x7f6cdc0787b0, altered_table=0x7f6cdc078840, old_table=0x7f6c583307e0, table_name=0x7f6c5832912d "t1", flags=33, flags2=80, 
                fts_doc_id_col=18446744073709551615, add_fts_doc_id=false, add_fts_doc_id_idx=false) at /home/mariadb/MDEV-22076/10.4/storage/innobase/handler/handler0alter.cc:7024
            #23 0x00005643d9bdd086 in ha_innobase::prepare_inplace_alter_table (this=0x7f6c58331648, altered_table=0x7f6cdc078840, ha_alter_info=0x7f6cdc0787b0)
                at /home/mariadb/MDEV-22076/10.4/storage/innobase/handler/handler0alter.cc:8141
            #24 0x00005643d994df12 in handler::ha_prepare_inplace_alter_table (this=0x7f6c58331648, altered_table=0x7f6cdc078840, ha_alter_info=0x7f6cdc0787b0) at /home/mariadb/MDEV-22076/10.4/sql/handler.cc:4550
            #25 0x00005643d96cdbe1 in mysql_inplace_alter_table (thd=0x7f6c5c000d50, table_list=0x7f6c5c012298, table=0x7f6c583307e0, altered_table=0x7f6cdc078840, ha_alter_info=0x7f6cdc0787b0, 
                inplace_supported=HA_ALTER_INPLACE_NOCOPY_NO_LOCK, target_mdl_request=0x7f6cdc079610, alter_ctx=0x7f6cdc07a140) at /home/mariadb/MDEV-22076/10.4/sql/sql_table.cc:7682
            #26 0x00005643d96d4c74 in mysql_alter_table (thd=0x7f6c5c000d50, new_db=0x7f6c5c005510, new_name=0x7f6c5c005918, create_info=0x7f6cdc07ad20, table_list=0x7f6c5c012298, alter_info=0x7f6cdc07ac60, order_num=0, 
                order=0x0, ignore=false) at /home/mariadb/MDEV-22076/10.4/sql/sql_table.cc:10111
            #27 0x00005643d976cd02 in Sql_cmd_alter_table::execute (this=0x7f6c5c012a28, thd=0x7f6c5c000d50) at /home/mariadb/MDEV-22076/10.4/sql/sql_alter.cc:502
            #28 0x00005643d95eb763 in mysql_execute_command (thd=0x7f6c5c000d50) at /home/mariadb/MDEV-22076/10.4/sql/sql_parse.cc:6101
            #29 0x00005643d95f0e17 in mysql_parse (thd=0x7f6c5c000d50, rawbuf=0x7f6c5c0121b0 "ALTER TABLE t1 ADD INDEX (pk)", length=29, parser_state=0x7f6cdc07c190, is_com_multi=false, is_next_command=false)
                at /home/mariadb/MDEV-22076/10.4/sql/sql_parse.cc:7900
            

            This is a clear proof of what I already suspected: that DDL operations executed in XA transactions will seriously break the metadata locking.

            Someone who is more familiar with the MDL data structures or code should be able to trace down when the MDL on table name test.t1 was broken.

            marko Marko Mäkelä added a comment - The table appears to contain 2 records, as expected. In the primary key index (page 3), we have the delete-marked records (pk,DB_TRX_ID,DB_ROLL_PTR,col_int,col_int_key)=(1,0x407,(update),0,2),(2,0x407,(update),0,NULL). Both records were apparently delete-marked by transaction 0x407. In the secondary key index pk (page 4), we have the records (pk)=(1),(2) and PAGE_MAX_TRX_ID =0x408. The problem seems to be that neither record in the secondary index is delete-marked. The trace includes multiple calls to row_merge_read_clustered_index() . The function buf_page_create() is invoked for the clustered index page 49:3 at when 297390. There will be only one call to row_merge_read_clustered_index() after that, at when 298612, and the assertion fails at when 299852. Let us define a watchpoint on the delete-mark flag of the first record, to catch when it was written: break row_merge_read_clustered_index disable 1 break buf_page_create cond 1 page_id.m_space==49 && page_id.m_page_no>2 continue finish watch -l $0.frame[0x79] enable continue The record will be delete-marked at when 298256 as part of executing DELETE FROM t1 . At when 298491, we will get another buf_page_create() call to create the secondary index page 49:4. At when 298627, there will be yet another call to create the page 49:5 as part of executing the statement ALTER TABLE t1 ADD INDEX (pk) , deep inside in row_merge_insert_index_tuples() , invoked by row_merge_read_clustered_index() , in Thread 32. Because we allocated a second page for the secondary index at some point, there must have been quite a few delete-marked records on the table. Let us rewind a little bit to see when the ALTER TABLE t1 ADD INDEX (pk) started executing and when any conflicting transactions were committed: run 298256 break innobase_commit continue when At when 298551 we will enter innobase_commit : 10.4 9eae063e79376fd71586e1106e750a366467a984 (rr) p trx->lock.table_locks $31 = std::vector of length 1, capacity 1 = {0x7f6cdf49ee58} (rr) p *((lock_t*)0x7f6cdf49ee58) $32 = {trx = 0x7f6cdf49e268, trx_locks = {prev = 0x0, next = 0x7f6cdf49e398}, index = 0x0, hash = 0x0, requested_time = 0, wait_time = 0, un_member = {tab_lock = {table = 0x7f6c582e8900, locks = {prev = 0x0, next = 0x0}}, rec_lock = {space = 1479444736, page_no = 32620, n_bits = 0}}, type_mode = 17} (rr) p trx $33 = (trx_t *) 0x7f6cdf49e268 (rr) p ((lock_t*)0x7f6cdf49ee58)->un_member.tab_lock.table.name $34 = {m_name = 0x7f6c583267a0 "test/t1", static part_suffix = "#P#"} This proves that the transaction has modified the InnoDB table t1 . We also know that ha_innobase::prepare_inplace_alter_table() should be holding MDL_EXCLUSIVE on the table name. But the ALTER TABLE is executing deeply inside InnoDB: 10.4 9eae063e79376fd71586e1106e750a366467a984 (rr) thread 32 (rr) bt … #22 0x00005643d9bd8cf9 in prepare_inplace_alter_table_dict (ha_alter_info=0x7f6cdc0787b0, altered_table=0x7f6cdc078840, old_table=0x7f6c583307e0, table_name=0x7f6c5832912d "t1", flags=33, flags2=80, fts_doc_id_col=18446744073709551615, add_fts_doc_id=false, add_fts_doc_id_idx=false) at /home/mariadb/MDEV-22076/10.4/storage/innobase/handler/handler0alter.cc:7024 #23 0x00005643d9bdd086 in ha_innobase::prepare_inplace_alter_table (this=0x7f6c58331648, altered_table=0x7f6cdc078840, ha_alter_info=0x7f6cdc0787b0) at /home/mariadb/MDEV-22076/10.4/storage/innobase/handler/handler0alter.cc:8141 #24 0x00005643d994df12 in handler::ha_prepare_inplace_alter_table (this=0x7f6c58331648, altered_table=0x7f6cdc078840, ha_alter_info=0x7f6cdc0787b0) at /home/mariadb/MDEV-22076/10.4/sql/handler.cc:4550 #25 0x00005643d96cdbe1 in mysql_inplace_alter_table (thd=0x7f6c5c000d50, table_list=0x7f6c5c012298, table=0x7f6c583307e0, altered_table=0x7f6cdc078840, ha_alter_info=0x7f6cdc0787b0, inplace_supported=HA_ALTER_INPLACE_NOCOPY_NO_LOCK, target_mdl_request=0x7f6cdc079610, alter_ctx=0x7f6cdc07a140) at /home/mariadb/MDEV-22076/10.4/sql/sql_table.cc:7682 #26 0x00005643d96d4c74 in mysql_alter_table (thd=0x7f6c5c000d50, new_db=0x7f6c5c005510, new_name=0x7f6c5c005918, create_info=0x7f6cdc07ad20, table_list=0x7f6c5c012298, alter_info=0x7f6cdc07ac60, order_num=0, order=0x0, ignore=false) at /home/mariadb/MDEV-22076/10.4/sql/sql_table.cc:10111 #27 0x00005643d976cd02 in Sql_cmd_alter_table::execute (this=0x7f6c5c012a28, thd=0x7f6c5c000d50) at /home/mariadb/MDEV-22076/10.4/sql/sql_alter.cc:502 #28 0x00005643d95eb763 in mysql_execute_command (thd=0x7f6c5c000d50) at /home/mariadb/MDEV-22076/10.4/sql/sql_parse.cc:6101 #29 0x00005643d95f0e17 in mysql_parse (thd=0x7f6c5c000d50, rawbuf=0x7f6c5c0121b0 "ALTER TABLE t1 ADD INDEX (pk)", length=29, parser_state=0x7f6cdc07c190, is_com_multi=false, is_next_command=false) at /home/mariadb/MDEV-22076/10.4/sql/sql_parse.cc:7900 This is a clear proof of what I already suspected: that DDL operations executed in XA transactions will seriously break the metadata locking. Someone who is more familiar with the MDL data structures or code should be able to trace down when the MDL on table name test.t1 was broken.

            Looks like regression after df4dd593f, specifically this change:

            @@ -2423,12 +2656,16 @@ mysql_execute_command(THD *thd)
                 {
                   /* Commit the normal transaction if one is active. */
                   if (trans_commit_implicit(thd))
            +      {
            +        thd->mdl_context.release_transactional_locks();
            +        WSREP_DEBUG("implicit commit failed, MDL released: %lu", thd->thread_id);
                     goto error;
            +      }
                   /* Release metadata locks acquired in this transaction. */
                   thd->mdl_context.release_transactional_locks();
                 }
               }
            

            In other words it is releasing metadata locks on implicit commit failure, which ALTER TABLE x attempts early.

            Even if we disregard this change, it feels like there're cases when MDL locks both have to be released and have not to be released on commit failure. And it looks like it was like this since this function was introduced.

            Said the above, commit/rollback routines should be revised carefully for error handling.

            Simplified, yet more deterministic test case (verified with 10.5):

            --source include/have_innodb.inc
             
            CREATE TABLE t1(a INT UNSIGNED PRIMARY KEY) ENGINE=InnoDB;
            INSERT INTO t1 VALUES(1),(2);
             
            XA  START 'xid';
            DELETE FROM t1;
            --error ER_XAER_RMFAIL
            ALTER TABLE x;
            XA END 'xid';
            XA PREPARE 'xid';
             
            connect(con1, localhost, root);
            send ALTER TABLE t1 ADD INDEX(a);
             
            connection default;
            let $n=10000; while ($n) { dec $n; }
            XA COMMIT 'xid';
             
            connection con1;
            reap;
            #INSERT INTO t1 VALUES(1);
            disconnect con1;
             
            connection default;
            DROP TABLE t1;
            

            svoj Sergey Vojtovich added a comment - Looks like regression after df4dd593f, specifically this change: @@ -2423,12 +2656,16 @@ mysql_execute_command(THD *thd) { /* Commit the normal transaction if one is active. */ if (trans_commit_implicit(thd)) + { + thd->mdl_context.release_transactional_locks(); + WSREP_DEBUG("implicit commit failed, MDL released: %lu", thd->thread_id); goto error; + } /* Release metadata locks acquired in this transaction. */ thd->mdl_context.release_transactional_locks(); } } In other words it is releasing metadata locks on implicit commit failure, which ALTER TABLE x attempts early. Even if we disregard this change, it feels like there're cases when MDL locks both have to be released and have not to be released on commit failure. And it looks like it was like this since this function was introduced. Said the above, commit/rollback routines should be revised carefully for error handling. Simplified, yet more deterministic test case (verified with 10.5): --source include/have_innodb.inc   CREATE TABLE t1(a INT UNSIGNED PRIMARY KEY) ENGINE=InnoDB; INSERT INTO t1 VALUES(1),(2);   XA START 'xid'; DELETE FROM t1; --error ER_XAER_RMFAIL ALTER TABLE x; XA END 'xid'; XA PREPARE 'xid';   connect(con1, localhost, root); send ALTER TABLE t1 ADD INDEX(a);   connection default; let $n=10000; while ($n) { dec $n; } XA COMMIT 'xid';   connection con1; reap; #INSERT INTO t1 VALUES(1); disconnect con1;   connection default; DROP TABLE t1;

            The regression that svoj pointed out seems to be introducing something very similar to what I found in MDEV-15532 today.

            marko Marko Mäkelä added a comment - The regression that svoj pointed out seems to be introducing something very similar to what I found in MDEV-15532 today.

            Another curious test case. It only takes one XA ROLLBACK for an XA transaction which doesn't even exist to cause this assertion failure; but I can't find anything non-XA to replace it with.
            The test case fails on all of 10.1-10.5, but it is highly non-deterministic, run with big --repeat. It seems to fail much more readily in shm and with ps-protocol. PS protocol is not needed, it just increases the probability, at least on my machines/builds.
            10.1 requires many more attempts than 10.2+.

            --source include/have_sequence.inc
            --source include/have_innodb.inc
             
            CREATE TABLE t1 (a INT, b INT, PRIMARY KEY(a)) ENGINE=InnoDB;
            INSERT INTO t1 SELECT seq, seq FROM seq_1_to_1000;
             
            START TRANSACTION;
            --send
             UPDATE IGNORE t1 SET a = 0;
             
            --connect (con1,localhost,root,,test)
            --send
              ALTER TABLE t1 ADD KEY (b);
             
            --connection default
            --reap
            --error ER_XAER_NOTA,ER_XAER_OUTSIDE
            XA ROLLBACK 'x';
            SHOW WARNINGS;
            --error ER_NO_SUCH_TABLE
            SELECT * FROM xx;
            COMMIT;
             
            --connection con1
            --reap
            REPLACE INTO t1 SELECT seq, seq FROM seq_1_to_1000;
             
            DROP TABLE t1;
            

            10.2 3a8943ae

            2020-07-25 14:51:25 0x7f83014f7700  InnoDB: Assertion failure in file /data/src/10.2/storage/innobase/row/row0ins.cc line 253
            InnoDB: Failing assertion: !cursor->index->is_committed()
             
            #6  0x0000563fcf0dd2f9 in ut_dbg_assertion_failed (expr=0x563fcf672ce0 "!cursor->index->is_committed()", file=0x563fcf672bf0 "/data/src/10.2/storage/innobase/row/row0ins.cc", line=253) at /data/src/10.2/storage/innobase/ut/ut0dbg.cc:60
            #7  0x0000563fcefeb66c in row_ins_sec_index_entry_by_modify (flags=0, mode=2, cursor=0x7f83014f3b80, offsets=0x7f83014f3b30, offsets_heap=0x7f82b40230f0, heap=0x7f82b4020b00, entry=0x7f82b4021d78, thr=0x7f82b4086000, mtr=0x7f83014f45d0) at /data/src/10.2/storage/innobase/row/row0ins.cc:253
            #8  0x0000563fceff2335 in row_ins_sec_index_entry_low (flags=0, mode=2, index=0x7f82b4028300, offsets_heap=0x7f82b40230f0, heap=0x7f82b4020b00, entry=0x7f82b4021d78, trx_id=0, thr=0x7f82b4086000) at /data/src/10.2/storage/innobase/row/row0ins.cc:3031
            #9  0x0000563fceff2b19 in row_ins_sec_index_entry (index=0x7f82b4028300, entry=0x7f82b4021d78, thr=0x7f82b4086000) at /data/src/10.2/storage/innobase/row/row0ins.cc:3218
            #10 0x0000563fceff2c76 in row_ins_index_entry (index=0x7f82b4028300, entry=0x7f82b4021d78, thr=0x7f82b4086000) at /data/src/10.2/storage/innobase/row/row0ins.cc:3265
            #11 0x0000563fceff3181 in row_ins_index_entry_step (node=0x7f82b4085da8, thr=0x7f82b4086000) at /data/src/10.2/storage/innobase/row/row0ins.cc:3414
            #12 0x0000563fceff3557 in row_ins (node=0x7f82b4085da8, thr=0x7f82b4086000) at /data/src/10.2/storage/innobase/row/row0ins.cc:3551
            #13 0x0000563fceff3904 in row_ins_step (thr=0x7f82b4086000) at /data/src/10.2/storage/innobase/row/row0ins.cc:3671
            #14 0x0000563fcf011258 in row_insert_for_mysql (mysql_rec=0x7f82b4019c60 "\375\001", prebuilt=0x7f82b4085870) at /data/src/10.2/storage/innobase/row/row0mysql.cc:1411
            #15 0x0000563fceed2111 in ha_innobase::write_row (this=0x7f82b40181d8, record=0x7f82b4019c60 "\375\001") at /data/src/10.2/storage/innobase/handler/ha_innodb.cc:8208
            #16 0x0000563fcecdfaf4 in handler::ha_write_row (this=0x7f82b40181d8, buf=0x7f82b4019c60 "\375\001") at /data/src/10.2/sql/handler.cc:6108
            #17 0x0000563fcea349b3 in write_record (thd=0x7f82b4000af0, table=0x7f82b40ca940, info=0x7f82b4011140) at /data/src/10.2/sql/sql_insert.cc:1667
            #18 0x0000563fcea3abe4 in select_insert::send_data (this=0x7f82b4011100, values=...) at /data/src/10.2/sql/sql_insert.cc:3834
            #19 0x0000563fceacad54 in end_send (join=0x7f82b40111a8, join_tab=0x7f82b4012608, end_of_records=false) at /data/src/10.2/sql/sql_select.cc:20045
            #20 0x0000563fceac87f3 in evaluate_join_record (join=0x7f82b40111a8, join_tab=0x7f82b4012258, error=0) at /data/src/10.2/sql/sql_select.cc:19093
            #21 0x0000563fceac80df in sub_select (join=0x7f82b40111a8, join_tab=0x7f82b4012258, end_of_records=false) at /data/src/10.2/sql/sql_select.cc:18873
            #22 0x0000563fceac7666 in do_select (join=0x7f82b40111a8, procedure=0x0) at /data/src/10.2/sql/sql_select.cc:18417
            #23 0x0000563fceaa1127 in JOIN::exec_inner (this=0x7f82b40111a8) at /data/src/10.2/sql/sql_select.cc:3638
            #24 0x0000563fceaa05e2 in JOIN::exec (this=0x7f82b40111a8) at /data/src/10.2/sql/sql_select.cc:3433
            #25 0x0000563fceaa1798 in mysql_select (thd=0x7f82b4000af0, tables=0x7f82b4045620, wild_num=0, fields=..., conds=0x0, og_num=0, order=0x0, group=0x0, having=0x0, proc_param=0x0, select_options=3489925888, result=0x7f82b4011100, unit=0x7f82b40436e8, select_lex=0x7f82b4043e28) at /data/src/10.2/sql/sql_select.cc:3833
            #26 0x0000563fcea95956 in handle_select (thd=0x7f82b4000af0, lex=0x7f82b4043628, result=0x7f82b4011100, setup_tables_done_option=1073741824) at /data/src/10.2/sql/sql_select.cc:361
            #27 0x0000563fcea5a0cc in mysql_execute_command (thd=0x7f82b4000af0) at /data/src/10.2/sql/sql_parse.cc:4275
            #28 0x0000563fcea82ab3 in Prepared_statement::execute (this=0x7f82b40204c0, expanded_query=0x7f83014f6380, open_cursor=false) at /data/src/10.2/sql/sql_prepare.cc:4842
            #29 0x0000563fcea80fed in Prepared_statement::execute_loop (this=0x7f82b40204c0, expanded_query=0x7f83014f6380, open_cursor=false, packet=0x7f82b40088ba "", packet_end=0x7f82b40088ba "") at /data/src/10.2/sql/sql_prepare.cc:4271
            #30 0x0000563fcea7ea60 in mysql_stmt_execute_common (thd=0x7f82b4000af0, stmt_id=7, packet=0x7f82b40088ba "", packet_end=0x7f82b40088ba "", cursor_flags=0, bulk_op=false, read_types=false) at /data/src/10.2/sql/sql_prepare.cc:3273
            #31 0x0000563fcea7e626 in mysqld_stmt_execute (thd=0x7f82b4000af0, packet_arg=0x7f82b40088b1 "\a", packet_length=9) at /data/src/10.2/sql/sql_prepare.cc:3171
            #32 0x0000563fcea53089 in dispatch_command (command=COM_STMT_EXECUTE, thd=0x7f82b4000af0, packet=0x7f82b40088b1 "\a", packet_length=9, is_com_multi=false, is_next_command=false) at /data/src/10.2/sql/sql_parse.cc:1766
            #33 0x0000563fcea51ea0 in do_command (thd=0x7f82b4000af0) at /data/src/10.2/sql/sql_parse.cc:1377
            #34 0x0000563fceba7f01 in do_handle_one_connection (connect=0x563fd1d829f0) at /data/src/10.2/sql/sql_connect.cc:1336
            #35 0x0000563fceba7c6c in handle_one_connection (arg=0x563fd1d829f0) at /data/src/10.2/sql/sql_connect.cc:1241
            #36 0x0000563fcf3be144 in pfs_spawn_thread (arg=0x563fd1d8d7e0) at /data/src/10.2/storage/perfschema/pfs.cc:1869
            #37 0x00007f8308e794a4 in start_thread (arg=0x7f83014f7700) at pthread_create.c:456
            #38 0x00007f8306fadd0f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:97
            

            elenst Elena Stepanova added a comment - Another curious test case. It only takes one XA ROLLBACK for an XA transaction which doesn't even exist to cause this assertion failure; but I can't find anything non-XA to replace it with. The test case fails on all of 10.1-10.5, but it is highly non-deterministic, run with big --repeat . It seems to fail much more readily in shm and with ps-protocol . PS protocol is not needed, it just increases the probability, at least on my machines/builds. 10.1 requires many more attempts than 10.2+. --source include/have_sequence.inc --source include/have_innodb.inc   CREATE TABLE t1 (a INT , b INT , PRIMARY KEY (a)) ENGINE=InnoDB; INSERT INTO t1 SELECT seq, seq FROM seq_1_to_1000;   START TRANSACTION ; --send UPDATE IGNORE t1 SET a = 0;   --connect (con1,localhost,root,,test) --send ALTER TABLE t1 ADD KEY (b);   --connection default --reap --error ER_XAER_NOTA,ER_XAER_OUTSIDE XA ROLLBACK 'x' ; SHOW WARNINGS; --error ER_NO_SUCH_TABLE SELECT * FROM xx; COMMIT ;   --connection con1 --reap REPLACE INTO t1 SELECT seq, seq FROM seq_1_to_1000;   DROP TABLE t1; 10.2 3a8943ae 2020-07-25 14:51:25 0x7f83014f7700 InnoDB: Assertion failure in file /data/src/10.2/storage/innobase/row/row0ins.cc line 253 InnoDB: Failing assertion: !cursor->index->is_committed()   #6 0x0000563fcf0dd2f9 in ut_dbg_assertion_failed (expr=0x563fcf672ce0 "!cursor->index->is_committed()", file=0x563fcf672bf0 "/data/src/10.2/storage/innobase/row/row0ins.cc", line=253) at /data/src/10.2/storage/innobase/ut/ut0dbg.cc:60 #7 0x0000563fcefeb66c in row_ins_sec_index_entry_by_modify (flags=0, mode=2, cursor=0x7f83014f3b80, offsets=0x7f83014f3b30, offsets_heap=0x7f82b40230f0, heap=0x7f82b4020b00, entry=0x7f82b4021d78, thr=0x7f82b4086000, mtr=0x7f83014f45d0) at /data/src/10.2/storage/innobase/row/row0ins.cc:253 #8 0x0000563fceff2335 in row_ins_sec_index_entry_low (flags=0, mode=2, index=0x7f82b4028300, offsets_heap=0x7f82b40230f0, heap=0x7f82b4020b00, entry=0x7f82b4021d78, trx_id=0, thr=0x7f82b4086000) at /data/src/10.2/storage/innobase/row/row0ins.cc:3031 #9 0x0000563fceff2b19 in row_ins_sec_index_entry (index=0x7f82b4028300, entry=0x7f82b4021d78, thr=0x7f82b4086000) at /data/src/10.2/storage/innobase/row/row0ins.cc:3218 #10 0x0000563fceff2c76 in row_ins_index_entry (index=0x7f82b4028300, entry=0x7f82b4021d78, thr=0x7f82b4086000) at /data/src/10.2/storage/innobase/row/row0ins.cc:3265 #11 0x0000563fceff3181 in row_ins_index_entry_step (node=0x7f82b4085da8, thr=0x7f82b4086000) at /data/src/10.2/storage/innobase/row/row0ins.cc:3414 #12 0x0000563fceff3557 in row_ins (node=0x7f82b4085da8, thr=0x7f82b4086000) at /data/src/10.2/storage/innobase/row/row0ins.cc:3551 #13 0x0000563fceff3904 in row_ins_step (thr=0x7f82b4086000) at /data/src/10.2/storage/innobase/row/row0ins.cc:3671 #14 0x0000563fcf011258 in row_insert_for_mysql (mysql_rec=0x7f82b4019c60 "\375\001", prebuilt=0x7f82b4085870) at /data/src/10.2/storage/innobase/row/row0mysql.cc:1411 #15 0x0000563fceed2111 in ha_innobase::write_row (this=0x7f82b40181d8, record=0x7f82b4019c60 "\375\001") at /data/src/10.2/storage/innobase/handler/ha_innodb.cc:8208 #16 0x0000563fcecdfaf4 in handler::ha_write_row (this=0x7f82b40181d8, buf=0x7f82b4019c60 "\375\001") at /data/src/10.2/sql/handler.cc:6108 #17 0x0000563fcea349b3 in write_record (thd=0x7f82b4000af0, table=0x7f82b40ca940, info=0x7f82b4011140) at /data/src/10.2/sql/sql_insert.cc:1667 #18 0x0000563fcea3abe4 in select_insert::send_data (this=0x7f82b4011100, values=...) at /data/src/10.2/sql/sql_insert.cc:3834 #19 0x0000563fceacad54 in end_send (join=0x7f82b40111a8, join_tab=0x7f82b4012608, end_of_records=false) at /data/src/10.2/sql/sql_select.cc:20045 #20 0x0000563fceac87f3 in evaluate_join_record (join=0x7f82b40111a8, join_tab=0x7f82b4012258, error=0) at /data/src/10.2/sql/sql_select.cc:19093 #21 0x0000563fceac80df in sub_select (join=0x7f82b40111a8, join_tab=0x7f82b4012258, end_of_records=false) at /data/src/10.2/sql/sql_select.cc:18873 #22 0x0000563fceac7666 in do_select (join=0x7f82b40111a8, procedure=0x0) at /data/src/10.2/sql/sql_select.cc:18417 #23 0x0000563fceaa1127 in JOIN::exec_inner (this=0x7f82b40111a8) at /data/src/10.2/sql/sql_select.cc:3638 #24 0x0000563fceaa05e2 in JOIN::exec (this=0x7f82b40111a8) at /data/src/10.2/sql/sql_select.cc:3433 #25 0x0000563fceaa1798 in mysql_select (thd=0x7f82b4000af0, tables=0x7f82b4045620, wild_num=0, fields=..., conds=0x0, og_num=0, order=0x0, group=0x0, having=0x0, proc_param=0x0, select_options=3489925888, result=0x7f82b4011100, unit=0x7f82b40436e8, select_lex=0x7f82b4043e28) at /data/src/10.2/sql/sql_select.cc:3833 #26 0x0000563fcea95956 in handle_select (thd=0x7f82b4000af0, lex=0x7f82b4043628, result=0x7f82b4011100, setup_tables_done_option=1073741824) at /data/src/10.2/sql/sql_select.cc:361 #27 0x0000563fcea5a0cc in mysql_execute_command (thd=0x7f82b4000af0) at /data/src/10.2/sql/sql_parse.cc:4275 #28 0x0000563fcea82ab3 in Prepared_statement::execute (this=0x7f82b40204c0, expanded_query=0x7f83014f6380, open_cursor=false) at /data/src/10.2/sql/sql_prepare.cc:4842 #29 0x0000563fcea80fed in Prepared_statement::execute_loop (this=0x7f82b40204c0, expanded_query=0x7f83014f6380, open_cursor=false, packet=0x7f82b40088ba "", packet_end=0x7f82b40088ba "") at /data/src/10.2/sql/sql_prepare.cc:4271 #30 0x0000563fcea7ea60 in mysql_stmt_execute_common (thd=0x7f82b4000af0, stmt_id=7, packet=0x7f82b40088ba "", packet_end=0x7f82b40088ba "", cursor_flags=0, bulk_op=false, read_types=false) at /data/src/10.2/sql/sql_prepare.cc:3273 #31 0x0000563fcea7e626 in mysqld_stmt_execute (thd=0x7f82b4000af0, packet_arg=0x7f82b40088b1 "\a", packet_length=9) at /data/src/10.2/sql/sql_prepare.cc:3171 #32 0x0000563fcea53089 in dispatch_command (command=COM_STMT_EXECUTE, thd=0x7f82b4000af0, packet=0x7f82b40088b1 "\a", packet_length=9, is_com_multi=false, is_next_command=false) at /data/src/10.2/sql/sql_parse.cc:1766 #33 0x0000563fcea51ea0 in do_command (thd=0x7f82b4000af0) at /data/src/10.2/sql/sql_parse.cc:1377 #34 0x0000563fceba7f01 in do_handle_one_connection (connect=0x563fd1d829f0) at /data/src/10.2/sql/sql_connect.cc:1336 #35 0x0000563fceba7c6c in handle_one_connection (arg=0x563fd1d829f0) at /data/src/10.2/sql/sql_connect.cc:1241 #36 0x0000563fcf3be144 in pfs_spawn_thread (arg=0x563fd1d8d7e0) at /data/src/10.2/storage/perfschema/pfs.cc:1869 #37 0x00007f8308e794a4 in start_thread (arg=0x7f83014f7700) at pthread_create.c:456 #38 0x00007f8306fadd0f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:97

            Not able to reproduce the MTR example crash, seems to be fixed with MDEV-14693

            wlad Vladislav Vaintroub added a comment - Not able to reproduce the MTR example crash, seems to be fixed with MDEV-14693

            It was actually fixed with MDEV-15532.
            It is still easily enough reproducible on a previous commit, but not since this one:

            commit 828471cbf83774f4537a78551290b7a4a7f5d374
            Author: Monty
            Date:   Mon Nov 30 15:29:32 2020 +0200
             
                MDEV 15532 Assertion `!log->same_pk' failed in row_log_table_apply_delete
            

            elenst Elena Stepanova added a comment - It was actually fixed with MDEV-15532 . It is still easily enough reproducible on a previous commit, but not since this one: commit 828471cbf83774f4537a78551290b7a4a7f5d374 Author: Monty Date: Mon Nov 30 15:29:32 2020 +0200   MDEV 15532 Assertion `!log->same_pk' failed in row_log_table_apply_delete

            People

              wlad Vladislav Vaintroub
              elenst Elena Stepanova
              Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.