Details
-
Bug
-
Status: Closed (View Workflow)
-
Critical
-
Resolution: Fixed
-
10.1(EOL), 10.2(EOL), 10.3(EOL), 10.4(EOL), 10.5
Description
Notes:
- Given the history of false-positives in attempts to reproduce
MDEV-14643/MDEV-9663, I file this one separately. If we are lucky this time and turns out to be the problem that we have been chasing for withinMDEV-14643, feel free to close as a duplicate. - The test case is highly non-deterministic, run with high --repeat=N. It currently fails for me on average within ~100 attempts, but it can vary on different machines and builds. Also, it is not minimal, it can certainly be reduced further, but it seems to decrease the probability of hitting the failure, so I'm keeping it at is. Hopefully it will be analyzable via an rr profile.
- I am not sure whether XA is a guilty part in this scenario, but apparently it allows to create the kind of locking situation that's necessary. Maybe it can be eliminated after the development analysis.
- Please note that !cursor->index->is_committed() only happens on non-debug builds. Debug builds fail with a different assertion. Both are quoted below.
--source include/have_innodb.inc
|
|
CREATE TABLE t1 ( |
pk integer auto_increment, |
col_int int, |
col_int_key int, |
primary key (pk) |
) ENGINE=InnoDB;
|
|
INSERT INTO t1 VALUES (NULL,0,2),(NULL,0, NULL); |
|
CREATE TABLE t2 (t TIMESTAMP NULL) ENGINE=InnoDB; |
XA START 'xid'; |
DELETE FROM t1; |
--error ER_XAER_RMFAIL
|
ALTER TABLE x; |
XA END 'xid'; |
|
--connect (con1,localhost,root,,test)
|
--send
|
ALTER TABLE t1 ADD INDEX (pk); |
|
--connection default
|
--error ER_XAER_RMFAIL
|
UPDATE x SET a = 1; |
--error ER_XAER_RMFAIL
|
UPDATE x SET a = 1 WHERE b = ( SELECT a FROM xx WHERE b = 1 ); |
XA PREPARE 'xid'; |
--error ER_XAER_RMFAIL
|
DELETE FROM x WHERE a IN ( SELECT a FROM xx WHERE b = 1 ); |
--error ER_XAER_RMFAIL
|
REPLACE INTO x (a,b) VALUES (1,1),(1,1); |
XA COMMIT 'xid'; |
|
--connection con1
|
--reap
|
SELECT * FROM t1; |
REPLACE INTO t1 VALUES (1,1,1),(2,3,3); |
|
# Cleanup
|
--disconnect con1
|
--connection default
|
DROP TABLE t1, t2; |
10.4 RelWithDebInfo ba679ae52f |
2020-03-29 21:24:18 1 [ERROR] InnoDB: tried to purge non-delete-marked record in index `pk` of table `test`.`t1`: tuple: TUPLE (info_bits=0, 1 fields): {[4] (0x80000002)}, record: COMPACT RECORD(info_bits=0, 1 fields): {[4] (0x80000002)}
|
2020-03-29 21:24:18 4 [ERROR] InnoDB: tried to purge non-delete-marked record in index `pk` of table `test`.`t1`: tuple: TUPLE (info_bits=0, 1 fields): {[4] (0x80000001)}, record: COMPACT RECORD(info_bits=0, 1 fields): {[4] (0x80000001)}
|
2020-03-29 21:24:18 0x7f4b3e254700 InnoDB: Assertion failure in file /data/src/10.4/storage/innobase/row/row0ins.cc line 263
|
InnoDB: Failing assertion: !cursor->index->is_committed()
|
|
#6 0x00005603908facb7 in ut_dbg_assertion_failed (expr=expr@entry=0x56039127a188 "!cursor->index->is_committed()", file=file@entry=0x560391279c30 "/data/src/10.4/storage/innobase/row/row0ins.cc", line=line@entry=263) at /data/src/10.4/storage/innobase/ut/ut0dbg.cc:60
|
#7 0x00005603908ea580 in row_ins_sec_index_entry_by_modify (mtr=0x7f4b3e24f9e0, thr=0x7f4af401f558, entry=<optimized out>, heap=0x7f4af401c7d0, offsets_heap=<optimized out>, offsets=0x7f4b3e24eac8, cursor=0x7f4b3e24ebd0, mode=2, flags=0) at /data/src/10.4/storage/innobase/row/row0ins.cc:263
|
#8 row_ins_sec_index_entry_low (flags=0, mode=2, index=0x7f4af401b9e8, offsets_heap=<optimized out>, heap=0x7f4af401c7d0, entry=<optimized out>, trx_id=<optimized out>, thr=<optimized out>) at /data/src/10.4/storage/innobase/row/row0ins.cc:3099
|
#9 0x0000560390df47c8 in row_ins_sec_index_entry (index=0x7f4af401b9e8, entry=0x7f4af012c2a0, thr=0x7f4af401f558, check_foreign=<optimized out>) at /data/src/10.4/storage/innobase/row/row0ins.cc:3314
|
#10 0x0000560390df4e2a in row_ins_index_entry (thr=0x7f4af401f558, entry=<optimized out>, index=<optimized out>) at /data/src/10.4/storage/innobase/row/row0ins.cc:3361
|
#11 row_ins_index_entry_step (thr=0x7f4af401f558, node=<optimized out>) at /data/src/10.4/storage/innobase/row/row0ins.cc:3528
|
#12 row_ins (thr=0x7f4af401f558, node=<optimized out>) at /data/src/10.4/storage/innobase/row/row0ins.cc:3664
|
#13 row_ins_step (thr=thr@entry=0x7f4af401f558) at /data/src/10.4/storage/innobase/row/row0ins.cc:3807
|
#14 0x0000560390e03718 in row_insert_for_mysql (mysql_rec=mysql_rec@entry=0x7f4af012d3a8 "\371\001", prebuilt=<optimized out>, ins_mode=<optimized out>) at /data/src/10.4/storage/innobase/row/row0mysql.cc:1463
|
#15 0x0000560390d5f42d in ha_innobase::write_row (this=0x7f4af401e340, record=0x7f4af012d3a8 "\371\001") at /data/src/10.4/storage/innobase/handler/ha_innodb.cc:8035
|
#16 0x0000560390be50e5 in handler::ha_write_row (this=0x7f4af401e340, buf=0x7f4af012d3a8 "\371\001") at /data/src/10.4/sql/handler.cc:6677
|
#17 0x00005603909ba44d in write_record (thd=thd@entry=0x7f4af4000c08, table=table@entry=0x7f4af401d548, info=info@entry=0x7f4b3e2508b0) at /data/src/10.4/sql/sql_insert.cc:1732
|
#18 0x00005603909c0896 in mysql_insert (thd=thd@entry=0x7f4af4000c08, table_list=<optimized out>, fields=..., values_list=..., update_fields=..., update_values=..., duplic=<optimized out>, ignore=<optimized out>) at /data/src/10.4/sql/sql_insert.cc:1078
|
#19 0x00005603909ec35a in mysql_execute_command (thd=0x7f4af4000c08) at /data/src/10.4/sql/sql_parse.cc:4528
|
#20 0x00005603909f2fd8 in mysql_parse (thd=thd@entry=0x7f4af4000c08, rawbuf=<optimized out>, length=38, parser_state=parser_state@entry=0x7f4b3e253190, is_com_multi=is_com_multi@entry=false, is_next_command=is_next_command@entry=false) at /data/src/10.4/sql/sql_parse.cc:7900
|
#21 0x00005603909f52af in dispatch_command (command=COM_QUERY, thd=0x7f4af4000c08, packet=<optimized out>, packet_length=<optimized out>, is_com_multi=<optimized out>, is_next_command=<optimized out>) at /data/src/10.4/sql/sql_class.h:1168
|
#22 0x00005603909f69df in do_command (thd=0x7f4af4000c08) at /data/src/10.4/sql/sql_parse.cc:1359
|
#23 0x0000560390aca8f4 in do_handle_one_connection (connect=connect@entry=0x560392bdba18) at /data/src/10.4/sql/sql_connect.cc:1412
|
#24 0x0000560390aca9e4 in handle_one_connection (arg=arg@entry=0x560392bdba18) at /data/src/10.4/sql/sql_connect.cc:1316
|
#25 0x00005603910242bf in pfs_spawn_thread (arg=0x560392b64228) at /data/src/10.4/storage/perfschema/pfs.cc:1869
|
#26 0x00007f4b4998bfa3 in start_thread (arg=<optimized out>) at pthread_create.c:486
|
#27 0x00007f4b491da4cf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
|
10.4 debug 9eae063e |
2020-03-29 21:25:45 1 [ERROR] InnoDB: tried to purge non-delete-marked record in index `pk` of table `test`.`t1`: tuple: TUPLE (info_bits=0, 1 fields): {[4] (0x80000002)}, record: COMPACT RECORD(info_bits=0, 1 fields): {[4] (0x80000002)}
|
mysqld: /data/src/10.4/storage/innobase/row/row0purge.cc:600: bool row_purge_remove_sec_if_poss_leaf(purge_node_t*, dict_index_t*, const dtuple_t*): Assertion `0' failed.
|
200329 21:25:45 [ERROR] mysqld got signal 6 ;
|
|
#7 0x00007f7b857a9102 in __GI___assert_fail (assertion=0x559b1abf11aa "0", file=0x559b1abf1ac0 "/data/src/10.4/storage/innobase/row/row0purge.cc", line=600, function=0x559b1abf2da0 <row_purge_remove_sec_if_poss_leaf(purge_node_t*, dict_index_t*, dtuple_t const*)::__PRETTY_FUNCTION__> "bool row_purge_remove_sec_if_poss_leaf(purge_node_t*, dict_index_t*, const dtuple_t*)") at assert.c:101
|
#8 0x0000559b1a3d0fd6 in row_purge_remove_sec_if_poss_leaf (node=0x559b1dbda670, index=0x7f7b2c03a550, entry=0x7f7b58016280) at /data/src/10.4/storage/innobase/row/row0purge.cc:600
|
#9 0x0000559b1a3d1352 in row_purge_remove_sec_if_poss (node=0x559b1dbda670, index=0x7f7b2c03a550, entry=0x7f7b58016280) at /data/src/10.4/storage/innobase/row/row0purge.cc:696
|
#10 0x0000559b1a3d1577 in row_purge_del_mark (node=0x559b1dbda670) at /data/src/10.4/storage/innobase/row/row0purge.cc:770
|
#11 0x0000559b1a3d318c in row_purge_record_func (node=0x559b1dbda670, undo_rec=0x559b1dbdb120 "", thr=0x559b1dbda5b0, updated_extern=false) at /data/src/10.4/storage/innobase/row/row0purge.cc:1196
|
#12 0x0000559b1a3d349a in row_purge (node=0x559b1dbda670, undo_rec=0x559b1dbdb120 "", thr=0x559b1dbda5b0) at /data/src/10.4/storage/innobase/row/row0purge.cc:1263
|
#13 0x0000559b1a3d3845 in row_purge_step (thr=0x559b1dbda5b0) at /data/src/10.4/storage/innobase/row/row0purge.cc:1342
|
#14 0x0000559b1a33f73a in que_thr_step (thr=0x559b1dbda5b0) at /data/src/10.4/storage/innobase/que/que0que.cc:1037
|
#15 0x0000559b1a33f9d1 in que_run_threads_low (thr=0x559b1dbda5b0) at /data/src/10.4/storage/innobase/que/que0que.cc:1099
|
#16 0x0000559b1a33fc1f in que_run_threads (thr=0x559b1dbda5b0) at /data/src/10.4/storage/innobase/que/que0que.cc:1139
|
#17 0x0000559b1a44ab57 in trx_purge (n_purge_threads=4, truncate=false, slot=0x559b1b22bdd0 <srv_sys+592>) at /data/src/10.4/storage/innobase/trx/trx0purge.cc:1322
|
#18 0x0000559b1a41c64c in srv_do_purge (n_total_purged=0x7f7b62ffceb8, slot=0x559b1b22bdd0 <srv_sys+592>) at /data/src/10.4/storage/innobase/srv/srv0srv.cc:2612
|
#19 0x0000559b1a41cb95 in srv_purge_coordinator_thread (arg=0x0) at /data/src/10.4/storage/innobase/srv/srv0srv.cc:2751
|
#20 0x00007f7b86023fa3 in start_thread (arg=<optimized out>) at pthread_create.c:486
|
#21 0x00007f7b858724cf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
|
Reproducible on 10.1-10.5.
Attachments
Issue Links
- relates to
-
MDEV-14643 InnoDB: Failing assertion: !cursor->index->is_committed()
-
- Closed
-
-
MDEV-15532 XA: Assertion `!log->same_pk' failed in row_log_table_apply_delete
-
- Closed
-
-
MDEV-19338 InnoDB: Failing assertion: !cursor->index->is_committed()
-
- Closed
-
-
MDEV-22061 InnoDB: Assertion of missing row in sec index row_start upon REPLACE on a system-versioned table
-
- Closed
-
-
MDEV-22759 Failing assertion: !cursor->index->is_committed() upon update on table with HASH index
-
- Closed
-
-
MDEV-22924 Warning InnoDB: Index 'Marvão_idx3' contains 36 entries, should be 37.
-
- Closed
-
The table appears to contain 2 records, as expected. In the primary key index (page 3), we have the delete-marked records
(pk,DB_TRX_ID,DB_ROLL_PTR,col_int,col_int_key)=(1,0x407,(update),0,2),(2,0x407,(update),0,NULL).
Both records were apparently delete-marked by transaction 0x407.
In the secondary key index pk (page 4), we have the records (pk)=(1),(2) and PAGE_MAX_TRX_ID=0x408.
The problem seems to be that neither record in the secondary index is delete-marked.
The trace includes multiple calls to row_merge_read_clustered_index(). The function buf_page_create() is invoked for the clustered index page 49:3 at when 297390. There will be only one call to row_merge_read_clustered_index() after that, at when 298612, and the assertion fails at when 299852. Let us define a watchpoint on the delete-mark flag of the first record, to catch when it was written:
break row_merge_read_clustered_index
disable 1
break buf_page_create
cond 1 page_id.m_space==49 && page_id.m_page_no>2
continue
finish
watch -l $0.frame[0x79]
enable
continue
The record will be delete-marked at when 298256 as part of executing DELETE FROM t1.
At when 298491, we will get another buf_page_create() call to create the secondary index page 49:4. At when 298627, there will be yet another call to create the page 49:5 as part of executing the statement ALTER TABLE t1 ADD INDEX (pk), deep inside in row_merge_insert_index_tuples(), invoked by row_merge_read_clustered_index(), in Thread 32. Because we allocated a second page for the secondary index at some point, there must have been quite a few delete-marked records on the table.
Let us rewind a little bit to see when the ALTER TABLE t1 ADD INDEX (pk) started executing and when any conflicting transactions were committed:
run 298256
break innobase_commit
continue
when
At when 298551 we will enter innobase_commit:
10.4 9eae063e79376fd71586e1106e750a366467a984
(rr) p trx->lock.table_locks
$31 = std::vector of length 1, capacity 1 = {0x7f6cdf49ee58}
(rr) p *((lock_t*)0x7f6cdf49ee58)
$32 = {trx = 0x7f6cdf49e268, trx_locks = {prev = 0x0, next = 0x7f6cdf49e398}, index = 0x0, hash = 0x0, requested_time = 0, wait_time = 0, un_member = {tab_lock = {table = 0x7f6c582e8900, locks = {prev = 0x0, next = 0x0}}, rec_lock = {space = 1479444736, page_no = 32620, n_bits = 0}}, type_mode = 17}
(rr) p trx
$33 = (trx_t *) 0x7f6cdf49e268
(rr) p ((lock_t*)0x7f6cdf49ee58)->un_member.tab_lock.table.name
$34 = {m_name = 0x7f6c583267a0 "test/t1", static part_suffix = "#P#"}
This proves that the transaction has modified the InnoDB table t1. We also know that ha_innobase::prepare_inplace_alter_table() should be holding MDL_EXCLUSIVE on the table name. But the ALTER TABLE is executing deeply inside InnoDB:
10.4 9eae063e79376fd71586e1106e750a366467a984
(rr) thread 32
(rr) bt
…
#22 0x00005643d9bd8cf9 in prepare_inplace_alter_table_dict (ha_alter_info=0x7f6cdc0787b0, altered_table=0x7f6cdc078840, old_table=0x7f6c583307e0, table_name=0x7f6c5832912d "t1", flags=33, flags2=80,
fts_doc_id_col=18446744073709551615, add_fts_doc_id=false, add_fts_doc_id_idx=false) at /home/mariadb/MDEV-22076/10.4/storage/innobase/handler/handler0alter.cc:7024
#23 0x00005643d9bdd086 in ha_innobase::prepare_inplace_alter_table (this=0x7f6c58331648, altered_table=0x7f6cdc078840, ha_alter_info=0x7f6cdc0787b0)
at /home/mariadb/MDEV-22076/10.4/storage/innobase/handler/handler0alter.cc:8141
#24 0x00005643d994df12 in handler::ha_prepare_inplace_alter_table (this=0x7f6c58331648, altered_table=0x7f6cdc078840, ha_alter_info=0x7f6cdc0787b0) at /home/mariadb/MDEV-22076/10.4/sql/handler.cc:4550
#25 0x00005643d96cdbe1 in mysql_inplace_alter_table (thd=0x7f6c5c000d50, table_list=0x7f6c5c012298, table=0x7f6c583307e0, altered_table=0x7f6cdc078840, ha_alter_info=0x7f6cdc0787b0,
inplace_supported=HA_ALTER_INPLACE_NOCOPY_NO_LOCK, target_mdl_request=0x7f6cdc079610, alter_ctx=0x7f6cdc07a140) at /home/mariadb/MDEV-22076/10.4/sql/sql_table.cc:7682
#26 0x00005643d96d4c74 in mysql_alter_table (thd=0x7f6c5c000d50, new_db=0x7f6c5c005510, new_name=0x7f6c5c005918, create_info=0x7f6cdc07ad20, table_list=0x7f6c5c012298, alter_info=0x7f6cdc07ac60, order_num=0,
order=0x0, ignore=false) at /home/mariadb/MDEV-22076/10.4/sql/sql_table.cc:10111
#27 0x00005643d976cd02 in Sql_cmd_alter_table::execute (this=0x7f6c5c012a28, thd=0x7f6c5c000d50) at /home/mariadb/MDEV-22076/10.4/sql/sql_alter.cc:502
#28 0x00005643d95eb763 in mysql_execute_command (thd=0x7f6c5c000d50) at /home/mariadb/MDEV-22076/10.4/sql/sql_parse.cc:6101
#29 0x00005643d95f0e17 in mysql_parse (thd=0x7f6c5c000d50, rawbuf=0x7f6c5c0121b0 "ALTER TABLE t1 ADD INDEX (pk)", length=29, parser_state=0x7f6cdc07c190, is_com_multi=false, is_next_command=false)
at /home/mariadb/MDEV-22076/10.4/sql/sql_parse.cc:7900
This is a clear proof of what I already suspected: that DDL operations executed in XA transactions will seriously break the metadata locking.
Someone who is more familiar with the MDL data structures or code should be able to trace down when the MDL on table name test.t1 was broken.