Query (0x7eff34013618): ALTER TABLE `AA` ADD COLUMN c INTEGER DEFAULT NULL /* QNO 2942 CON_ID 16 */
Connection ID (thread ID): 16
Status: KILL_TIMEOUT
#7 0x00007fe3fc442ee2 in __assert_fail () from /lib/x86_64-linux-gnu/libc.so.6
#8 0x0000560a3f71a73b in row_log_table_apply_update (thr=0x7fe388685620, new_trx_id_col=1, mrec=0x7fe3e00c7202 "\200", offsets=0x7fe388055920, offsets_heap=0x7fe3884b9940, heap=0x7fe3886dd470, dup=0x7fe3e0ef02e0, trx_id=6660, old_pk=0x7fe3886dd4f8) at /data/src/10.2/storage/innobase/row/row0log.cc:1985
#9 0x0000560a3f71c061 in row_log_table_apply_op (thr=0x7fe388685620, trx_id_col=1, new_trx_id_col=1, dup=0x7fe3e0ef02e0, error=0x7fe3e0ef0084, offsets_heap=0x7fe3884b9940, heap=0x7fe3886dd470, mrec=0x7fe3e00c7202 "\200", mrec_end=0x7fe3e00d3b9c "", offsets=0x7fe388055920) at /data/src/10.2/storage/innobase/row/row0log.cc:2540
#10 0x0000560a3f71d4bf in row_log_table_apply_ops (thr=0x7fe388685620, dup=0x7fe3e0ef02e0, stage=0x7fe388565430) at /data/src/10.2/storage/innobase/row/row0log.cc:2909
#11 0x0000560a3f71d9ca in row_log_table_apply (thr=0x7fe388685620, old_table=0x7fe374189e88, table=0x7fe38805f050, stage=0x7fe388565430) at /data/src/10.2/storage/innobase/row/row0log.cc:3008
#12 0x0000560a3f6048ae in ha_innobase::inplace_alter_table (this=0x7fe3741b1fa8, altered_table=0x7fe38805f050, ha_alter_info=0x7fe3e0ef0610) at /data/src/10.2/storage/innobase/handler/handler0alter.cc:6488
#13 0x0000560a3f11a8f7 in handler::ha_inplace_alter_table (this=0x7fe3741b1fa8, altered_table=0x7fe38805f050, ha_alter_info=0x7fe3e0ef0610) at /data/src/10.2/sql/handler.h:3775
#14 0x0000560a3f1117b1 in mysql_inplace_alter_table (thd=0x7fe388000ba0, table_list=0x7fe388013768, table=0x7fe3741b13a0, altered_table=0x7fe38805f050, ha_alter_info=0x7fe3e0ef0610, inplace_supported=HA_ALTER_INPLACE_NO_LOCK_AFTER_PREPARE, target_mdl_request=0x7fe3e0ef0680, alter_ctx=0x7fe3e0ef1240) at /data/src/10.2/sql/sql_table.cc:7332
#15 0x0000560a3f116de4 in mysql_alter_table (thd=0x7fe388000ba0, new_db=0x7fe388013d78 "test", new_name=0x0, create_info=0x7fe3e0ef1e50, table_list=0x7fe388013768, alter_info=0x7fe3e0ef1da0, order_num=0, order=0x0, ignore=false) at /data/src/10.2/sql/sql_table.cc:9353
#16 0x0000560a3f190954 in Sql_cmd_alter_table::execute (this=0x7fe388013f20, thd=0x7fe388000ba0) at /data/src/10.2/sql/sql_alter.cc:324
#17 0x0000560a3f04bc57 in mysql_execute_command (thd=0x7fe388000ba0) at /data/src/10.2/sql/sql_parse.cc:6208
#18 0x0000560a3f0505d1 in mysql_parse (thd=0x7fe388000ba0, rawbuf=0x7fe388013618 "ALTER TABLE `E` ADD COLUMN m INTEGER DEFAULT NULL /* QNO 3376 CON_ID 16 */", length=74, parser_state=0x7fe3e0ef3200, is_com_multi=false, is_next_command=false) at /data/src/10.2/sql/sql_parse.cc:7902
#19 0x0000560a3f03e4f1 in dispatch_command (command=COM_QUERY, thd=0x7fe388000ba0, packet=0x7fe38807ed41 "ALTER TABLE `E` ADD COLUMN m INTEGER DEFAULT NULL /* QNO 3376 CON_ID 16 */ ", packet_length=75, is_com_multi=false, is_next_command=false) at /data/src/10.2/sql/sql_parse.cc:1806
#20 0x0000560a3f03ce54 in do_command (thd=0x7fe388000ba0) at /data/src/10.2/sql/sql_parse.cc:1360
#21 0x0000560a3f18b646 in do_handle_one_connection (connect=0x560a41e93610) at /data/src/10.2/sql/sql_connect.cc:1335
#22 0x0000560a3f18b3d3 in handle_one_connection (arg=0x560a41e93610) at /data/src/10.2/sql/sql_connect.cc:1241
#23 0x0000560a3f5ab1f8 in pfs_spawn_thread (arg=0x560a41f35e50) at /data/src/10.2/storage/perfschema/pfs.cc:1862
#24 0x00007fe3fe119494 in start_thread (arg=0x7fe3e0ef4700) at pthread_create.c:333
#25 0x00007fe3fc4ff93f in clone () from /lib/x86_64-linux-gnu/libc.so.6
No reliable information about other version. Most likely isn't worth digging, because root cause is likely to be the same as for the rest of MDEV-14693 family.
Problem (same assert + also an ALTER TABLE t1 ADD COLUMN) replayed on
10.4.0-MariaDB-debug-log
HEAD -> 10.4, origin/bb-10.4-bar, origin/10.4 5646c4315918c1e4127bfb68b70d1c5333bfc2c7 2018-10-09T12:02:35+04:00
too.
/* The record was not found. This should only happen
when an earlier ROW_T_INSERT or ROW_T_UPDATE was
diverted because BLOBs were freed when the insert was
later rolled back. */
ut_ad(log->blobs);
I filed MDEV-23244 for the failure that you reproduced. It is specific to 10.3 and later.
Marko Mäkelä
added a comment - mleich , the originally reported assertion failure for 10.2 is something else, which could also be possible in 10.0 and 10.1:
if (page_rec_is_infimum(btr_pcur_get_rec(&pcur))
|| btr_pcur_get_low_match(&pcur) < index->n_uniq) {
/* The record was not found. This should only happen
when an earlier ROW_T_INSERT or ROW_T_UPDATE was
diverted because BLOBs were freed when the insert was
later rolled back. */
ut_ad( log ->blobs);
I filed MDEV-23244 for the failure that you reproduced. It is specific to 10.3 and later.
The initial failure was most certainly XA-specific. It was one of a bunch of problems revealed by enabling (usually disabled) XA flow in concurrent tests. MDEV-15518, MDEV-15490, MDEV-15532 and probably some more were a part of the same exercise. Some of them are now closed as "not reproducible / probably fixed in 10.2", I doubt it is accurate, not much was happening in 10.2 in this regard. But anyway, around the same time when this item, MDEV-15533, was filed, Marko's analysis in MDEV-14693 confirmed that the failures are a side-effect of the broken XA, which is why this issue was never actively pursued.
It is not very difficult to create a non-deterministic XA-based test case for it (and generate an rr profile), it's just rather pointless since there is little to no chance anything can be done about it.
Still, here is, for example, a test case which fails (non-deterministically) on current 10.1 (run with --repeat and on disk rather than in shm):
--source include/have_innodb.inc
--source include/have_sequence.inc
CREATETABLE t1 (a INT, b INT) ENGINE=InnoDB;
CREATEVIEW v1 ASSELECT * FROM t1;
INSERTINTO t1 (a) SELECT seq FROM seq_1_to_1000;
--connect (con1,localhost,root,,test)
XA START 'x';
INSERTINTO t1 (a) VALUES (0);
--error ER_XAER_RMFAIL
FLUSH HOSTS;
--connection default
SET lock_wait_timeout= 1;
--send
ALTERTABLE t1 DROP b;
--connection con1
--error ER_WRONG_VALUE_FOR_VAR
SET sql_mode= NULL;
UPDATE v1 SET a = 0;
--connection default
--error ER_LOCK_WAIT_TIMEOUT
--reap
--disconnect con1
DROPVIEW v1;
DROPTABLE t1;
An interesting part about this test case is that while it fails on 10.1 on the assertion and passes on 10.2-10.3 on whatever reason, it causes a strange effect on 10.4-10.5: there UPDATE stalls for innodb_lock_wait_timeout seconds (e.g. for 50 seconds if you run the test case with default settings). After 50 seconds, it succeeds and proceeds. I don't see right away from the test case logic why it would do so.
Elena Stepanova
added a comment - - edited The initial failure was most certainly XA-specific. It was one of a bunch of problems revealed by enabling (usually disabled) XA flow in concurrent tests. MDEV-15518 , MDEV-15490 , MDEV-15532 and probably some more were a part of the same exercise. Some of them are now closed as "not reproducible / probably fixed in 10.2", I doubt it is accurate, not much was happening in 10.2 in this regard. But anyway, around the same time when this item, MDEV-15533 , was filed, Marko's analysis in MDEV-14693 confirmed that the failures are a side-effect of the broken XA, which is why this issue was never actively pursued.
It is not very difficult to create a non-deterministic XA-based test case for it (and generate an rr profile), it's just rather pointless since there is little to no chance anything can be done about it.
Still, here is, for example, a test case which fails (non-deterministically) on current 10.1 (run with --repeat and on disk rather than in shm):
--source include/have_innodb.inc
--source include/have_sequence.inc
CREATE TABLE t1 (a INT , b INT ) ENGINE=InnoDB;
CREATE VIEW v1 AS SELECT * FROM t1;
INSERT INTO t1 (a) SELECT seq FROM seq_1_to_1000;
--connect (con1,localhost,root,,test)
XA START 'x' ;
INSERT INTO t1 (a) VALUES (0);
--error ER_XAER_RMFAIL
FLUSH HOSTS;
--connection default
SET lock_wait_timeout= 1;
--send
ALTER TABLE t1 DROP b;
--connection con1
--error ER_WRONG_VALUE_FOR_VAR
SET sql_mode= NULL ;
UPDATE v1 SET a = 0;
--connection default
--error ER_LOCK_WAIT_TIMEOUT
--reap
--disconnect con1
DROP VIEW v1;
DROP TABLE t1;
An interesting part about this test case is that while it fails on 10.1 on the assertion and passes on 10.2-10.3 on whatever reason, it causes a strange effect on 10.4-10.5: there UPDATE stalls for innodb_lock_wait_timeout seconds (e.g. for 50 seconds if you run the test case with default settings). After 50 seconds, it succeeds and proceeds. I don't see right away from the test case logic why it would do so.
And here is a test case which seems to be failing equally well on all of current 10.1-10.5 (but it's still non-deterministic, so run with --repeat=N if it doesn't fail right away):
#8 0x000055c76501de78 in row_log_table_apply_update (thr=0x7fa07401d658, new_trx_id_col=1, mrec=0x7fa0a8080003 "", offsets=0x7fa07400ca38, offsets_heap=0x7fa074078608, heap=0x7fa074074568, dup=0x7fa0a8c9f490, old_pk=0x7fa0740745e8) at /data/src/10.5/storage/innobase/row/row0log.cc:2127
#9 0x000055c76501fff3 in row_log_table_apply_op (thr=0x7fa07401d658, new_trx_id_col=1, dup=0x7fa0a8c9f490, error=0x7fa0a8c9f20c, offsets_heap=0x7fa074078608, heap=0x7fa074074568, mrec=0x7fa0a8080003 "", mrec_end=0x7fa0a8080094 "", offsets=0x7fa07400ca38) at /data/src/10.5/storage/innobase/row/row0log.cc:2670
#10 0x000055c765022233 in row_log_table_apply_ops (thr=0x7fa07401d658, dup=0x7fa0a8c9f490, stage=0x7fa074020818) at /data/src/10.5/storage/innobase/row/row0log.cc:3043
#11 0x000055c765022988 in row_log_table_apply (thr=0x7fa07401d658, old_table=0x7fa07021b7d8, table=0x7fa0a8ca0910, stage=0x7fa074020818, new_table=0x7fa07401df18) at /data/src/10.5/storage/innobase/row/row0log.cc:3149
#12 0x000055c764ea60c6 in alter_rebuild_apply_log (ctx=0x7fa0740137f0, ha_alter_info=0x7fa0a8ca0870, altered_table=0x7fa0a8ca0910) at /data/src/10.5/storage/innobase/handler/handler0alter.cc:10633
#13 0x000055c764ea6fc1 in ha_innobase::commit_inplace_alter_table (this=0x7fa0700faa00, altered_table=0x7fa0a8ca0910, ha_alter_info=0x7fa0a8ca0870, commit=true) at /data/src/10.5/storage/innobase/handler/handler0alter.cc:10832
#14 0x000055c7649fadf9 in handler::ha_commit_inplace_alter_table (this=0x7fa0700faa00, altered_table=0x7fa0a8ca0910, ha_alter_info=0x7fa0a8ca0870, commit=true) at /data/src/10.5/sql/handler.cc:4843
#15 0x000055c7647918ae in mysql_inplace_alter_table (thd=0x7fa074000b18, table_list=0x7fa074012548, table=0x7fa07021b248, altered_table=0x7fa0a8ca0910, ha_alter_info=0x7fa0a8ca0870, inplace_supported=HA_ALTER_INPLACE_COPY_NO_LOCK, target_mdl_request=0x7fa0a8ca0ce0, alter_ctx=0x7fa0a8ca1830) at /data/src/10.5/sql/sql_table.cc:8038
#16 0x000055c7647990d9 in mysql_alter_table (thd=0x7fa074000b18, new_db=0x7fa0740053d8, new_name=0x7fa0740057e0, create_info=0x7fa0a8ca2430, table_list=0x7fa074012548, alter_info=0x7fa0a8ca2360, order_num=0, order=0x0, ignore=false, if_exists=false) at /data/src/10.5/sql/sql_table.cc:10564
#17 0x000055c76483e97a in Sql_cmd_alter_table::execute (this=0x7fa074012c20, thd=0x7fa074000b18) at /data/src/10.5/sql/sql_alter.cc:538
#18 0x000055c76469da0c in mysql_execute_command (thd=0x7fa074000b18) at /data/src/10.5/sql/sql_parse.cc:5951
#19 0x000055c7646a3d4d in mysql_parse (thd=0x7fa074000b18, rawbuf=0x7fa074012470 "ALTER TABLE t1 FORCE", length=20, parser_state=0x7fa0a8ca3520, is_com_multi=false, is_next_command=false) at /data/src/10.5/sql/sql_parse.cc:7993
#20 0x000055c764690046 in dispatch_command (command=COM_QUERY, thd=0x7fa074000b18, packet=0x7fa0740088b9 "ALTER TABLE t1 FORCE", packet_length=20, is_com_multi=false, is_next_command=false) at /data/src/10.5/sql/sql_parse.cc:1867
#21 0x000055c76468e77e in do_command (thd=0x7fa074000b18) at /data/src/10.5/sql/sql_parse.cc:1348
#22 0x000055c76483407a in do_handle_one_connection (connect=0x55c7684c5278, put_in_cache=true) at /data/src/10.5/sql/sql_connect.cc:1410
#23 0x000055c764833de2 in handle_one_connection (arg=0x55c7684c5278) at /data/src/10.5/sql/sql_connect.cc:1312
#24 0x000055c764d72126 in pfs_spawn_thread (arg=0x55c7684c4ea8) at /data/src/10.5/storage/perfschema/pfs.cc:2201
#25 0x00007fa0b16f24a4 in start_thread (arg=0x7fa0a8ca4700) at pthread_create.c:456
#26 0x00007fa0af826d0f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:97
Elena Stepanova
added a comment - And here is a test case which seems to be failing equally well on all of current 10.1-10.5 (but it's still non-deterministic, so run with --repeat=N if it doesn't fail right away):
--source include/have_innodb.inc
CREATE TABLE t1 (a INT ) ENGINE=InnoDB;
XA BEGIN 'x' ;
INSERT IGNORE INTO t1 VALUES (1),(2);
--connect (con1,localhost,root,,test)
SET lock_wait_timeout= 1;
--send
ALTER TABLE t1 FORCE ;
--connection default
--error ER_NO_SUCH_TABLE
UPDATE IGNORE xx SET x = 0;
--error ER_XAER_RMFAIL
FLUSH TABLES;
UPDATE t1 SET a = 3;
XA END 'x' ;
XA ROLLBACK 'x' ;
--connection con1
--reap
# Cleanup
--disconnect con1
--connection default
DROP TABLE t1;
10.5 74491955
mariadbd: /data/src/10.5/storage/innobase/row/row0log.cc:2127: dberr_t row_log_table_apply_update(que_thr_t*, ulint, const mrec_t*, const rec_offs*, mem_heap_t*, mem_heap_t*, row_merge_dup_t*, const dtuple_t*): Assertion `log->blobs' failed.
200725 16:08:18 [ERROR] mysqld got signal 6 ;
#7 0x00007fa0af769f12 in __GI___assert_fail (assertion=0x55c7657c992a "log->blobs", file=0x55c7657c8cf8 "/data/src/10.5/storage/innobase/row/row0log.cc", line=2127, function=0x55c7657ccca0 <row_log_table_apply_update(que_thr_t*, unsigned long, unsigned char const*, unsigned short const*, mem_block_info_t*, mem_block_info_t*, row_merge_dup_t*, dtuple_t const*)::__PRETTY_FUNCTION__> "dberr_t row_log_table_apply_update(que_thr_t*, ulint, const mrec_t*, const rec_offs*, mem_heap_t*, mem_heap_t*, row_merge_dup_t*, const dtuple_t*)") at assert.c:101
#8 0x000055c76501de78 in row_log_table_apply_update (thr=0x7fa07401d658, new_trx_id_col=1, mrec=0x7fa0a8080003 "", offsets=0x7fa07400ca38, offsets_heap=0x7fa074078608, heap=0x7fa074074568, dup=0x7fa0a8c9f490, old_pk=0x7fa0740745e8) at /data/src/10.5/storage/innobase/row/row0log.cc:2127
#9 0x000055c76501fff3 in row_log_table_apply_op (thr=0x7fa07401d658, new_trx_id_col=1, dup=0x7fa0a8c9f490, error=0x7fa0a8c9f20c, offsets_heap=0x7fa074078608, heap=0x7fa074074568, mrec=0x7fa0a8080003 "", mrec_end=0x7fa0a8080094 "", offsets=0x7fa07400ca38) at /data/src/10.5/storage/innobase/row/row0log.cc:2670
#10 0x000055c765022233 in row_log_table_apply_ops (thr=0x7fa07401d658, dup=0x7fa0a8c9f490, stage=0x7fa074020818) at /data/src/10.5/storage/innobase/row/row0log.cc:3043
#11 0x000055c765022988 in row_log_table_apply (thr=0x7fa07401d658, old_table=0x7fa07021b7d8, table=0x7fa0a8ca0910, stage=0x7fa074020818, new_table=0x7fa07401df18) at /data/src/10.5/storage/innobase/row/row0log.cc:3149
#12 0x000055c764ea60c6 in alter_rebuild_apply_log (ctx=0x7fa0740137f0, ha_alter_info=0x7fa0a8ca0870, altered_table=0x7fa0a8ca0910) at /data/src/10.5/storage/innobase/handler/handler0alter.cc:10633
#13 0x000055c764ea6fc1 in ha_innobase::commit_inplace_alter_table (this=0x7fa0700faa00, altered_table=0x7fa0a8ca0910, ha_alter_info=0x7fa0a8ca0870, commit=true) at /data/src/10.5/storage/innobase/handler/handler0alter.cc:10832
#14 0x000055c7649fadf9 in handler::ha_commit_inplace_alter_table (this=0x7fa0700faa00, altered_table=0x7fa0a8ca0910, ha_alter_info=0x7fa0a8ca0870, commit=true) at /data/src/10.5/sql/handler.cc:4843
#15 0x000055c7647918ae in mysql_inplace_alter_table (thd=0x7fa074000b18, table_list=0x7fa074012548, table=0x7fa07021b248, altered_table=0x7fa0a8ca0910, ha_alter_info=0x7fa0a8ca0870, inplace_supported=HA_ALTER_INPLACE_COPY_NO_LOCK, target_mdl_request=0x7fa0a8ca0ce0, alter_ctx=0x7fa0a8ca1830) at /data/src/10.5/sql/sql_table.cc:8038
#16 0x000055c7647990d9 in mysql_alter_table (thd=0x7fa074000b18, new_db=0x7fa0740053d8, new_name=0x7fa0740057e0, create_info=0x7fa0a8ca2430, table_list=0x7fa074012548, alter_info=0x7fa0a8ca2360, order_num=0, order=0x0, ignore=false, if_exists=false) at /data/src/10.5/sql/sql_table.cc:10564
#17 0x000055c76483e97a in Sql_cmd_alter_table::execute (this=0x7fa074012c20, thd=0x7fa074000b18) at /data/src/10.5/sql/sql_alter.cc:538
#18 0x000055c76469da0c in mysql_execute_command (thd=0x7fa074000b18) at /data/src/10.5/sql/sql_parse.cc:5951
#19 0x000055c7646a3d4d in mysql_parse (thd=0x7fa074000b18, rawbuf=0x7fa074012470 "ALTER TABLE t1 FORCE", length=20, parser_state=0x7fa0a8ca3520, is_com_multi=false, is_next_command=false) at /data/src/10.5/sql/sql_parse.cc:7993
#20 0x000055c764690046 in dispatch_command (command=COM_QUERY, thd=0x7fa074000b18, packet=0x7fa0740088b9 "ALTER TABLE t1 FORCE", packet_length=20, is_com_multi=false, is_next_command=false) at /data/src/10.5/sql/sql_parse.cc:1867
#21 0x000055c76468e77e in do_command (thd=0x7fa074000b18) at /data/src/10.5/sql/sql_parse.cc:1348
#22 0x000055c76483407a in do_handle_one_connection (connect=0x55c7684c5278, put_in_cache=true) at /data/src/10.5/sql/sql_connect.cc:1410
#23 0x000055c764833de2 in handle_one_connection (arg=0x55c7684c5278) at /data/src/10.5/sql/sql_connect.cc:1312
#24 0x000055c764d72126 in pfs_spawn_thread (arg=0x55c7684c4ea8) at /data/src/10.5/storage/perfschema/pfs.cc:2201
#25 0x00007fa0b16f24a4 in start_thread (arg=0x7fa0a8ca4700) at pthread_create.c:456
#26 0x00007fa0af826d0f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:97
For the 10.5 failure that involves ALTER TABLE t1 FORCE, I believe that the issue is that we are wrongly allowing LOCK=NONE when rebuilding the table that is lacking a PRIMARY KEY. At the time of the assertion failure, the table is empty. I did not see any wrongdoing on the SQL layer. No unexpected DML transaction rollback or incorrect release of MDL.
The 10.1 failure is a little different, because the table is not empty. But, nevertheless, also for the ALTER TABLE t1 DROP b we should refuse any concurrent DML, because the table only defines the hidden primary key GEN_CLUST_INDEX(DB_ROW_ID).
elenst, did you ever repeat this with a table that defines a PRIMARY KEY or a UNIQUE INDEX solely on NOT NULL columns?
Marko Mäkelä
added a comment - For the 10.5 failure that involves ALTER TABLE t1 FORCE , I believe that the issue is that we are wrongly allowing LOCK=NONE when rebuilding the table that is lacking a PRIMARY KEY . At the time of the assertion failure, the table is empty. I did not see any wrongdoing on the SQL layer. No unexpected DML transaction rollback or incorrect release of MDL.
The 10.1 failure is a little different, because the table is not empty. But, nevertheless, also for the ALTER TABLE t1 DROP b we should refuse any concurrent DML, because the table only defines the hidden primary key GEN_CLUST_INDEX(DB_ROW_ID) .
elenst , did you ever repeat this with a table that defines a PRIMARY KEY or a UNIQUE INDEX solely on NOT NULL columns?
marko,
It is reproducible the same way (same assertion in the same line of 10.5) with this test case:
--source include/have_innodb.inc
CREATETABLE t1 (pk INTNOTNULLPRIMARYKEY, a INT) ENGINE=InnoDB;
XA BEGIN'x';
INSERTINTO t1 VALUES (1,1),(2,2);
--connect (con1,localhost,root,,test)
SET lock_wait_timeout= 1;
--send
ALTERTABLE t1 FORCE;
--connection default
--error ER_NO_SUCH_TABLE
UPDATEIGNORE xx SET x = 0;
--error ER_XAER_RMFAIL
FLUSH TABLES;
UPDATE t1 SET a = 3;
XA END'x';
XA ROLLBACK'x';
--connection con1
--reap
# Cleanup
--disconnect con1
--connection default
DROPTABLE t1;
Fails on 10.1 too.
Elena Stepanova
added a comment - - edited marko ,
It is reproducible the same way (same assertion in the same line of 10.5) with this test case:
--source include/have_innodb.inc
CREATE TABLE t1 (pk INT NOT NULL PRIMARY KEY , a INT ) ENGINE=InnoDB;
XA BEGIN 'x' ;
INSERT INTO t1 VALUES (1,1),(2,2);
--connect (con1,localhost,root,,test)
SET lock_wait_timeout= 1;
--send
ALTER TABLE t1 FORCE ;
--connection default
--error ER_NO_SUCH_TABLE
UPDATE IGNORE xx SET x = 0;
--error ER_XAER_RMFAIL
FLUSH TABLES;
UPDATE t1 SET a = 3;
XA END 'x' ;
XA ROLLBACK 'x' ;
--connection con1
--reap
# Cleanup
--disconnect con1
--connection default
DROP TABLE t1;
Fails on 10.1 too.
I filed MDEV-23310 for the bug that I think should be repeatable without involving XA transactions. But, in the end, I determined that there is no problem to allow an online table rebuild when no PRIMARY KEY exists. The operation is like any table rebuild where the PRIMARY KEY is being preserved.
That being said, it is more convenient to debug cases where an explicit PRIMARY KEY exists.
Marko Mäkelä
added a comment - - edited I filed MDEV-23310 for the bug that I think should be repeatable without involving XA transactions. But, in the end, I determined that there is no problem to allow an online table rebuild when no PRIMARY KEY exists. The operation is like any table rebuild where the PRIMARY KEY is being preserved.
That being said, it is more convenient to debug cases where an explicit PRIMARY KEY exists.
I can repeat the crash on 10.2 with the latest test case. It failed on the first try both with and without the -rr option. The log>tail.block contains the following records:
ROW_T_UPDATE(pk=1,DB_TRX_ID=0x504,DB_ROLL_PTR=(update),a=3)
ROW_T_UPDATE(pk=2,DB_TRX_ID=0x504,DB_ROLL_PTR=(update),a=3)
ROW_T_UPDATE(pk=2,DB_TRX_ID=0x504,DB_ROLL_PTR=(insert2),a=2)
ROW_T_UPDATE(pk=1,DB_TRX_ID=0x504,DB_ROLL_PTR=(insert1),a=1)
ROW_T_DELETE(pk=2,DB_TRX_ID=0x504,DB_ROLL_PTR=(insert2))
ROW_T_DELETE(pk=1,DB_TRX_ID=0x504,DB_ROLL_PTR=(insert1))
In this log, everything would be fine if also the initial two INSERT rows of the XA BEGIN 'x' transaction were present. In that case, the online ALTER would have copied the empty table and then applied the log for the concurrent rolled-back transaction, ending up with an empty table again.
Clearly, something is wrongly allowing the ALTER TABLE t1 FORCE to start while that transaction is active and accessing the table. As expected, the MDL was incorrectly released by a statement that returned an error. Maybe the erroneous UPDATE is needed as well, but the FLUSH TABLES is what will violate the design constraints:
10.2 3c3f172f17de217389bc51053c555a5d7e91ddff
Thread 29 hit Breakpoint 1, MDL_context::release_transactional_locks (
this=0x7f7900000e48) at /mariadb/10.2o/sql/mdl.cc:2843
/* No transaction control allowed in sub-statements. */
DBUG_ASSERT(! thd->in_sub_stmt);
if (!(thd->variables.option_bits & OPTION_GTID_BEGIN))
{
/* If commit fails, we should be able to reset the OK status. */
thd->get_stmt_da()->set_overwrite_status(true);
/* Commit the normal transaction if one is active. */
trans_commit_implicit(thd);
thd->get_stmt_da()->set_overwrite_status(false);
thd->mdl_context.release_transactional_locks();
}
}
Marko Mäkelä
added a comment - I can repeat the crash on 10.2 with the latest test case. It failed on the first try both with and without the - rr option. The log >tail.block contains the following records:
ROW_T_UPDATE(pk=1,DB_TRX_ID=0x504,DB_ROLL_PTR=(update),a=3)
ROW_T_UPDATE(pk=2,DB_TRX_ID=0x504,DB_ROLL_PTR=(update),a=3)
ROW_T_UPDATE(pk=2,DB_TRX_ID=0x504,DB_ROLL_PTR=(insert2),a=2)
ROW_T_UPDATE(pk=1,DB_TRX_ID=0x504,DB_ROLL_PTR=(insert1),a=1)
ROW_T_DELETE(pk=2,DB_TRX_ID=0x504,DB_ROLL_PTR=(insert2))
ROW_T_DELETE(pk=1,DB_TRX_ID=0x504,DB_ROLL_PTR=(insert1))
In this log, everything would be fine if also the initial two INSERT rows of the XA BEGIN 'x' transaction were present. In that case, the online ALTER would have copied the empty table and then applied the log for the concurrent rolled-back transaction, ending up with an empty table again.
Clearly, something is wrongly allowing the ALTER TABLE t1 FORCE to start while that transaction is active and accessing the table. As expected, the MDL was incorrectly released by a statement that returned an error. Maybe the erroneous UPDATE is needed as well, but the FLUSH TABLES is what will violate the design constraints:
10.2 3c3f172f17de217389bc51053c555a5d7e91ddff
Thread 29 hit Breakpoint 1, MDL_context::release_transactional_locks (
this=0x7f7900000e48) at /mariadb/10.2o/sql/mdl.cc:2843
2843 {
(rr) bt
#0 MDL_context::release_transactional_locks (this=0x7f7900000e48)
at /mariadb/10.2o/sql/mdl.cc:2843
#1 0x000055f2bce3ce1d in mysql_execute_command (thd=0x7f7900000d28)
at /mariadb/10.2o/sql/sql_parse.cc:6080
#2 0x000055f2bce33564 in mysql_parse (thd=0x7f7900000d28,
rawbuf=0x7f7900011970 "FLUSH TABLES", length=<optimized out>,
parser_state=0x7f796405d6d0, is_com_multi=<optimized out>,
is_next_command=false) at /mariadb/10.2o/sql/sql_parse.cc:7733
…
(rr) break ha_innobase::prepare_inplace_alter_table
Breakpoint 2 at 0x55f2bd1e8570: file /mariadb/10.2o/storage/innobase/handler/handler0alter.cc, line 5326.
(rr) when
Current event: 118228
(rr) c
Continuing.
[Switching to Thread 539985.540023]
Thread 3 hit Breakpoint 2, ha_innobase::prepare_inplace_alter_table (
The MDL is being released in a different piece of the code than in MDEV-15532 :
else if (stmt_causes_implicit_commit(thd, CF_IMPLICIT_COMMIT_END))
{
/* No transaction control allowed in sub-statements. */
DBUG_ASSERT(! thd->in_sub_stmt);
if (!(thd->variables.option_bits & OPTION_GTID_BEGIN))
{
/* If commit fails, we should be able to reset the OK status. */
thd->get_stmt_da()->set_overwrite_status( true );
/* Commit the normal transaction if one is active. */
trans_commit_implicit(thd);
thd->get_stmt_da()->set_overwrite_status( false );
thd->mdl_context.release_transactional_locks();
}
}
It was actually fixed with MDEV-15532.
It is still easily enough reproducible on a previous commit, but not since this one:
commit 828471cbf83774f4537a78551290b7a4a7f5d374
Author: Monty
Date: Mon Nov 30 15:29:32 2020 +0200
MDEV 15532 Assertion `!log->same_pk' failed in row_log_table_apply_delete
Elena Stepanova
added a comment - It was actually fixed with MDEV-15532 .
It is still easily enough reproducible on a previous commit, but not since this one:
commit 828471cbf83774f4537a78551290b7a4a7f5d374
Author: Monty
Date: Mon Nov 30 15:29:32 2020 +0200
MDEV 15532 Assertion `!log->same_pk' failed in row_log_table_apply_delete
New occurrence on 10.3: https://api.travis-ci.org/v3/job/361026822/log.txt