[MDEV-25902] Unexpected ER_LOCK_WAIT_TIMEOUT and result after timeout, assertion failure err != DB_DUPLICATE_KEY - Jira

Details

Type: Bug
Status: Closed (View Workflow)
Priority: Blocker
Resolution: Fixed
Affects Version/s: 10.6
Fix Version/s: 10.6.3
Component/s: Locking, Partitioning, Storage Engine - InnoDB
Labels:
- regression

Description

--source include/have_partition.inc

--source include/have_innodb.inc

CREATE TABLE t1 (pk int primary key) ENGINE=InnoDB;

CREATE TABLE t2 (pk int primary key) ENGINE=InnoDB;

--connect (con1,localhost,root,,test)

START TRANSACTION;

INSERT INTO t2 (pk) VALUES (1);

SAVEPOINT sp;

INSERT INTO t1 (pk) VALUES (1);

ROLLBACK TO SAVEPOINT sp;

--connection default

SET innodb_lock_wait_timeout= 1;

--error ER_LOCK_WAIT_TIMEOUT

ALTER TABLE t1 PARTITION BY HASH(pk);

SHOW CREATE TABLE t1;

--connection con1

COMMIT;

--connection default

ALTER TABLE t2 PARTITION BY HASH(pk);

# Cleanup

--disconnect con1

--connection default

DROP TABLE t1, t2;

In the test case above, the problem starts at

10.6 6c39eaeb
--error ER_LOCK_WAIT_TIMEOUT
ALTER TABLE t1 PARTITION BY HASH(pk);
ERROR HY000: Lock wait timeout exceeded; try restarting transaction

It looks new, 10.5 doesn't time out here.

Further, even though the query fails, the partitioning is still performed:

ALTER TABLE t1 PARTITION BY HASH(pk);

ERROR HY000: Lock wait timeout exceeded; try restarting transaction

SHOW CREATE TABLE t1;

Table	Create Table

t1	CREATE TABLE `t1` (

  `pk` int(11) NOT NULL,

  PRIMARY KEY (`pk`)

) ENGINE=InnoDB DEFAULT CHARSET=latin1

 PARTITION BY HASH (`pk`)

And then, further partitioning on a different table returns an unexpected error (on non-debug) or a debug assertion failure:

mysqltest: At line 26: query 'ALTER TABLE t2 PARTITION BY HASH(pk)' failed: ER_TABLE_EXISTS_ERROR (1050): Table './test/#sql-backup-3ea389-4' already exists

Please note that the assertion failure is just an aftermath, the problems start long before it.

10.6 6c39eaeb
mariadbd: /data/src/10.6/storage/innobase/row/row0mysql.cc:2962: dberr_t row_rename_table_for_mysql(const char, const char, trx_t*, bool): Assertion `err != DB_DUPLICATE_KEY' failed.
210612 20:24:56 [ERROR] mysqld got signal 6 ;

#7 0x00007f81ceefef36 in __GI___assert_fail (assertion=0x559b433158a5 "err != DB_DUPLICATE_KEY", file=0x559b43313c08 "/data/src/10.6/storage/innobase/row/row0mysql.cc", line=2962, function=0x559b43315778 "dberr_t row_rename_table_for_mysql(const char, const char, trx_t*, bool)") at assert.c:101
#8 0x0000559b42b901f5 in row_rename_table_for_mysql (old_name=0x7f81c8141ca0 "test/t2", new_name=0x7f81c8141aa0 "test/#sql-backup-3ea506-4", trx=0x7f81c93f33c0, use_fk=true) at /data/src/10.6/storage/innobase/row/row0mysql.cc:2962
#9 0x0000559b429f6814 in innobase_rename_table (trx=0x7f81c93f33c0, from=0x7f81c81427e0 "./test/t2", to=0x7f81c81429e0 "./test/#sql-backup-3ea506-4", use_fk=true) at /data/src/10.6/storage/innobase/handler/ha_innodb.cc:13423
#10 0x0000559b429de12f in ha_innobase::rename_table (this=0x7f819c018288, from=0x7f81c81427e0 "./test/t2", to=0x7f81c81429e0 "./test/#sql-backup-3ea506-4") at /data/src/10.6/storage/innobase/handler/ha_innodb.cc:13671
#11 0x0000559b425542af in handler::ha_rename_table (this=0x7f819c018288, from=0x7f81c81427e0 "./test/t2", to=0x7f81c81429e0 "./test/#sql-backup-3ea506-4") at /data/src/10.6/sql/handler.cc:5270
#12 0x0000559b422bcfb1 in mysql_rename_table (base=0x559b46095668, old_db=0x7f81c81445e0, old_name=0x7f81c81445f0, new_db=0x7f81c81445e0, new_name=0x7f81c8143140, id=0x7f81c8144680, flags=2) at /data/src/10.6/sql/sql_table.cc:4972
#13 0x0000559b422cd6a1 in mysql_alter_table (thd=0x7f819c000db8, new_db=0x7f819c0059c8, new_name=0x7f819c005de0, create_info=0x7f81c81453e0, table_list=0x7f819c0156b0, alter_info=0x7f81c81452f0, order_num=0, order=0x0, ignore=false, if_exists=false) at /data/src/10.6/sql/sql_table.cc:10521
#14 0x0000559b42381fad in Sql_cmd_alter_table::execute (this=0x7f819c0160f8, thd=0x7f819c000db8) at /data/src/10.6/sql/sql_alter.cc:550
#15 0x0000559b421c9562 in mysql_execute_command (thd=0x7f819c000db8) at /data/src/10.6/sql/sql_parse.cc:5983
#16 0x0000559b421cf708 in mysql_parse (thd=0x7f819c000db8, rawbuf=0x7f819c0155c0 "ALTER TABLE t2 PARTITION BY HASH(pk)", length=36, parser_state=0x7f81c8146480) at /data/src/10.6/sql/sql_parse.cc:8016
#17 0x0000559b421bbb92 in dispatch_command (command=COM_QUERY, thd=0x7f819c000db8, packet=0x7f819c00b879 "", packet_length=36, blocking=true) at /data/src/10.6/sql/sql_parse.cc:1897
#18 0x0000559b421ba537 in do_command (thd=0x7f819c000db8, blocking=true) at /data/src/10.6/sql/sql_parse.cc:1406
#19 0x0000559b42376d71 in do_handle_one_connection (connect=0x559b465ec328, put_in_cache=true) at /data/src/10.6/sql/sql_connect.cc:1410
#20 0x0000559b42376acd in handle_one_connection (arg=0x559b466152a8) at /data/src/10.6/sql/sql_connect.cc:1312
#21 0x0000559b428e5ed9 in pfs_spawn_thread (arg=0x559b465ebe78) at /data/src/10.6/storage/perfschema/pfs.cc:2201
#22 0x00007f81cf416609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#23 0x00007f81cefea293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Attachments

Issue Links

is caused by

MDEV-25506 Atomic DDL: .frm file is removed and orphan InnoDB tablespace is left behind upon crash recovery

Closed

is duplicated by

MDEV-24581 MDL race condition in trans_rollback_to_savepoint()

Closed

Activity

Ascending order - Click to sort in descending order

Marko Mäkelä added a comment - 2021-06-12 17:40

This is probably due to ~~MDEV-25506~~ part 3, which refactored DDL operations that could involve deleting .ibd files or internally dropping tables (such as an old copy of a table in ALTER TABLE). We will first lock all tables, then lock the data dictionary, and finally attempt to modify the dictionary tables and commit. Should a further lock conflict occur during the final phase of modifying the dictionary tables, we will immediately fail with a lock wait timeout error. It looks like the assertion needs to be relaxed due to this.

Marko Mäkelä added a comment - 2021-06-12 17:40 This is probably due to MDEV-25506 part 3, which refactored DDL operations that could involve deleting .ibd files or internally dropping tables (such as an old copy of a table in ALTER TABLE ). We will first lock all tables, then lock the data dictionary, and finally attempt to modify the dictionary tables and commit. Should a further lock conflict occur during the final phase of modifying the dictionary tables, we will immediately fail with a lock wait timeout error. It looks like the assertion needs to be relaxed due to this.

Marko Mäkelä added a comment - 2021-06-14 07:25

The timeout is reported while ALTER TABLE t1 PARTITION BY HASH(pk) is dropping the original table (which carries a #sql-backup- name). The subsequent ALTER TABLE t2 would attempt to use the same #sql-backup- name, even though the previous ALTER TABLE t1 failed to drop that table.

The timeout occurs due to a lock conflict on table t1. The conflicting lock had been created by the transaction in con1. The ROLLBACK TO SAVEPOINT will not release the lock on table t1 inside InnoDB. InnoDB was not even asked about that: there was no call to innobase_rollback_to_savepoint_can_release_mdl() before ALTER TABLE t1 PARTITION BY HASH(pk) was executed.

This is a metadata locking (MDL) bug that likely was worked around by the background DROP TABLE queue that ~~MDEV-25506~~ part 3 removed. The MDL for t1 was wrongly released here, without even consulting InnoDB:

10.6 82c07b178abfa9395f1d67747148a431d6eb6ae7
(rr) bt
#0 MDL_context::release_lock (this=0x7f02c4001c28, duration=MDL_TRANSACTION, ticket=0x7f02c4922b40)
at /mariadb/10.6/sql/mdl.cc:2815
#1 0x0000555f98026220 in MDL_context::release_locks_stored_before (this=0x7f02c4001c28,
duration=MDL_TRANSACTION, sentinel=0x7f02c4013ce0) at /mariadb/10.6/sql/mdl.cc:2873
#2 MDL_context::rollback_to_savepoint (this=0x7f02c4001c28, mdl_savepoint=
@0x7f02c401a0d0: {m_stmt_ticket = 0x0, m_trans_ticket = 0x7f02c4013ce0})
at /mariadb/10.6/sql/mdl.cc:3037
#3 0x0000555f9802e318 in trans_rollback_to_savepoint (thd=thd@entry=0x7f02c4001af8,
name=<optimized out>) at /mariadb/10.6/sql/transaction.cc:709
#4 0x0000555f97eeb65c in mysql_execute_command (thd=thd@entry=0x7f02c4001af8)
at /mariadb/10.6/sql/sql_parse.cc:5680
#5 0x0000555f97ee5e5e in mysql_parse (thd=thd@entry=0x7f02c4001af8, rawbuf=<optimized out>,
length=<optimized out>, parser_state=<optimized out>, parser_state@entry=0x7f02e406c580)
at /mariadb/10.6/sql/sql_parse.cc:8016
#6 0x0000555f97ee375a in dispatch_command (command=<optimized out>, command@entry=COM_QUERY,
thd=thd@entry=0x7f02c4001af8, packet=packet@entry=0x7f02c400bc99 "ROLLBACK TO SAVEPOINT sp",
packet_length=3288416368, packet_length@entry=24, blocking=true)
at /mariadb/10.6/sql/sql_parse.cc:1897

Marko Mäkelä added a comment - 2021-06-14 07:25 The timeout is reported while ALTER TABLE t1 PARTITION BY HASH(pk) is dropping the original table (which carries a #sql-backup- name). The subsequent ALTER TABLE t2 would attempt to use the same #sql-backup- name, even though the previous ALTER TABLE t1 failed to drop that table. The timeout occurs due to a lock conflict on table t1 . The conflicting lock had been created by the transaction in con1 . The ROLLBACK TO SAVEPOINT will not release the lock on table t1 inside InnoDB. InnoDB was not even asked about that: there was no call to innobase_rollback_to_savepoint_can_release_mdl() before ALTER TABLE t1 PARTITION BY HASH(pk) was executed. This is a metadata locking (MDL) bug that likely was worked around by the background DROP TABLE queue that MDEV-25506 part 3 removed. The MDL for t1 was wrongly released here, without even consulting InnoDB: 10.6 82c07b178abfa9395f1d67747148a431d6eb6ae7 (rr) bt #0 MDL_context::release_lock (this=0x7f02c4001c28, duration=MDL_TRANSACTION, ticket=0x7f02c4922b40) at /mariadb/10.6/sql/mdl.cc:2815 #1 0x0000555f98026220 in MDL_context::release_locks_stored_before (this=0x7f02c4001c28, duration=MDL_TRANSACTION, sentinel=0x7f02c4013ce0) at /mariadb/10.6/sql/mdl.cc:2873 #2 MDL_context::rollback_to_savepoint (this=0x7f02c4001c28, mdl_savepoint= @0x7f02c401a0d0: {m_stmt_ticket = 0x0, m_trans_ticket = 0x7f02c4013ce0}) at /mariadb/10.6/sql/mdl.cc:3037 #3 0x0000555f9802e318 in trans_rollback_to_savepoint (thd=thd@entry=0x7f02c4001af8, name=<optimized out>) at /mariadb/10.6/sql/transaction.cc:709 #4 0x0000555f97eeb65c in mysql_execute_command (thd=thd@entry=0x7f02c4001af8) at /mariadb/10.6/sql/sql_parse.cc:5680 #5 0x0000555f97ee5e5e in mysql_parse (thd=thd@entry=0x7f02c4001af8, rawbuf=<optimized out>, length=<optimized out>, parser_state=<optimized out>, parser_state@entry=0x7f02e406c580) at /mariadb/10.6/sql/sql_parse.cc:8016 #6 0x0000555f97ee375a in dispatch_command (command=<optimized out>, command@entry=COM_QUERY, thd=thd@entry=0x7f02c4001af8, packet=packet@entry=0x7f02c400bc99 "ROLLBACK TO SAVEPOINT sp", packet_length=3288416368, packet_length@entry=24, blocking=true) at /mariadb/10.6/sql/sql_parse.cc:1897

Elena Stepanova added a comment - 2021-06-28 14:20

Raising to a blocker not as much due to the assertion failure but because of ALTER being performed even though it returns an error.

Elena Stepanova added a comment - 2021-06-28 14:20 Raising to a blocker not as much due to the assertion failure but because of ALTER being performed even though it returns an error.

Marko Mäkelä added a comment - 2021-06-30 15:44

Would you agree with my proposed fix? With that, we would never release any MDL after ROLLBACK TO SAVEPOINT if the transaction had acquired any locks in InnoDB. Only in that way can the MDL perform its intended task of detecting conflicts between DML and DDL.

I think that a follow-up performance improvement could be filed, to enhance the handlerton::savepoint_rollback_can_release_mdl interface, so that the storage engines can indicate which MDL are OK to release.

Marko Mäkelä added a comment - 2021-06-30 15:44 Would you agree with my proposed fix ? With that, we would never release any MDL after ROLLBACK TO SAVEPOINT if the transaction had acquired any locks in InnoDB. Only in that way can the MDL perform its intended task of detecting conflicts between DML and DDL. I think that a follow-up performance improvement could be filed, to enhance the handlerton::savepoint_rollback_can_release_mdl interface, so that the storage engines can indicate which MDL are OK to release.

Sergei Golubchik added a comment - 2021-07-01 11:54

commit 161e4bfafd2 is ok to push

Sergei Golubchik added a comment - 2021-07-01 11:54 commit 161e4bfafd2 is ok to push

MariaDB Server

Unexpected ER_LOCK_WAIT_TIMEOUT and result after timeout, assertion failure err != DB_DUPLICATE_KEY

Details

Description

Attachments

Issue Links

Activity

People

Dates

Git Integration