[MDEV-25919] InnoDB reports misleading lock wait timeout on DDL operations - Jira

Details

Type: Bug
Status: Closed (View Workflow)
Priority: Major
Resolution: Fixed
Affects Version/s: 10.6.2
Fix Version/s: 10.6.5
Component/s: Locking, Storage Engine - InnoDB
Labels:
- regression

Description

In the third and last part of ~~MDEV-25506~~, operations that delete .ibd files were refactored so that the deletion would take place after transaction commit.

As part of that refactoring, the lock conflict handling of internal transactions was revised so that if the data dictionary is being locked, any conflict would result in an immediate lock wait timeout error. Such lock conflicts should only be able to exist on the InnoDB persistent statistics tables or some internal tables that are related to the InnoDB FULLTEXT INDEX implementation.

We should try to abandon the concept of 'dictionary transaction' and refactor the InnoDB internal SQL parser so that all table handles will be looked up before the parser is invoked. In this way, no data dictionary latch will be required during the execution, and lock waits can be handled in the normal fashion, without fearing any server hang (such as ~~MDEV-15020~~).

Making all transactions equal will remove the need to use separate internal transactions for InnoDB DDL operations.

Note: Before ~~MDEV-24589~~ and ~~MDEV-25506~~, some InnoDB DDL operations were neither atomic nor rollback-safe, and mixing DDL and DML in one transaction would have been out of the question.

Attachments

Issue Links

blocks

MDEV-18381 Server crashes in ha_innobase::store_lock upon concurrent ALTER and TRUNCATE

Closed

MDEV-25920 Atomic DROP DATABASE

Open

MDEV-25921 Implement CREATE TABLE...SELECT in one transaction

Open

causes

MDEV-26657 SHOW ENGINE INNODB STATUS displays uninitialized THD::real_id

Closed

MDEV-26772 InnoDB DDL fails with DUPLICATE KEY error

Closed

MDEV-26879 innodb_evict_tables_on_commit_debug=on makes table creation hang

Closed

MDEV-27069 heap-use-after-free in dict_stats_recalc_pool_del()

Closed

MDEV-27469 Assertion `!trx->read_only' failed in trx_undo_report_row_operation on CREATE TABLE

Closed

MDEV-27805 tpcc workload shows regression with MDB-10.6

Closed

MDEV-27909 InnoDB: Failing assertion: state == TRX_STATE_NOT_STARTED || (relaxed && thd_get_error_number(trx->mysql_thd))

Closed

MDEV-35647 Possible hang during CREATE TABLE…SELECT error handling

Closed

is blocked by

MDEV-24258 Merge dict_sys.mutex into dict_sys.latch

Closed

is caused by

MDEV-25506 Atomic DDL: .frm file is removed and orphan InnoDB tablespace is left behind upon crash recovery

Closed

is duplicated by

MDEV-26490 Assertion `!srv_read_only_mode' failed when using innodb-read-only=1 as a startup option

Closed

relates to

MDEV-19505 Do not hold mutex while calling que_graph_free()

Closed

MDEV-23484 Rollback unnecessarily acquires dict_operation_lock for every row

Closed

MDEV-23670 Crash during OPTIMIZE TABLE mysql.innodb_table_stats

Closed

MDEV-31979 Assertion `!internal' failed in void trx_t::free()

Open

MDEV-24579 Error table->get_ref_count() after update to 10.5.8

Closed

(6 causes, 1 is blocked by, 1 is caused by, 1 is duplicated by, 5 relates to)

Activity

Ascending order - Click to sort in descending order

Marko Mäkelä added a comment - 2021-07-27 12:31

After all, it can be useful to keep the concept of ‘dictionary transaction’, so that such transactions can be rolled back during an early phase of startup. Other transactions that do not hold locks on dictionary tables can be rolled back in the background while the server is already accepting connections.

What really needs to change is that the current thread must not hold any dict_sys latch when attempting to acquire an InnoDB table or record lock. We must remove any table lookup from pars_sql() and always pass the table names via pars_bound_id_t.

Marko Mäkelä added a comment - 2021-07-27 12:31 After all, it can be useful to keep the concept of ‘dictionary transaction’, so that such transactions can be rolled back during an early phase of startup. Other transactions that do not hold locks on dictionary tables can be rolled back in the background while the server is already accepting connections. What really needs to change is that the current thread must not hold any dict_sys latch when attempting to acquire an InnoDB table or record lock. We must remove any table lookup from pars_sql() and always pass the table names via pars_bound_id_t .

Marko Mäkelä added a comment - 2021-07-27 14:06

As part of this fix, we may reduce the use of trx_t::dict_operation_lock_mode. Also ~~MDEV-23484~~ would reduce the use of the field. Maybe the field can be removed altogether.

Marko Mäkelä added a comment - 2021-07-27 14:06 As part of this fix, we may reduce the use of trx_t::dict_operation_lock_mode . Also MDEV-23484 would reduce the use of the field. Maybe the field can be removed altogether.

Marko Mäkelä added a comment - 2021-08-27 14:09

Random failures of the test parts.partition_special_innodb were causing some headache until I realized that

--error ER_LOCK_WAIT_TIMEOUT

ALTER TABLE t1 ADD PARTITION PARTITIONS 2;

will invoke ha_innobase::delete_table() without holding MDL_EXCLUSIVE. Therefore, background tasks that might access the newly created partitions must explicitly be disabled in this case; we cannot rely on MDL.

Marko Mäkelä added a comment - 2021-08-27 14:09 Random failures of the test parts.partition_special_innodb were causing some headache until I realized that --error ER_LOCK_WAIT_TIMEOUT ALTER TABLE t1 ADD PARTITION PARTITIONS 2; will invoke ha_innobase::delete_table() without holding MDL_EXCLUSIVE . Therefore, background tasks that might access the newly created partitions must explicitly be disabled in this case; we cannot rely on MDL.

Marko Mäkelä added a comment - 2021-08-31 12:02

We also removed dict_table_t::stats_bg_flag with proper use of MDL. The background tasks that would update persistent statistics for InnoDB tables or defragment them will acquire MDL on the table name, which will ensure that no DDL may be executed concurrently on the tables.

The refactored waiting logic should essentially fix ~~MDEV-15020~~.

Marko Mäkelä added a comment - 2021-08-31 12:02 We also removed dict_table_t::stats_bg_flag with proper use of MDL. The background tasks that would update persistent statistics for InnoDB tables or defragment them will acquire MDL on the table name, which will ensure that no DDL may be executed concurrently on the tables. The refactored waiting logic should essentially fix MDEV-15020 .

Marko Mäkelä added a comment - 2021-09-24 05:47

For the record, I just got this test to fail once on a 10.5-based branch. I had added the test into my local development tree months ago, and only now it failed for the first time:

--source include/have_innodb.inc

CREATE TABLE t (a INT) ENGINE=InnoDB STATS_PERSISTENT=1;

--source include/restart_mysqld.inc

connect (con,localhost,root);

send ALTER TABLE mysql.innodb_index_stats FORCE, ALGORITHM=INPLACE;

connection default;

SELECT * FROM t;

connection con;

reap;

connection default;

DROP TABLE t;

The failure looked like this:

10.5 7d360060cb6ea2558820b343ef77d59584b0b805
mysqltest: At line 8: query 'SELECT * FROM t' failed: 2013: Lost connection to MySQL server during query
…
mariadbd: /mariadb/10.5m/storage/innobase/handler/handler0alter.cc:10008: bool commit_try_rebuild(Alter_inplace_info , ha_innobase_inplace_ctx , TABLE , const TABLE , trx_t , const char ): Assertion `user_table->get_ref_count() == 1' failed.

Marko Mäkelä added a comment - 2021-09-24 05:47 For the record, I just got this test to fail once on a 10.5-based branch. I had added the test into my local development tree months ago, and only now it failed for the first time: --source include/have_innodb.inc CREATE TABLE t (a INT ) ENGINE=InnoDB STATS_PERSISTENT=1; --source include/restart_mysqld.inc connect (con,localhost,root); send ALTER TABLE mysql.innodb_index_stats FORCE , ALGORITHM=INPLACE; connection default ; SELECT * FROM t; connection con; reap; connection default ; DROP TABLE t; The failure looked like this: 10.5 7d360060cb6ea2558820b343ef77d59584b0b805 mysqltest: At line 8: query 'SELECT * FROM t' failed: 2013: Lost connection to MySQL server during query … mariadbd: /mariadb/10.5m/storage/innobase/handler/handler0alter.cc:10008: bool commit_try_rebuild(Alter_inplace_info *, ha_innobase_inplace_ctx *, TABLE *, const TABLE *, trx_t *, const char *): Assertion `user_table->get_ref_count() == 1' failed.

People

Assignee:: Marko Mäkelä

Reporter:: Marko Mäkelä

Votes:: 1 Vote for this issue

Watchers:: 6 Start watching this issue

Dates

Created:: 2021-06-15 05:48

Updated:: 2025-01-08 11:29

Resolved:: 2021-08-31 12:02

Git Integration

Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.

MariaDB Server