[MDEV-25919] InnoDB reports misleading lock wait timeout on DDL operations
| Created: | 2021-06-15 | Updated: | 2024-01-05 | Resolved: | 2021-08-31 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Locking, Storage Engine - InnoDB |
| Affects Version/s: | 10.6.2 |
| Fix Version/s: | 10.6.5 |
| Type: | Bug | Priority: | Major |
| Reporter: | Marko Mäkelä | Assignee: | Marko Mäkelä |
| Resolution: | Fixed | Votes: | 1 |
| Labels: | regression |
| Issue Links: |
|
| Description |
|
In the third and last part of As part of that refactoring, the lock conflict handling of internal transactions was revised so that if the data dictionary is being locked, any conflict would result in an immediate lock wait timeout error. Such lock conflicts should only be able to exist on the InnoDB persistent statistics tables or some internal tables that are related to the InnoDB FULLTEXT INDEX implementation.

We should try to abandon the concept of 'dictionary transaction' and refactor the InnoDB internal SQL parser so that all table handles will be looked up before the parser is invoked. In this way, no data dictionary latch will be required during the execution, and lock waits can be handled in the normal fashion, without fearing any server hang (such as Making all transactions equal will remove the need to use separate internal transactions for InnoDB DDL operations.

Note: Before
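The conflict policy described above can be modelled in a few lines. This is a minimal sketch, not InnoDB code: `Trx`, `LockResult` and `request_lock` are hypothetical names chosen for illustration, and the only behaviour taken from the description is that a conflict is reported immediately as a lock wait timeout while the data dictionary is locked.

```cpp
#include <cassert>

// Hypothetical names throughout; this only models the policy in the text.
enum class LockResult { Granted, Wait, LockWaitTimeout };

struct Trx {
  bool holds_dict_latch;  // true while the data dictionary is locked
};

// Decide the outcome of a table or record lock request.
LockResult request_lock(const Trx& trx, bool conflict) {
  if (!conflict) {
    return LockResult::Granted;
  }
  // Waiting while the dictionary is locked could hang the server,
  // so the conflict is reported at once as a lock wait timeout.
  if (trx.holds_dict_latch) {
    return LockResult::LockWaitTimeout;
  }
  // Without the dictionary latch, a normal lock wait is possible.
  return LockResult::Wait;
}
```

In this model the misleading error disappears as soon as `holds_dict_latch` is false at the time of the request, which is the direction the description proposes.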
| Comments |
| Comment by Marko Mäkelä [ 2021-07-27 ] |

After all, it can be useful to keep the concept of ‘dictionary transaction’, so that such transactions can be rolled back during an early phase of startup. Other transactions that do not hold locks on dictionary tables can be rolled back in the background while the server is already accepting connections.

What really needs to change is that the current thread must not hold any dict_sys latch when attempting to acquire an InnoDB table or record lock. We must remove any table lookup from pars_sql() and always pass the table names via pars_bound_id_t.
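The ordering rule in this comment (look up tables first, release the dictionary latch, and only then request locks) can be sketched as follows. All types and functions here are hypothetical stand-ins, not the real dict_sys, pars_sql() or pars_bound_id_t interfaces; the `latch_held` flag exists only so that the sketch can check the rule.

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical stand-in for a table handle with its own lock.
struct Table {
  std::mutex lock;
};

// Hypothetical stand-in for the data dictionary and its latch.
struct Dictionary {
  std::mutex latch;
  bool latch_held = false;  // bookkeeping for this sketch only
  std::map<std::string, Table> tables;
};

// Phase 1: resolve every table name while holding the dictionary latch.
// The latch is released before the function returns.
std::vector<Table*> resolve_tables(Dictionary& dict,
                                   const std::vector<std::string>& names) {
  std::lock_guard<std::mutex> guard(dict.latch);
  dict.latch_held = true;
  std::vector<Table*> handles;
  for (const std::string& name : names) {
    auto it = dict.tables.find(name);
    handles.push_back(it == dict.tables.end() ? nullptr : &it->second);
  }
  dict.latch_held = false;
  return handles;
}

// Phase 2: acquire a table lock. This may block, so the dictionary
// latch must not be held by the caller at this point.
std::unique_lock<std::mutex> lock_table(Dictionary& dict, Table& table) {
  assert(!dict.latch_held);
  return std::unique_lock<std::mutex>(table.lock);
}
```

Because the handles are resolved up front, the code that later waits for a lock never needs to consult the dictionary, which is the property the comment asks for.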
| Comment by Marko Mäkelä [ 2021-07-27 ] |

As part of this fix, we may reduce the use of trx_t::dict_operation_lock_mode. Also
| Comment by Marko Mäkelä [ 2021-08-27 ] |

Random failures of the test parts.partition_special_innodb were causing some headache until I realized that
will invoke ha_innobase::delete_table() without holding MDL_EXCLUSIVE. Therefore, background tasks that might access the newly created partitions must explicitly be disabled in this case; we cannot rely on MDL.
| Comment by Marko Mäkelä [ 2021-08-31 ] |

We also replaced dict_table_t::stats_bg_flag with proper use of MDL. The background tasks that would update persistent statistics for InnoDB tables or defragment them will acquire MDL on the table name, which will ensure that no DDL may be executed concurrently on the tables. The refactored waiting logic should essentially fix
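The exclusion between background tasks and DDL can be illustrated with a much simplified model of metadata locks keyed by table name. This is a sketch under stated assumptions: `MdlRegistry` and `update_stats_in_background` are hypothetical, real MDL has more lock types than shared and exclusive, and skipping the table when DDL holds the lock is an assumption of this sketch rather than something the comment states.

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <string>

// Hypothetical, simplified registry of metadata locks by table name.
class MdlRegistry {
  struct State {
    int shared = 0;
    bool exclusive = false;
  };
  std::mutex mutex_;
  std::map<std::string, State> locks_;

 public:
  // Shared lock: granted unless DDL holds the exclusive lock.
  bool try_shared(const std::string& name) {
    std::lock_guard<std::mutex> guard(mutex_);
    State& s = locks_[name];
    if (s.exclusive) return false;
    ++s.shared;
    return true;
  }
  void release_shared(const std::string& name) {
    std::lock_guard<std::mutex> guard(mutex_);
    State& s = locks_[name];
    assert(s.shared > 0);
    --s.shared;
  }
  // Exclusive lock (DDL): granted only when nobody else holds the name.
  bool try_exclusive(const std::string& name) {
    std::lock_guard<std::mutex> guard(mutex_);
    State& s = locks_[name];
    if (s.exclusive || s.shared > 0) return false;
    s.exclusive = true;
    return true;
  }
  void release_exclusive(const std::string& name) {
    std::lock_guard<std::mutex> guard(mutex_);
    State& s = locks_[name];
    assert(s.exclusive);
    s.exclusive = false;
  }
};

// Background task: touches the table only while holding a shared lock
// on its name. Returns false if the table was skipped (assumed policy).
bool update_stats_in_background(MdlRegistry& mdl, const std::string& table) {
  if (!mdl.try_shared(table)) return false;  // DDL in progress
  // ... statistics would be recalculated here ...
  mdl.release_shared(table);
  return true;
}
```

With the lock keyed by table name, no per-table flag is needed to tell DDL that a background task is active: the DDL simply cannot obtain the exclusive lock while the shared one is held.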
| Comment by Marko Mäkelä [ 2021-09-24 ] |

For the record, I just got this test to fail once on a 10.5-based branch. I had added the test to my local development tree months ago, and only now did it fail for the first time:
The failure looked like this:
|