[MDEV-14637] Latching order violation during btr_cur_pessimistic_delete() Created: 2017-12-13 Updated: 2023-03-17 Resolved: 2018-08-03 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Storage Engine - InnoDB |
| Affects Version/s: | 10.2.2, 10.3.0 |
| Fix Version/s: | 10.2.17, 10.3.9 |
| Type: | Bug | Priority: | Major |
| Reporter: | Marko Mäkelä | Assignee: | Marko Mäkelä |
| Resolution: | Fixed | Votes: | 1 |
| Labels: | upstream | ||
| Attachments: |
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Issue Links: |
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Description |
|
When MySQL 5.7.1 introduced WL#6326 to reduce contention on the non-leaf levels of B-trees, it introduced a new rw-lock mode SX (not conflicting with S, but conflicting with SX and X) and new rules to go with it. A thread that is holding an dict_index_t::lock aka index->lock in SX mode is permitted to acquire non-leaf buf_block_t::lock aka block->lock X or SX mode, in monotonically descending order. That is, once the thread has acquired a block->lock, it is not allowed to acquire a lock on its parent or grandparent pages. Such ‘arbitrary-order access’ is only allowed when the thread acquired the index->lock in X mode upfront. A customer encountered a repeatable hang when loading a dump into InnoDB while using multiple innodb_purge_threads (default: 4). The dump makes very heavy use of FOREIGN KEY constraints. By luck, it happened so that two purge worker threads (srv_worker_thread) deadlocked with each other. Both were operating on the index FOR_REF of the InnoDB internal table SYS_FOREIGN. One of them was legitimately holding index->lock S-latch and the root block->lock S-latch. The other srv_worker_thread had acquired index->lock SX-latch, root block->lock SX-latch, and a bunch of other latches, including the fil_space_t::latch for freeing some blocks and some leaf page latches. This other thread was inside 2 nested calls to btr_compress() and it was trying to reacquire the root block->lock in X mode, violating the WL#6326 protocol. This violation led to a deadlock, because while S is compatible with SX and a thread can upgrade an SX lock to X when there are no conflicting requests, in this case there was a conflicting S lock held by the other purge worker thread. So, the two purge worker threads are now deadlocked with each other. Both are holding dict_operation_lock S-latch to prevent concurrent DROP TABLE operations (it would be better architecture to acquire MDL on the currently processed table name). Their dict_operation_lock would block any future DDL operations, and it would also block the master thread which is from time to time performing InnoDB table cache eviction (a feature introduced in MySQL 5.6; see MySQL Bug#84424). The user experienced a hang during a subsequent CREATE TABLE. Below is my detailed analysis of what the purge worker threads (Thread 4 and Thread 5) did, from the core dump that was helpfully provided by the customer.
It turns out that a questionable condition had been included in MySQL 5.7, to sometimes acquire the index->lock in too weak SX mode, instead of the appropriate X mode. Because the decision to acquire the correct X mode lock depends on the transaction history list length and the number of pending read requests, the code might work correctly most of the time. Also, most ‘pessimistic’ operations do not involve multiple recursion steps to btr_compress(), like Thread 4 in our case:
There is some code in btr_cur_search_to_nth_level() that tries to acquire all possibly needed sibling page locks in advance. It appears that this logic is not working correctly in this case, because the root page was only locked in SX mode, not X mode. While this deadlock occurred during purge, the BTR_LATCH_FOR_DELETE is also used in ROLLBACK. It is possible that this bug has been reported upstream in MySQL Bug#80919. Without a core dump or at least full stack traces of each thread, this cannot be confirmed or denied. While diagnosing this issue, I noticed that the page flush code is now acquiring block->lock in SX mode instead of S mode. This is likely because certain fields in index root pages are protected by SX latch, not X latch. The page flushing must ensure that no concurrent modifications are possible even to the fields in index root pages. The conservative fix would be to always acquire the index->lock in X mode for BTR_MODIFY_TREE operations. This will very likely degrade performance, because then the only cases for acquiring the index->lock in SX mode would seem to be operations on BLOBs. |
| Comments |
| Comment by Axel Schwenke [ 2017-12-19 ] |
|
I have run a benchmark, comparing 10.2.11 and this branch for various write-only workload. For 16K page size (default) this branch is about same speed for INSERT and clearly slower (~20%) for UPDATE. With 4K page size it is slower for both INSERT (~5%) and UPDATE (~10%). Spread sheet with numbers attached. |
| Comment by Marko Mäkelä [ 2018-07-30 ] |
|
The MySQL 5.7.23 release on Friday, July 27, 2018 finally published a fix that was apparently made almost 4 months earlier: |
| Comment by Marko Mäkelä [ 2018-07-31 ] |
|
I reviewed the fix in MySQL 5.7.23. The root cause is not documented there, and neither is there a test case, but it seems to me that this only affects the purge (or rollback) on the tables SYS_FOREIGN and SYS_FOREIGN_COLS. That is, you would need some DDL operation on tables that contain FOREIGN KEY constraints. The purge should be triggered after DROP TABLE, RENAME TABLE, ALTER TABLE…ALGORITHM=COPY at least. Repeating this would require some bad luck (such as two purge threads operating on the same system table, and one of them deciding to merge a non-leaf page). The SYS_FOREIGN and SYS_FOREIGN_COLS tables are special in that they define key columns as the InnoDB internal SQL parser type CHAR, which translates to VARCHAR(0) (at most 0 bytes). However, InnoDB happily ignores the maximum size when inserting or updating records (in MariaDB or MySQL, it is the user SQL layer that enforces the limit). The wrong maximum length leads to a miscalculation of node_ptr_max_size. It might be possible to encounter this deadlock for other tables as well. The patch is addressing that as well. This could be lead to a hang during rollback or purge. |
| Comment by Marko Mäkelä [ 2018-07-31 ] |
|
Other tables with insufficient maximum length for keys that are directly written by InnoDB are the persistent statistics tables mysql.innodb_index_stats and mysql.innodb_table_stats. As noted in |
| Comment by Marko Mäkelä [ 2018-07-31 ] |
|
Increasing the column length would cause a bootstrap failure when using innodb_page_size=4k, as noted in
In MariaDB, I made ha_innobase::max_supported_key_length() return a longer limit for all tables when using innodb_page_size=4k. This could allow tables to be created such that some INSERT or UPDATE operations could fail due to too long records being inserted into secondary indexes. I think that such failures are already possible, especially related to an InnoDB undo log format limitation when involving column prefix indexes on TEXT columns. The MariaDB fix is also tweaking the node_ptr_max_size calculation so that a larger size will be used for table_name in these tables, no matter with which size they were created. |
| Comment by Marko Mäkelä [ 2023-03-17 ] |
|
This bug was not completely fixed until |