Details
-
Bug
-
Status: Closed (View Workflow)
-
Major
-
Resolution: Fixed
-
10.2.2, 10.3.0
Description
When MySQL 5.7.1 introduced WL#6326 to reduce contention on the non-leaf levels of B-trees, it introduced a new rw-lock mode SX (not conflicting with S, but conflicting with SX and X) and new rules to go with it.
A thread that is holding an dict_index_t::lock aka index->lock in SX mode is permitted to acquire non-leaf buf_block_t::lock aka block->lock X or SX mode, in monotonically descending order. That is, once the thread has acquired a block->lock, it is not allowed to acquire a lock on its parent or grandparent pages. Such ‘arbitrary-order access’ is only allowed when the thread acquired the index->lock in X mode upfront.
A customer encountered a repeatable hang when loading a dump into InnoDB while using multiple innodb_purge_threads (default: 4). The dump makes very heavy use of FOREIGN KEY constraints. By luck, it happened so that two purge worker threads (srv_worker_thread) deadlocked with each other. Both were operating on the index FOR_REF of the InnoDB internal table SYS_FOREIGN. One of them was legitimately holding index->lock S-latch and the root block->lock S-latch. The other srv_worker_thread had acquired index->lock SX-latch, root block->lock SX-latch, and a bunch of other latches, including the fil_space_t::latch for freeing some blocks and some leaf page latches. This other thread was inside 2 nested calls to btr_compress() and it was trying to reacquire the root block->lock in X mode, violating the WL#6326 protocol.
This violation led to a deadlock, because while S is compatible with SX and a thread can upgrade an SX lock to X when there are no conflicting requests, in this case there was a conflicting S lock held by the other purge worker thread.
So, the two purge worker threads are now deadlocked with each other. Both are holding dict_operation_lock S-latch to prevent concurrent DROP TABLE operations (it would be better architecture to acquire MDL on the currently processed table name). Their dict_operation_lock would block any future DDL operations, and it would also block the master thread which is from time to time performing InnoDB table cache eviction (a feature introduced in MySQL 5.6; see MySQL Bug#84424). The user experienced a hang during a subsequent CREATE TABLE.
Below is my detailed analysis of what the purge worker threads (Thread 4 and Thread 5) did, from the core dump that was helpfully provided by the customer.
Thread 5 is performing a BTR_PURGE_LEAF dive into the index FOR_REF of
SYS_FOREIGN. It successfully acquired the index S-latch, the root page
S-latch, and is now trying to acquire an S-latch on the page 0:196633
that Thread 16 is waiting an SX-latch for. As soon as it would get the
S-latch, it would find the next child page pointer, which is a pointer
to a leaf page, and then release the non-leaf page latches and the
index->lock.We do not know yet who is holding the X-latch on the block->lock.
Thread 4 is trying to acquire an X-latch on the root page of FOR_REF
of SYS_FOREIGN. It is blocked by Thread 5.Also, there is nothing really bad about Thread 16, other than it seems
to be unnecessarily acquiring a too strong latch (SX instead of S) on
the block->lock.So, we must find out the lock holder:
(gdb) p/x $13->lock
$16 = {lock_word = 0xffffffffdfffffff, waiters = 0x1, sx_recursive = 0x0,
writer_is_wait_ex = 0x0, writer_thread = 0x7fe7e3fff700,The writer_thread points to Thread 4 (Thread 0x7fe7e3fff700 (LWP 16824)).
Thread 4 seems to be holding an X-latch on a level-1 node pointer and
then trying to acquire the level-2 root page X-latch. This would be
fine if it were holding the index->lock X-latch. But it is only
holding the index->lock SX-latch. Let us look at its mini-transaction
memo:
(gdb) p/x (mtr_t*)0x7fe7e3ffe500
$17 = 0x7fe7e3ffe500
(gdb) p *(void**)$17.m_impl.m_memo.m_first_block.m_data@40
$19 = {0x7fec5723fdd0, 0x80, 0x7fec13cf5410, 0x4, 0x7fec13cfb3b0, 0x2,
0x7fea06a89060, 0x8, 0x7fea06a89858, 0x2, 0x7fea06a89060, 0x2,
0x7fea06a896c0, 0x2, 0x7fec57213520, 0x40, 0x7fec13cf55a8, 0x4,
0x7fec13cf5410, 0x8, 0x7fec13cfb3b0, 0x8, 0x7fec13cfb3b0, 0x2,
0x7fea06a89858, 0x2, 0x7fea06a896c0, 0x2, 0x7fec13cf5410, 0x8,
0x7fec13cfb3b0, 0x8, 0x7fec13cfb3b0, 0x2, 0x7fea06a89858, 0x2,
0x7fea06a896c0, 0x2, 0x7fec13cf5410, 0x8}The first element is the &index->lock of our index=0x7fec5723fc40 (the
index FOR_REF of SYS_FOREIGN). It is locked as MTR_MEMO_SX_LOCK=0x80.The second is MTR_MEMO_PAGE_SX_FIX(4) of the root page (0:47). Looks
OK so far. It looks a little suspicious OK that the memo contains
multiple references to this root page, in mode MTR_MEMO_BUF_FIX(8).
Let us ignore those. The next page is MTR_MEMO_PAGE_X_FIX(2) of our
interesting page (0:196633). I think that this access is still OK
according to the WL#6326 rules. (It is certainly wrong of Thread 4 to
go back to acquire the root page X-latch later.) Also this page
reoccurs in the memo as MTR_MEMO_BUF_FIX(8). Let me pretty-print the
memo, ignoring those entries:MTR_MEMO_SX_LOCK(index->lock) 0x7fec5723fdd0
MTR_MEMO_PAGE_SX_FIX(0:47, root, level-2) 0x7fec13cf5410
MTR_MEMO_PAGE_X_FIX(0:196633, level-1) 0x7fec13cfb3b0
MTR_MEMO_BUF_FIX(0:70787, leaf page) 0x7fea06a89060
MTR_MEMO_PAGE_X_FIX(0:70793, leaf page) 0x7fea06a89858It looks like there was no match on page 0:70787 and we proceeded to
its FIL_PAGE_NEXT=0x11489=70793.The memo goes on with
MTR_MEMO_PAGE_X_FIX(0:70787, (left sibling) leaf page) 0x7fea06a89060
MTR_MEMO_PAGE_X_FIX(0:70791, (right sibling) leaf page) 0x7fea06a896c0Latching the left non-leaf sibling should be OK while holding the
index->lock SX-latch. (Latching parent pages is forbidden.)
Latching the right sibling should always be OK.Then we have this:
MTR_MEMO_X_LOCK(fil_space->latch) on "innodb_system" 0x7fec57213520
I think that this is normal for freeing pages. After all, we are
executing btr_compress(), going to shrink the index tree.MTR_MEMO_PAGE_SX_FIX(0:0, allocation bitmap for pages 0..16383) 0x7fec13cf55a8
MTR_MEMO_BUF_FIX(0:47, root, level-2) 0x7fec13cf5410
MTR_MEMO_BUF_FIX(0:196633, level-1) 0x7fec13cfb3b0
MTR_MEMO_PAGE_X_FIX(0:196633, level-1) 0x7fec13cfb3b0 (ok, already X-latched)
MTR_MEMO_PAGE_X_FIX(0:70793, leaf page) 0x7fea06a89858 (ok, already X-latched)
MTR_MEMO_PAGE_X_FIX(0:70791, (right sibling) leaf page) 0x7fea06a896c0 (ok)
MTR_MEMO_BUF_FIX(0:47, root, level-2) 0x7fec13cf5410
MTR_MEMO_BUF_FIX(0:196633, level-1) 0x7fec13cfb3b0
MTR_MEMO_PAGE_X_FIX(0:196633, level-1) 0x7fec13cfb3b0 (ok, already X-latched)
MTR_MEMO_PAGE_X_FIX(0:70793, leaf page) 0x7fea06a89858 (ok, already X-latched)
MTR_MEMO_PAGE_X_FIX(0:70791, (right sibling) leaf page) 0x7fea06a896c0 (ok)
MTR_MEMO_BUF_FIX(0:47, root, level-2) 0x7fec13cf5410
It turns out that a questionable condition had been included in MySQL 5.7, to sometimes acquire the index->lock in too weak SX mode, instead of the appropriate X mode.
Because the decision to acquire the correct X mode lock depends on the transaction history list length and the number of pending read requests, the code might work correctly most of the time. Also, most ‘pessimistic’ operations do not involve multiple recursion steps to btr_compress(), like Thread 4 in our case:
#8 pfs_rw_lock_x_lock_func
|
#9 buf_page_get_gen
|
#10 btr_block_get_func
|
#11 btr_cur_search_to_nth_level
|
#12 btr_page_get_father_node_ptr_func
|
#13 btr_compress
|
#14 btr_cur_compress_if_useful
|
#15 btr_cur_pessimistic_delete
|
#16 btr_compress
|
#17 btr_cur_compress_if_useful
|
#18 btr_cur_pessimistic_delete
|
#19 row_purge_remove_sec_if_poss_tree
|
#20 row_purge_remove_sec_if_poss
|
#21 row_purge_del_mark
|
#22 row_purge_record_func
|
#23 row_purge
|
#24 row_purge_step
|
#25 que_thr_step
|
#26 que_run_threads_low
|
#27 que_run_threads
|
#28 srv_task_execute
|
#29 srv_worker_thread
|
There is some code in btr_cur_search_to_nth_level() that tries to acquire all possibly needed sibling page locks in advance. It appears that this logic is not working correctly in this case, because the root page was only locked in SX mode, not X mode.
While this deadlock occurred during purge, the BTR_LATCH_FOR_DELETE is also used in ROLLBACK. It is possible that this bug has been reported upstream in MySQL Bug#80919. Without a core dump or at least full stack traces of each thread, this cannot be confirmed or denied.
While diagnosing this issue, I noticed that the page flush code is now acquiring block->lock in SX mode instead of S mode. This is likely because certain fields in index root pages are protected by SX latch, not X latch. The page flushing must ensure that no concurrent modifications are possible even to the fields in index root pages.
The conservative fix would be to always acquire the index->lock in X mode for BTR_MODIFY_TREE operations. This will very likely degrade performance, because then the only cases for acquiring the index->lock in SX mode would seem to be operations on BLOBs.
Attachments
Issue Links
- causes
-
MDEV-25101 Assertion `!strcmp(field->name, "table_name")' failed in btr_node_ptr_max_size after adding index on InnoDB stats table
- Closed
- relates to
-
MDEV-20543 Latching order violation during B-tree operations
- Closed
-
MDEV-21509 Possible hang during purge of history, or transaction rollback
- Closed
-
MDEV-24673 btr_cur_pessimistic_delete() fails to shrink the index B-tree
- Open
-
MDEV-29835 Partial server freeze
- Closed
-
MDEV-13360 too long values in mysql.innodb_table_stats.table_name
- Stalled
-
MDEV-13935 INSERT INTO stuck at state "Unlocking tables"
- Closed
-
MDEV-15154 WSREP: BF lock wait long after a TRUNCATE TABLE
- Closed
-
MDEV-16845 Failure upon upgrade from 5.7.23 to 10.2 and 10.3, pagesize 4K
- Closed
-
MDEV-17815 Assertion `!strcmp(index->table->name.m_name, "SYS_FOREIGN") || !strcmp(index->table->name.m_name, "SYS_FOREIGN_COLS")' failed in btr_node_ptr_max_size upon reducing column length
- Closed
-
MDEV-17988 mysql_upgrade failed to revert innodb_index/table_stats tables column datatype length
- Closed
-
MDEV-19216 Assertion `!strcmp(index->table->name.m_name, "SYS_FOREIGN") || !strcmp(index->table->name.m_name, "SYS_FOREIGN_COLS")' failed in btr_node_ptr_max_size
- Closed