Details
-
Bug
-
Status: Closed (View Workflow)
-
Critical
-
Resolution: Fixed
-
10.6.11, 10.7.7, 10.8.6, 10.9.4, 10.10.2, 10.11.1, 11.0.0, 10.6.12, 10.7.8, 10.8.7, 10.9.5, 10.10.3
Description
This InnoDB deadlock came up while mleich was testing MDEV-26055 and MDEV-26827. Many threads are blocked waiting on a fil_space_t::latch. The actual deadlock seems to involve buffer page latches and the tablespace latch in at least 3 threads. Two of the threads are attempting to update InnoDB statistics, waiting for a buffer page latch while holding shared fil_space_t::latch:
#7 0x0000558ac484ce42 in buf_page_get_gen (page_id=..., zip_size=zip_size@entry=0, rw_latch=rw_latch@entry=4, guess=guess@entry=0x0, mode=mode@entry=16, mtr=mtr@entry=0x7f98ea884950, err=0x0, allow_ibuf_merge=false) at /data/Server/bb-10.6-MDEV-26055-2-MDEV-26827A/storage/innobase/buf/buf0buf.cc:2952
|
#8 0x0000558ac49e2e18 in fseg_inode_try_get (header=header@entry=0x7f9913bb4054 "", space=<optimized out>, zip_size=<optimized out>, mtr=mtr@entry=0x7f98ea884950, block=block@entry=0x7f98ea884740, err=err@entry=0x0) at /data/Server/bb-10.6-MDEV-26055-2-MDEV-26827A/storage/innobase/include/buf0types.h:95
|
#9 0x0000558ac49e6cf0 in fseg_n_reserved_pages (block=..., header=header@entry=0x7f9913bb4054 "", used=used@entry=0x7f98ea884850, mtr=mtr@entry=0x7f98ea884950) at /data/Server/bb-10.6-MDEV-26055-2-MDEV-26827A/storage/innobase/include/buf0types.h:131
|
#10 0x0000558ac4976846 in dict_stats_update_transient_for_index (index=index@entry=0x616004011a08) at /data/Server/bb-10.6-MDEV-26055-2-MDEV-26827A/storage/innobase/include/fil0fil.h:1072
|
One thread is waiting for an exclusive tablespace latch while holding several buffer page latches:
#5 mtr_t::x_lock_space (this=this@entry=0x7f98eb9e1d60, space=space@entry=0x6120000523c0) at /data/Server/bb-10.6-MDEV-26055-2-MDEV-26827A/storage/innobase/mtr/mtr0mtr.cc:521
|
#6 0x0000558ac49eb144 in fsp_reserve_free_extents (n_reserved=n_reserved@entry=0x7f98eb9e1750, space=0x6120000523c0, n_ext=<optimized out>, alloc_type=alloc_type@entry=FSP_NORMAL, mtr=mtr@entry=0x7f98eb9e1d60, n_pages=n_pages@entry=2) at /data/Server/bb-10.6-MDEV-26055-2-MDEV-26827A/storage/innobase/fsp/fsp0fsp.cc:2398
|
#7 0x0000558ac47be2ca in btr_cur_pessimistic_insert (flags=flags@entry=0, cursor=cursor@entry=0x7f98eb9e1960, offsets=offsets@entry=0x7f98eb9e18e0, heap=heap@entry=0x7f98eb9e18c0, entry=entry@entry=0x616003f5da08, rec=rec@entry=0x7f98eb9e1900, big_rec=<optimized out>, n_ext=0, thr=<optimized out>, mtr=<optimized out>) at /data/Server/bb-10.6-MDEV-26055-2-MDEV-26827A/storage/innobase/include/dict0mem.h:1216
|
I see that before MDEV-29883, dict_stats_update_transient_for_index() used to hold an exclusive tablespace latch, instead of a shared one. It seems to me that this deadlock would have bene possible even without that change. Possibly the probability of the deadlock could have been increased.
I also see that before MDEV-29883, dict_stats_update_transient_for_index() only used to hold a shared index latch. It would seem to me that acquiring an even stronger index latch would prevent the observed deadlock:
diff --git a/storage/innobase/dict/dict0stats.cc b/storage/innobase/dict/dict0stats.cc
|
index 0d630583daa..101424d0f44 100644
|
--- a/storage/innobase/dict/dict0stats.cc
|
+++ b/storage/innobase/dict/dict0stats.cc
|
@@ -1473,7 +1473,7 @@ dict_stats_update_transient_for_index(
|
mtr_t mtr;
|
|
mtr.start();
|
- mtr_sx_lock_index(index, &mtr);
|
+ mtr_x_lock_index(index, &mtr);
|
|
dberr_t err;
|
buf_block_t* root = btr_root_block_get(index, RW_SX_LATCH, |
Because a pessimistic insert would start by acquiring an index latch, it would not be able to deadlock with this mini-transaction anymore.
Attachments
Issue Links
- relates to
-
MDEV-29835 Partial server freeze
-
- Closed
-
-
MDEV-29883 Deadlock between InnoDB statistics update and BLOB insert
-
- Closed
-
-
MDEV-29930 Server hang with innodb_file_per_table=0, innodb_undo_tablespaces=0
-
- Open
-
Sorry, it is not that simple. Acquiring a stronger index latch does not help this deadlock. The INSERT operation is actually holding an index latch in a mode that would conflict with the statistics operation, if the statistics operation were holding a latch on the same index.
The statistics can be updated for different indexes in the same tablespace than the one that the INSERT is operating on. To prevent this hang, one possible fix could be that dict_stats_update_transient_for_index() and possibly other functions (dict_index_t::clear(), btr_get_size_and_reserved(), dict_stats_save_defrag_stats(), dict_stats_analyze_index()) would acquire a latch on every index of the table, like row_quiesce_set_state() does it, and only then proceeds to acquire a fil_space_t::latch. This approach would still fail to prevent a hang if a table is stored in the InnoDB system tablespace.