Details
Type: Bug
Status: Needs Feedback
Priority: Critical
Resolution: Unresolved
Affects Version/s: 10.6.24, 10.11.14, 10.11.15, 11.8.5
None
None
Description
Under a heavy connection load, the server crashes because contention on dict_sys.latch produces a wait longer than innodb_fatal_semaphore_wait_threshold. The core dump was captured on MariaDB 11.8.5. The same workload was stable on 10.6.18 but crashes on every later version tested (10.6.24, 10.11.14, 10.11.15, 11.8.5), which suggests a regression.
Error Log (MariaDB 11.8.5)
[Warning] InnoDB: A long wait (xxx seconds) was observed for dict_sys.latch
[Warning] InnoDB: A long wait (xxx seconds) was observed for dict_sys.latch
...
(repeated many times)
...
[ERROR] [FATAL] InnoDB: innodb_fatal_semaphore_wait_threshold was exceeded for dict_sys.latch.
Please refer to https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/
[ERROR] mysqld got signal 6 ;
Sorry, we probably made a mistake, and this is a bug.
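The log follows a clear pattern: repeated long-wait warnings, then a fatal error once the wait reaches the threshold. A small classification function can model that escalation. This is a hypothetical sketch, not InnoDB's monitor code. The names classify_latch_wait and LatchWaitState are invented for illustration, and so is the separate warn_after_s point at which warnings begin.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of a latch-wait watchdog; NOT InnoDB's monitor code.
enum class LatchWaitState { Ok, LongWait, Fatal };

// waited_s:     seconds the oldest waiter has been blocked on the latch
// warn_after_s: assumed point from which a "long wait" warning is logged
// fatal_s:      stands in for innodb_fatal_semaphore_wait_threshold
inline LatchWaitState classify_latch_wait(std::uint64_t waited_s,
                                          std::uint64_t warn_after_s,
                                          std::uint64_t fatal_s) {
  assert(warn_after_s <= fatal_s);  // warnings must come before the abort
  if (waited_s >= fatal_s) {
    return LatchWaitState::Fatal;  // analogous to the [FATAL] line and signal 6
  }
  if (waited_s >= warn_after_s) {
    return LatchWaitState::LongWait;  // analogous to "A long wait ... was observed"
  }
  return LatchWaitState::Ok;
}
```

For example, take fatal_s = 600 (the default value of innodb_fatal_semaphore_wait_threshold) and an assumed warn_after_s = 150. A 200-second wait is then classified as LongWait, and a 600-second wait as Fatal. In this model, raising fatal_s only moves the abort later. It does not shorten the wait that triggers it.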
Core Dump Analysis
The core dump, captured on MariaDB 11.8.5, contains 8445 threads, 190 of which were trying to open the same table via ha_innobase::open_dict_table(). The backtraces show that many of these threads are waiting to acquire dict_sys.latch, and that some are trying to lock the table.
Two code paths were observed:
Read lock (dict0dict.cc:1027 in 11.8.5), 156 threads waiting (1 raised the signal):
These threads call dict_sys.freeze(), which acquires a read (shared) lock on the latch:
dict/dict0dict.cc#1027:
    dict_sys.freeze(SRW_LOCK_CALL);

dict_sys_t::freeze:
    ATTRIBUTE_NOINLINE void dict_sys_t::freeze(const char *file, unsigned line) noexcept
    {
      latch.rd_lock(file, line);
    }
Representative backtrace (MariaDB 11.8.5):
#0 syscall () from /lib64/libc.so.6
#1 srw_mutex_impl<true>::wait (this=<dict_sys>)
    at storage/innobase/sync/srw_lock.cc:252
#2 ssux_lock_impl<true>::rd_lock_nospin (this=<dict_sys>)
    at storage/innobase/sync/srw_lock.cc:410
#3 ssux_lock_impl<false>::rd_lock (this=<optimized out>)
    at storage/innobase/include/srw_lock.h:362
#4 srw_lock_impl<false>::psi_rd_lock (this=<dict_sys>)
    at storage/innobase/sync/srw_lock.cc:489
#5 dict_table_open_on_name (table_name="<schema>/<table>", dict_locked=<optimized out>, ignore_err=DICT_ERR_IGNORE_FK_NOKEY)
    at storage/innobase/dict/dict0dict.cc:1027
#6 ha_innobase::open_dict_table (ignore_err=DICT_ERR_IGNORE_FK_NOKEY, is_partition=<optimized out>, norm_name="<schema>/<table>")
    at storage/innobase/handler/ha_innodb.cc:6109
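A shared (read) lock normally admits many holders at once, so it can seem surprising that so many threads are blocked in rd_lock. One way this happens is writer preference. Once a writer holds the latch or is queued for it, newly arriving readers must wait. The model below is a simplified sketch built on std::mutex and std::condition_variable, not InnoDB's ssux_lock_impl. The class WriterPreferringLatch and the helper reader_blocked_behind_writer() are hypothetical. It is also an assumption that dict_sys.latch behaves this way. The readers in the backtrace above, waiting inside srw_mutex_impl<true>::wait, suggest it but do not prove it.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Simplified, hypothetical writer-preferring shared/exclusive latch.
// NOT InnoDB's ssux_lock_impl; it only illustrates the blocking rule:
// while a writer holds the latch or is queued for it, new readers wait.
class WriterPreferringLatch {
 public:
  void rd_lock() {
    std::unique_lock<std::mutex> lk(m_);
    cv_.wait(lk, [this] { return !writer_active_ && writers_waiting_ == 0; });
    ++readers_;
  }

  bool try_rd_lock() {
    std::lock_guard<std::mutex> lk(m_);
    if (writer_active_ || writers_waiting_ != 0) return false;  // writer first
    ++readers_;
    return true;
  }

  void rd_unlock() {
    std::lock_guard<std::mutex> lk(m_);
    assert(readers_ > 0);
    if (--readers_ == 0) cv_.notify_all();  // a queued writer may proceed
  }

  void wr_lock() {
    std::unique_lock<std::mutex> lk(m_);
    ++writers_waiting_;  // from now on, new readers are held back
    cv_.wait(lk, [this] { return !writer_active_ && readers_ == 0; });
    --writers_waiting_;
    writer_active_ = true;
  }

  void wr_unlock() {
    std::lock_guard<std::mutex> lk(m_);
    writer_active_ = false;
    cv_.notify_all();  // wake waiting readers and writers
  }

  unsigned writers_waiting() {
    std::lock_guard<std::mutex> lk(m_);
    return writers_waiting_;
  }

 private:
  std::mutex m_;
  std::condition_variable cv_;
  unsigned readers_ = 0;
  unsigned writers_waiting_ = 0;
  bool writer_active_ = false;
};

// One reader holds the latch and one writer queues behind it.
// Returns true if a newly arriving reader is refused.
bool reader_blocked_behind_writer() {
  WriterPreferringLatch latch;
  latch.rd_lock();
  std::thread writer([&latch] {
    latch.wr_lock();  // waits until the existing reader leaves
    latch.wr_unlock();
  });
  while (latch.writers_waiting() == 0) std::this_thread::yield();
  const bool blocked = !latch.try_rd_lock();
  if (!blocked) latch.rd_unlock();  // only reached if the model were wrong
  latch.rd_unlock();                // let the writer through
  writer.join();
  return blocked;
}
```

Here reader_blocked_behind_writer() returns true. Scaled up to hundreds of threads, the same rule means that one slow exclusive holder, or one queued writer, can stall every reader that arrives after it.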
Write lock (dict0dict.cc:1057 in 11.8.5), 33 threads:
These threads call dict_sys.lock(), which acquires a write (exclusive) lock on the latch:
#0 syscall () from /lib64/libc.so.6
#1 srw_mutex_impl<false>::wait (this=<dict_sys>)
    at storage/innobase/sync/srw_lock.cc:252
#2 srw_mutex_impl<false>::wait_and_lock (this=<dict_sys>)
    at storage/innobase/sync/srw_lock.cc:313
#3 srw_mutex_impl<false>::wr_lock (this=<dict_sys>)
    at storage/innobase/include/srw_lock.h:162
#4 ssux_lock_impl<false>::wr_lock (this=<dict_sys>)
    at storage/innobase/include/srw_lock.h:284
#5 srw_lock_impl<false>::psi_wr_lock (this=<dict_sys>)
    at storage/innobase/sync/srw_lock.cc:519
#6 dict_table_open_on_name (table_name=<optimized out>, dict_locked=<optimized out>, ignore_err=DICT_ERR_IGNORE_FK_NOKEY)
    at storage/innobase/dict/dict0dict.cc:1057
#7 ha_innobase::open_dict_table (ignore_err=DICT_ERR_IGNORE_FK_NOKEY, is_partition=<optimized out>, norm_name="<schema>/<table>")
    at storage/innobase/handler/ha_innodb.cc:6109
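Both groups of waiters are inside dict_table_open_on_name(), at line 1027 (read lock) and line 1057 (write lock). One plausible reading is a lookup under the shared latch, with a fallback to the exclusive latch when the table definition has to be loaded. This reading is an assumption; the backtraces do not confirm it. The sketch below models that pattern with std::shared_mutex. TableDef, TableCache, open_table() and load_count() are hypothetical names.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a cached table definition.
struct TableDef {
  std::string name;
};

// Hypothetical model of "look up under a shared latch, load under an
// exclusive latch"; NOT InnoDB's dict_sys implementation.
class TableCache {
 public:
  const TableDef* open_table(const std::string& name) {
    {
      std::shared_lock<std::shared_mutex> rd(latch_);  // fast path: shared
      auto it = tables_.find(name);
      if (it != tables_.end()) return it->second.get();
    }  // shared latch released before taking the exclusive one
    std::unique_lock<std::shared_mutex> wr(latch_);  // slow path: exclusive
    // Re-check: another thread may have loaded the table in the meantime.
    auto it = tables_.find(name);
    if (it != tables_.end()) return it->second.get();
    ++load_count_;  // while this runs, every open_table() caller waits
    auto def = std::make_unique<TableDef>(TableDef{name});
    const TableDef* raw = def.get();  // stays valid: entries are never erased
    const bool inserted = tables_.emplace(name, std::move(def)).second;
    assert(inserted);  // the re-check above rules out a duplicate
    (void)inserted;    // avoid an unused-variable warning under NDEBUG
    return raw;
  }

  unsigned load_count() {
    std::shared_lock<std::shared_mutex> rd(latch_);
    return load_count_;
  }

 private:
  std::shared_mutex latch_;
  std::unordered_map<std::string, std::unique_ptr<TableDef>> tables_;
  unsigned load_count_ = 0;
};
```

Opening the same name twice returns the same pointer, and the definition is loaded only once. The cost sits in the slow path. While one caller holds the exclusive latch to load a definition, every other open_table() caller waits, whatever table it wants. If loading is slow, this model produces a pile-up like the one in the core dump (compare MDEV-35154 in the issue links).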
Issue Links
relates to:
- MDEV-33594 Invoking log_free_check() while holding exclusive dictionary latch may block most InnoDB threads for a long time (Confirmed)
- MDEV-35154 dict_sys_t::load_table() is holding exclusive dict_sys.latch for unnecessarily long time (Confirmed)
- MDEV-35424 False alarm/crash: innodb_fatal_semaphore_wait_threshold was exceeded for dict_sys.latch (Closed)