james.coleman-ah, thank you. Unfortunately, the 2 print commands were rejected, because no debugging symbols were available. Often they are part of a separate package, such as MariaDB-server-debuginfo.
My guess that it could be one of the InnoDB data dictionary latches seems to be right, but because those print commands were rejected, we cannot know for sure. In dump.txt
we see several threads waiting for dict_operation_lock S-latch, mostly on FOREIGN KEY check, but also at least once on rollback (thread 130).
Among the 170 threads, it is not trivial to see which thread is actually holding (or waiting for) the dict_operation_lock X-latch and thus blocking all the S-latch requests. One X-latch request is pending (not already granted). It could be blocked by a thread that is already holding the X-latch or an S-latch. Subsequent X-latch requests should be blocked by this pending X-latch request (if they were not already blocked by a X-latch held by some thread).
Thread 154 (Thread 0x7fd5823fd700 (LWP 7368)):
|
#0 0x00007fdc309309f5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
|
#1 0x0000560a16fb5ebb in os_event_wait_low(os_event*, long) ()
|
#2 0x0000560a17030263 in sync_array_wait_event(sync_array_t*, unsigned long) ()
|
#3 0x0000560a17032e27 in rw_lock_x_lock_func(rw_lock_t*, unsigned long, char const*, unsigned long, bool, bool) ()
|
#4 0x0000560a170ed396 in dict_stats_save(dict_table_t*, unsigned long const*) [clone .part.73] ()
|
#5 0x0000560a170edb98 in dict_stats_update(dict_table_t*, dict_stats_upd_option_t) ()
|
#6 0x0000560a170effa0 in dict_stats_thread ()
|
#7 0x00007fdc3092ce65 in start_thread () from /lib64/libpthread.so.0
|
#8 0x00007fdc2ea9988d in clone () from /lib64/libc.so.6
|
I think that valerii has mentioned that setting STATS_AUTO_RECALC=1 STATS_PERSISTENT=1 for InnoDB tables can cause problems. I failed to find an open bug report for that one, though.
Given that you seem to have lots of concurrent operations that involve FOREIGN KEY constraints, MDEV-16467 might indeed have something to do with this.
Without having more information available, it is very difficult to diagnose this. Next time the hang occurs, could you please generate a core dump? Then, with the appropriate debugging symbols installed, we should be able to extract more data from it.
We are still examining this issue to this day. As an update, it seems to only popup when MariaDB is under heavy load. Rather it be lots of cron jobs running heavy queries at the same time, or lots of traffic on a site.