[MDEV-29843] Server hang in thd_decrement_pending_ops/pthread_cond_signal Created: 2022-10-21 Updated: 2023-06-08 Resolved: 2022-10-25 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Storage Engine - InnoDB |
| Affects Version/s: | 10.6.8, 10.6.10 |
| Fix Version/s: | 10.6.11, 10.7.7, 10.8.6, 10.9.4, 10.10.2, 10.11.1 |
| Type: | Bug | Priority: | Major |
| Reporter: | Nico Jehle | Assignee: | Vladislav Vaintroub |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Environment: |
CentOS 7 |
| Attachments: |
|
| Issue Links: |
|
| Description |
|
Since upgrading to MariaDB 10.6, our systems randomly abort due to InnoDB: innodb_fatal_semaphore_wait_threshold was exceeded for dict_sys.latch. We could not find steps to reproduce it; the aborts seem to happen randomly, sometimes after a few days of uptime, sometimes after a few hours. error log:
gdb backtrace:
|
| Comments |
| Comment by Marko Mäkelä [ 2022-10-21 ] |
|
njehle, thank you for producing a stack trace with GDB. However, that crashing thread is only the messenger. Can you please post the output of thread apply all backtrace so that I can see which threads are actually involved in the deadlock? I may need some further information for a particular crash, so please save the core dump and be prepared to run more GDB commands. Some special steps may be necessary to counter |
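For readers who have not done this before, a session for capturing all-thread backtraces from a core dump might look like the following sketch (the binary and core paths are examples, not from this report):

```
# hypothetical paths; adjust to your installation and core location
gdb /usr/sbin/mysqld /var/lib/mysql/core.12345
(gdb) set pagination off
(gdb) set logging file backtrace.log
(gdb) set logging on
(gdb) thread apply all backtrace full
(gdb) set logging off
(gdb) quit
```

This writes the full backtrace of every thread to backtrace.log, which can then be attached to the ticket.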
| Comment by Nico Jehle [ 2022-10-21 ] |
|
MDEV-29843.full_backtrace.log I've attached a full backtrace of the same core dump, created by following the steps at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/#getting-backtraces-with-gdb-on-linux
We've moved the core dump to a separate system with the debuginfo packages installed, so we can run further GDB commands there. If the configuration needs adjusting to include more data in the core dump, please let me know. |
| Comment by Marko Mäkelä [ 2022-10-21 ] |
|
For the record, this does not look at all related to …

On a quick look, very many threads are waiting for dict_sys.latch. I do not think that we explicitly keep track of the writer thread ID in that type of lock, so it will take time to figure out which thread is holding it exclusively, and what that thread is doing.

By accident, I may have found the dict_sys.latch owner thread: Thread 13 is waiting inside log_write_up_to(), for an update of InnoDB persistent statistics to become durable. To me, it does not look blocked at the moment. wlad wrote this code (group_commit_lock::release()); I hope that he can check if anything could get stuck there.

It could help if you could collect multiple sets of stack traces while the server appears to be hung. Does the storage hardware work flawlessly? Have you seen any log messages (including Linux dmesg) about trouble? Does smartctl report anything of interest? |
| Comment by Nico Jehle [ 2022-10-21 ] |
Forgot to mention that in the initial description: it's not a single server. We have ~80 hosts with ~10 instances of MariaDB each, and there does not seem to be a pattern as to which server or DB instance the issue occurs on. So far it has only affected single instances on different servers. And there is nothing unusual in the server logs.
The core dump in this ticket is from a system that was rebooted to enable core dumps. For the past crashes we unfortunately don't have core dumps, as we only enabled them this week. |
| Comment by Marko Mäkelä [ 2022-10-21 ] |
|
What are your configuration parameters, such as innodb_buffer_pool_size, innodb_log_file_size, innodb_flush_log_at_trx_commit? On what file system and type of storage is the ib_logfile0 located? |
| Comment by Nico Jehle [ 2022-10-22 ] |
|
In the config there are only these InnoDB settings:
We're only passing that configuration to MariaDB via ExecStart=/usr/sbin/mysqld --defaults-file=/srv/mysql/%i/my.cnf in the systemd service, so everything else is at its default. Storage is XFS on SATA SSDs; 2x SSDSC2KG960G8R as RAID1 on a PERC H330 Mini. |
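For context, a per-instance my.cnf of the kind passed via --defaults-file might look like the fragment below. These are not the settings from this report (those are elided above); every value is invented purely for illustration:

```ini
[mysqld]
# hypothetical per-instance layout matching /srv/mysql/%i/my.cnf
datadir                        = /srv/mysql/1/data
socket                         = /srv/mysql/1/mysql.sock
innodb_buffer_pool_size        = 8G
# a small ib_logfile0 relative to the buffer pool causes frequent
# checkpointing, a point raised later in this ticket
innodb_log_file_size           = 2G
innodb_flush_log_at_trx_commit = 1
```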
| Comment by Nico Jehle [ 2022-10-24 ] |
|
This happened on another host and instance (same config, just different data). I'm attaching the backtrace of that; it is a bit smaller this time. |
| Comment by Marko Mäkelä [ 2022-10-24 ] |
|
I see that you have a very small redo log, only 6.4% of the buffer pool size. That may cause frequent log checkpointing and excessive page writing. The same applies in MDEV-29843.full_backtrace.host_b.2022-10-24.log.

It feels very strange that pthread_cond_signal() appears to be waiting for something. I checked the source code of GNU libc 2.35 (hopefully not too different from the one that you are using), and found the following in glibc-2.35/nptl/pthread_cond_signal.c:
The __condvar_acquire_lock() in nptl/pthread_cond_common.c includes a spinloop and a futex call. It might be that your GNU libc version included a direct call to __lll_lock_wait() here. In the core dump, can you try to dump the arguments of the following:
That is: |
|
| Comment by Nico Jehle [ 2022-10-24 ] |
|
Attaching the log with arguments. glibc on CentOS 7 is quite old; it's based on 2.17: https://sourceware.org/git/?p=glibc.git;a=blob;f=nptl/pthread_cond_signal.c;h=063dcbc32a25d0e36ba04de01581530424c47912;hb=c758a686 |
| Comment by Marko Mäkelä [ 2022-10-24 ] |
|
I think that wlad knows this part of our code best. |
| Comment by Vladislav Vaintroub [ 2022-10-24 ] |
|
Even when looking most attentively, I do not see how the tiny condition variable in THD::async_state could get stuck.

Though I admittedly have no knowledge of libc or Linux internals (marko is our subject expert), I tend to blame replication code: first, THD::async_state sits in the middle of replication-related variables, and also this code, which executes on every InnoDB update, would now suddenly hang with replication only. If it is any help, I could try to move the variable outside of the replication area in THD; then something else would be affected by the alleged (by myself) memory overwrite. |
| Comment by Marko Mäkelä [ 2022-10-25 ] |
|
wlad, I think that your hypothesis is plausible. Please implement that change. I see that you enhanced the data structure in two phases: first …

njehle, are you using replication? XA transactions? temporary tables? Galera? |
| Comment by Vladislav Vaintroub [ 2022-10-25 ] |
|
The code is in bb-10.6- |
| Comment by Nico Jehle [ 2022-10-25 ] |
|
marko: |
| Comment by Marko Mäkelä [ 2022-10-25 ] |
|
wlad, moving the field to an earlier position in THD would have been a work-around. It does not look like your other changes would be a real fix; they could only reduce the probability of this hang. I am not familiar with that code, but the changes look reasonable to me, other than the misspelled function name log_write_up_to. We have a similar example in MDEV-25611, where a rather frequent hang was “fixed” (in reality, the probability of the hang was drastically reduced) by …

I guess that we will need some feedback from njehle to see if this really worked. Unfortunately, most users who report hangs never produce stack traces of all threads, so we can’t know how common this hang is or was. |
| Comment by Nico Jehle [ 2022-10-25 ] |
I rebooted all our servers last week; so far only 5 MariaDB instances have aborted due to hangs. I'll try to reproduce the issue in our QA environment with MariaDB 10.6.10 by replaying the queries from Live. Depending on how that goes, I'll look into other options here. Are there automatic builds/packages for the change from this ticket's git branch? If there are none for CentOS 7, I'll have to look into building those as well. |
| Comment by Marko Mäkelä [ 2022-10-25 ] |
|
njehle, packages for the latest development snapshots are available at addresses like http://hasky.askmonty.org/archive/10.6/. That could be an option once this fix has been pushed to the main 10.6 branch. The 10.6.11 release should be out soon (within a week, I hope), and you might simply upgrade to 10.6.11 once it becomes available. This work-around should still be in time for that release. |
| Comment by Nico Jehle [ 2022-10-26 ] |
|
marko, thanks for the info; I see the fix has been pushed to the 10.6 branch already. I'll let you know whether we still run into this, say 1 or 2 weeks after it has been released. Thanks for the support! |
| Comment by Marko Mäkelä [ 2022-11-09 ] |
|
I see that this hang involves a field wrefs inside pthread_cond_t, just like in …

Because I am not familiar with this part of the code, I can’t say if any misuse of pthread_cond_t is possible here, but it could be more plausible than the alternative explanation that the memory was corrupted by some write through a wild pointer (without corrupting anything else). |
| Comment by Marko Mäkelä [ 2022-11-09 ] |
|
In MDEV-29843.host_b.2022-10-24.arguments.log:
The m_mtx seems to be owned by Thread 33 (Thread 0x7fc422ffd700 (LWP 31492)) of MDEV-29843.full_backtrace.host_b.2022-10-24.log. |
| Comment by Kristian Nielsen [ 2023-06-08 ] |
|
The fix for this is to not use the async completion mechanism for replication worker threads, and I think that makes sense. I'm thinking the problem could be that the replication worker threads re-use the THD for each transaction? I assume the async completion needs each pending commit completion / OK packet to have its own THD? And it doesn't make much sense to have such a mechanism there, as then the async completion becomes rather pointless; so the fix seems correct. |