[MDEV-21010] Mariadb hangs, stops responding to new connections Created: 2019-11-08 Updated: 2020-12-17 |
|
| Status: | Open |
| Project: | MariaDB Server |
| Component/s: | Backup |
| Affects Version/s: | 10.3.17 |
| Fix Version/s: | 10.3 |
| Type: | Bug | Priority: | Major |
| Reporter: | Richard | Assignee: | Vicențiu Ciorbaru |
| Resolution: | Unresolved | Votes: | 1 |
| Labels: | crash | ||
| Environment: |
Debian buster 10.3.17-MariaDB-0+deb10u1 amd64 |
||
| Attachments: |
|
| Description |
|
During my daily mysqldump (called by the Debian automysqlbackup script), mysql hangs a few times per week. The TCP listener and socket stay up but no longer accept new queries. The only remedy is to kill -9 the mysqld process. I was able to capture a gdb stack trace in this state; see the attachments. The Debian bug report is https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=943962. The error.log says:
My query.log has this right before the crash:
|
| Comments |
| Comment by Sergey Vojtovich [ 2019-11-08 ] | ||||||||||||||||||||||||||||||||||||||||
|
The problem seems to be caused by this thread:
For some reason realloc() fails, which may be a bug in MariaDB. But then, unexpectedly, realloc attempts to report an error message like "corrupted size vs. prev_size", which is apparently done under some mutex protection. This blocks all mallocs in other threads. The error-reporting routine crashes and invokes the MariaDB SIGSEGV handler, which in turn attempts to allocate memory and blocks on that same mutex. There are 2 problems here: | ||||||||||||||||||||||||||||||||||||||||
| Comment by Richard [ 2019-11-08 ] | ||||||||||||||||||||||||||||||||||||||||
|
I'm not knowledgeable enough about MariaDB to comment on problem 1, but I agree that problem 2 is very alarming. In the 15+ years of using mysql/mariadb I have never had the process hang on me. It leaves all my applications dead in the water, several days per week. | ||||||||||||||||||||||||||||||||||||||||
| Comment by Eugene Kosov (Inactive) [ 2019-11-13 ] | ||||||||||||||||||||||||||||||||||||||||
|
It's incorrect to call async-signal-unsafe functions from signal handlers. I think the first warning here describes the problem in this issue: https://www.boost.org/doc/libs/1_71_0/doc/html/stacktrace/getting_started.html#stacktrace.getting_started.handle_terminates_aborts_and_seg | ||||||||||||||||||||||||||||||||||||||||
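As an illustration of the point above, here is a minimal sketch of a crash handler restricted to async-signal-safe calls. It is not MariaDB's handler: the names format_signal_message, minimal_crash_handler and install_minimal_crash_handler are hypothetical, and the sketch assumes a POSIX system where write() and _exit() are on the async-signal-safe list in signal-safety(7). The message is formatted by hand because snprintf() and malloc() are not on that list.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>

/* Build "fatal signal N\n" in buf without snprintf() or malloc(), neither of
   which is async-signal-safe. Output is truncated to cap bytes.
   Returns the number of bytes written. (Hypothetical helper.) */
size_t format_signal_message(char *buf, size_t cap, int sig)
{
    static const char prefix[] = "fatal signal ";
    char digits[12];
    size_t ndigits = 0, len = 0;
    unsigned int value = sig < 0 ? 0u : (unsigned int)sig;

    /* Collect the decimal digits in reverse order. */
    do {
        digits[ndigits++] = (char)('0' + value % 10);
        value /= 10;
    } while (value != 0 && ndigits < sizeof digits);

    for (size_t i = 0; i < sizeof prefix - 1 && len < cap; i++)
        buf[len++] = prefix[i];
    while (ndigits > 0 && len < cap)
        buf[len++] = digits[--ndigits];
    if (len < cap)
        buf[len++] = '\n';
    return len;
}

/* Hypothetical handler: touches only a stack buffer, write() and _exit(). */
void minimal_crash_handler(int sig)
{
    char msg[64];
    size_t len = format_signal_message(msg, sizeof msg, sig);
    ssize_t ignored = write(STDERR_FILENO, msg, len);
    (void)ignored;
    _exit(128 + sig);
}

/* Install the handler for SIGSEGV. Returns 0 on success, -1 on error. */
int install_minimal_crash_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = minimal_crash_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESETHAND; /* a second fault gets the default action */
    return sigaction(SIGSEGV, &sa, NULL);
}
```

A handler like this cannot print a symbolic stack trace, which is the trade-off under discussion: anything richer has to be done without taking locks that the interrupted code might already hold.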
| Comment by Richard [ 2019-11-27 ] | ||||||||||||||||||||||||||||||||||||||||
|
So will the signal handler be changed? The info at the link @kevg posted makes sense to me. | ||||||||||||||||||||||||||||||||||||||||
| Comment by Sergey Vojtovich [ 2019-11-27 ] | ||||||||||||||||||||||||||||||||||||||||
|
According to man 7 signal, fork is listed among the async-signal-safe functions. | ||||||||||||||||||||||||||||||||||||||||
| Comment by Vladislav Vaintroub [ 2019-11-27 ] | ||||||||||||||||||||||||||||||||||||||||
|
Well, obviously the man page is not telling the full truth. | ||||||||||||||||||||||||||||||||||||||||
| Comment by Sergey Vojtovich [ 2019-11-27 ] | ||||||||||||||||||||||||||||||||||||||||
|
Interesting, at the same time malloc() is not listed... | ||||||||||||||||||||||||||||||||||||||||
| Comment by Richard [ 2019-12-23 ] | ||||||||||||||||||||||||||||||||||||||||
|
OK, so changing the SIGSEGV handler to use only async-signal-safe functions is a major change that is unlikely to be implemented any time soon. What about the cause of the SIGSEGV in the first place, the "realloc failure as such" in the stack trace? Is that solvable? Do you need any more info from me? I now have several stack traces from 10.3.18-0+deb10u1 as well. Do you need those? | ||||||||||||||||||||||||||||||||||||||||
| Comment by Sergey Vojtovich [ 2019-12-23 ] | ||||||||||||||||||||||||||||||||||||||||
|
RichieB, I believe we have enough info for now to debug the realloc failure. Meanwhile, you can disable the stack trace dump with stack-trace=off. That should hopefully solve your problem. | ||||||||||||||||||||||||||||||||||||||||
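A minimal sketch of that workaround as an option-file entry. It assumes the setting goes in the [mysqld] group of a server option file; which file that is depends on the installation and is not stated in this thread.

```ini
[mysqld]
# Workaround from the comment above: skip the stack trace dump in the crash handler
stack-trace=off
```

The server has to be restarted for the option to take effect.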
| Comment by Eugene Kosov (Inactive) [ 2019-12-23 ] | ||||||||||||||||||||||||||||||||||||||||
|
> Interesting, at the same time malloc() is not listed...
It takes a global lock.
| ||||||||||||||||||||||||||||||||||||||||
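To make the locking argument concrete, here is a toy model in C. It is a sketch under stated assumptions, not the allocator's real code: toy_alloc_lock is a hypothetical stand-in for the lock malloc() takes, and the "handler" only probes the lock instead of blocking on it, so the example terminates. With a real blocking acquire, the second call would wait forever, because the code that holds the lock is the code the handler interrupted.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Toy stand-in for the allocator's internal lock (hypothetical; a real
   allocator uses its own lock implementation). */
static atomic_flag toy_alloc_lock = ATOMIC_FLAG_INIT;

/* Try to take the lock without waiting; true means we acquired it. */
static bool toy_try_lock(void)
{
    return !atomic_flag_test_and_set(&toy_alloc_lock);
}

static void toy_unlock(void)
{
    atomic_flag_clear(&toy_alloc_lock);
}

/* Models a signal handler that needs the allocator lock. A blocking acquire
   would hang here; this version only reports whether it would have. */
bool handler_would_block(void)
{
    if (toy_try_lock()) {
        toy_unlock();
        return false;
    }
    return true;
}

/* Models the crash path: code holding the lock is interrupted and the
   handler runs before the lock is released. */
bool simulate_crash_inside_allocator(void)
{
    bool blocked;
    (void)toy_try_lock();            /* the interrupted code holds the lock */
    blocked = handler_would_block(); /* the handler runs "inside" it */
    toy_unlock();
    return blocked;
}
```

Outside the allocator the handler gets the lock immediately; inside it, the handler can never make progress. That is why functions that take such a lock are not considered async-signal-safe.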
| Comment by Richard [ 2020-11-22 ] | ||||||||||||||||||||||||||||||||||||||||
|
Until the signal handler is changed to avoid this deadlock, please ship MariaDB with a default config of "stack-trace=off". Having your database suddenly hang and stop processing requests is not acceptable in any situation. |