[MDEV-8004] key_buffer related crashes in MyISAM table check, stacktrace in error log truncated
Created: 2015-04-16 Updated: 2015-12-01 Resolved: 2015-12-01 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Storage Engine - Aria |
| Affects Version/s: | 10.0.16 |
| Fix Version/s: | 10.0.21, 10.1.7 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Hartmut Holzgraefe | Assignee: | Michael Widenius |
| Resolution: | Fixed | Votes: | 2 |
| Labels: | None | ||
| Environment: |
CentOS 6.5 - 64bit |
| Description |
|
There have now been several cases where MariaDB 10.0.16 crashed with a stack trace that only shows signal/crash-handling stack frames, starting with a pthread_cond_signal() call, e.g.:
So there is no information whatsoever hinting at what actually led up to this crash (and nothing suspicious in the error log before the crash either). The only suspicious pattern I have identified so far is that the Connection ID is usually very low (so far seen: 5 twice, 7 once, and one very high number in the 60,000 range), so this may be related to replication threads. Any idea how to get a more useful stack trace, preferably without having to enable core dumps (as this happens on systems with huge memory / process size)? |
| Comments |
| Comment by Sergei Golubchik [ 2015-06-01 ] | |||||||||||||||||||||||||||||
|
To answer the last question: you can get stack traces with gdb. It doesn't require a core dump. Run something like
It will attach to the running mysqld, let it continue running until it crashes, collect stack traces for all threads, and then let it exit. That's tested and works (though not on mysqld; I tested it on a small multithreaded a.out with a SIGSEGV handler). I've no idea what gdb does to mysqld performance. | |||||||||||||||||||||||||||||
| Comment by Simon J Mudd [ 2015-06-11 ] | |||||||||||||||||||||||||||||
|
Sergei, I missed this completely. You're missing the point. The point is not to get a stack trace of a running server; the point is to get a stack trace of the thread when mysqld crashes. Current 10.0.16 (I have not tried later versions) does not provide a complete stack trace of the activity of the thread that generates the exception. So the context / cause of the crash is not visible to the user, or available to pass back to the developers so they can track down why it happened. This works in MySQL and MariaDB 5.5, but from what I can see it seems to have stopped working in 10.0 (or at least in 10.0.1X). See the output above. If something goes wrong with mysqld and it crashes, this information is invaluable. Given how complex mysqld is, having it crash with no clue as to the cause is a problem for the user of the software, as it's going to make it much harder to get the problem fixed. So please, if you can, ensure that when mysqld crashes the full stack trace of the "broken" thread is shown. I hope this clarifies the situation. | |||||||||||||||||||||||||||||
| Comment by Sergei Golubchik [ 2015-06-11 ] | |||||||||||||||||||||||||||||
|
simon.mudd, but that's precisely what I suggested. That command is to get a stack trace when mysqld crashes. | |||||||||||||||||||||||||||||
| Comment by Hartmut Holzgraefe [ 2015-06-18 ] | |||||||||||||||||||||||||||||
|
Instead of
it has to be
for an exact pgrep match; otherwise it's likely to pick up the mysqld_safe script's process instead of the actual mysqld, as the wrapper script is likely to have a lower PID than the server process. Now I can indeed see a gdb stack trace when forcing a "kill -11" on mysqld. I also did some simple performance tests with sysbench on my home machine and couldn't see any significant difference between runs with and without gdb attached. So this gdb trick looks like an interesting approach indeed. | |||||||||||||||||||||||||||||
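The pitfall described here comes from pattern matching: by default pgrep treats its pattern as an unanchored regular expression matched against the process name, so `mysqld` also matches `mysqld_safe`, while `-x` requires the whole name to match. The C sketch below models that difference with a made-up process table. The PIDs, names, `struct proc` and `first_match()` are hypothetical, and plain substring search stands in for pgrep's regex matching. It returns the lowest matching PID, mirroring the observation that the wrapper script usually has the lower PID.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical process-table entry: a PID plus the short process
   name that pgrep matches against by default. */
struct proc {
    int pid;
    const char *name;
};

/* Stand-in for "pgrep mysqld": any name containing the pattern matches
   (simplified to a substring test instead of a regular expression). */
static int matches_substring(const char *name, const char *pattern)
{
    return strstr(name, pattern) != NULL;
}

/* Stand-in for "pgrep -x mysqld": the whole name must equal the pattern. */
static int matches_exact(const char *name, const char *pattern)
{
    return strcmp(name, pattern) == 0;
}

/* Return the lowest PID whose name matches, or -1 if none does. */
int first_match(const struct proc *procs, size_t n, const char *pattern, int exact)
{
    int best = -1;
    for (size_t i = 0; i < n; i++) {
        int hit = exact ? matches_exact(procs[i].name, pattern)
                        : matches_substring(procs[i].name, pattern);
        if (hit && (best == -1 || procs[i].pid < best))
            best = procs[i].pid;
    }
    return best;
}
```

With a table holding `mysqld_safe` (PID 1201) and `mysqld` (PID 1350), the substring lookup returns the wrapper's PID 1201, while the exact lookup skips the wrapper and returns the server's PID 1350.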
| Comment by Hartmut Holzgraefe [ 2015-06-18 ] | |||||||||||||||||||||||||||||
In general it does: I get a stack trace in the error log just fine when forcing one with "killall -11 mysqld", with both self-compiled binaries and MariaDB RPMs. I also tried to forcefully overwrite the stack with zeros and then force a crash by a NULL pointer assignment in a patched mysqld, but I either got the full stack trace, or traces that contained at least one mangled all-zero symbol entry after the "pthread_cond_signal" line. So it looks as if there's a bug in 10.0.16 (or 10.0.x in general) that mangles the stack in a peculiar way, one that still looks valid to the stack trace resolver, as if pthread_cond_signal() really were the bottom of the call stack? | |||||||||||||||||||||||||||||
| Comment by Hartmut Holzgraefe [ 2015-06-18 ] | |||||||||||||||||||||||||||||
|
PS: maybe it should be "thread apply all bt full" right away, to get more detailed gdb output? Or maybe a combination of
for both the simple and full format (untested, but should work?) | |||||||||||||||||||||||||||||
| Comment by Simon J Mudd [ 2015-06-19 ] | |||||||||||||||||||||||||||||
|
I think you're missing something rather important. The request is to ensure that the stack trace MariaDB itself prints when it crashes, after being started up normally, actually works. | |||||||||||||||||||||||||||||
| Comment by Sergei Golubchik [ 2015-06-19 ] | |||||||||||||||||||||||||||||
|
Okay, I see. The complaint here is not that MariaDB crashes, but that there is no stack trace. I'll reopen it. | |||||||||||||||||||||||||||||
| Comment by Sergei Golubchik [ 2015-06-19 ] | |||||||||||||||||||||||||||||
|
Maybe the problem here is not the stack tracing code itself, but that the bug corrupting the stack in this specific way only exists in 10.0 (meaning that 5.5 and MySQL 5.6 and 5.7 would also have printed no stack trace for such a stack corruption). If that is the case, a real stack trace (obtained with gdb, as above) could help us locate the bug. | |||||||||||||||||||||||||||||
| Comment by Sergei Golubchik [ 2015-06-19 ] | |||||||||||||||||||||||||||||
|
MDEV-8325: another issue with a crash without a stack trace. | |||||||||||||||||||||||||||||
| Comment by Hartmut Holzgraefe [ 2015-07-24 ] | |||||||||||||||||||||||||||||
|
Two crashes with the same stack trace have now been reported. Both happened after MariaDB 10.0.19 was restarted following a hard kill. Core dumps were available, and unmangled stack traces could be extracted from them. Both crashes happened during MyISAM table repair (triggered by the auto-repair-on-open feature). The last function call before pthread_cond_signal() is simple_key_cache_read().
| |||||||||||||||||||||||||||||
| Comment by Hartmut Holzgraefe [ 2015-07-24 ] | |||||||||||||||||||||||||||||
|
It's also weird that gdb could produce a stack trace just fine while the internal backtrace printing code couldn't. | |||||||||||||||||||||||||||||
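For context on what in-process backtrace printing involves: a crash handler of this kind runs inside the crashed process, on the thread that received the signal, and walks that thread's stack from the point where the signal arrived, so its output can only be as good as the stack it finds at that moment. gdb, by contrast, examines the process or a core dump from the outside. Below is a minimal sketch of such a handler, assuming Linux with glibc's `<execinfo.h>`. It is not MariaDB's actual handler, and `dump_backtrace()` and `install_crash_handler()` are hypothetical names.

```c
#define _GNU_SOURCE
#include <execinfo.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

#define MAX_FRAMES 64

/* Hypothetical helper: write the calling thread's stack to fd and
   return the number of frames captured. backtrace_symbols_fd() writes
   straight to the descriptor instead of returning malloc()ed strings. */
int dump_backtrace(int fd)
{
    void *frames[MAX_FRAMES];
    int n = backtrace(frames, MAX_FRAMES);
    backtrace_symbols_fd(frames, n, fd);
    return n;
}

/* Runs on the thread that received the signal, so it only sees
   whatever that thread's stack looks like at this moment. */
static void crash_handler(int sig)
{
    static const char msg[] = "Fatal signal caught, stack trace follows:\n";
    (void)write(STDERR_FILENO, msg, sizeof msg - 1);
    dump_backtrace(STDERR_FILENO);
    /* SA_RESETHAND already restored the default action, so the re-raised
       signal (delivered once the handler returns) terminates the process
       with the default action, including a core dump if enabled. */
    raise(sig);
}

/* Hypothetical helper: install crash_handler for SIGSEGV.
   Returns 0 on success, -1 on failure (as sigaction() does). */
int install_crash_handler(void)
{
    struct sigaction sa;
    void *warmup[1];

    /* The first backtrace() call may load libgcc and allocate memory;
       do it here rather than inside the signal handler. */
    backtrace(warmup, 1);

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = crash_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESETHAND;
    return sigaction(SIGSEGV, &sa, NULL);
}
```

The warm-up call follows the glibc documentation's advice that libgcc should already be loaded before `backtrace()` is used in a signal handler. Function names only appear in the output if the binary is linked with `-rdynamic`; otherwise the frames are printed as raw addresses.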
| Comment by Michael Widenius [ 2015-12-01 ] | |||||||||||||||||||||||||||||
|
The bug was an overflow when allocating an Aria keycache of more than 45G.