[MDEV-23186] mysqld doesn't create core dump if crashing while backtracing or dumping memory Created: 2020-07-15  Updated: 2021-03-31  Resolved: 2020-07-16

Status: Closed
Project: MariaDB Server
Component/s: Server
Affects Version/s: 10.4.13
Fix Version/s: N/A

Type: Bug Priority: Critical
Reporter: Rick Pizzi Assignee: Unassigned
Resolution: Not a Bug Votes: 1
Labels: None

Issue Links:
Relates
relates to MDEV-14229 Stack trace is not resolved for share... Closed
relates to MDEV-23101 SIGSEGV in lock_rec_unlock() when Gal... Closed

 Description   

When a second crash happens inside the crash handling routine, the core dump function doesn't get called. As a result, no core file can be obtained and the crash can't be analyzed.

Let's create a config file or command line option to go straight to core dumping in case of a crash.



 Comments   
Comment by Daniel Black [ 2020-07-16 ]

To skip the stack trace you can use --skip-stack-trace on the command line or as a config option.

https://mariadb.com/kb/en/mysqld-options/#-stack-trace
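
As a minimal sketch of the config-file form, assuming the server reads the usual [mysqld] section: the skip-stack-trace line is the option named above, while core-file is an addition not mentioned in this comment and is assumed here to be the option that requests a core file on a fatal signal. Operating system core limits may also need adjusting; that is outside what this fragment shows.

```ini
[mysqld]
# Skip the stack trace step in the fatal signal handler.
skip-stack-trace
# Assumption: request a core file when the server dies on a fatal signal.
core-file
```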

Comment by Rick Pizzi [ 2020-07-16 ]

danblack, will this also skip the memory dump?

Comment by Rick Pizzi [ 2020-07-16 ]

Perusing the source code, I see that in case of a double segfault, the core dump is skipped by default, regardless of the value of that command line option. This explains why we aren't getting a core file.

extern "C" sig_handler handle_fatal_signal(int sig)
{
  time_t curr_time;
  struct tm tm;
 
#ifdef HAVE_STACKTRACE
  THD *thd;
  /*
     This flag remembers if the query pointer was found invalid.
     We will try and print the query at the end of the signal handler, in case
     we're wrong.
  */
  bool print_invalid_query_pointer= false;
#endif
 
  if (segfaulted)
  {
    my_safe_printf_stderr("Fatal " SIGNAL_FMT " while backtracing\n", sig);
    goto end;
  }
 
  segfaulted = 1;
 
[  ... ]
 
#ifdef HAVE_WRITE_CORE
  if (test_flags & TEST_CORE_ON_SIGNAL)
  {
    my_write_core(sig);
  }
#endif
 
end:
#ifndef __WIN__
  /*
     Quit, without running destructors (etc.)
     Use a signal, because the parent (systemd) can check that with WIFSIGNALED
     On Windows, do not terminate, but pass control to exception filter.
  */
  signal(sig, SIG_DFL);
  kill(getpid(), sig);
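
The branching in the excerpt above can be modelled without installing a real signal handler. This is only a sketch under stated assumptions: model_fatal_signal, crash_steps, demo_double_fault and DEMO_CORE_ON_SIGNAL are hypothetical names invented for illustration, and the model simplifies a real double fault by letting the first pass run to completion before the second one starts.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for TEST_CORE_ON_SIGNAL; the real value is not shown here. */
#define DEMO_CORE_ON_SIGNAL 1u

/* Records which steps one pass through the modelled handler would take. */
struct crash_steps {
  bool ran_diagnostics;   /* the backtrace and related output */
  bool called_write_core; /* my_write_core() in the excerpt */
  bool reraised_default;  /* signal(sig, SIG_DFL); kill(getpid(), sig); */
};

/* Models only the control flow of the excerpt; no signals are involved. */
struct crash_steps model_fatal_signal(int *segfaulted, unsigned flags)
{
  struct crash_steps steps = { false, false, false };

  if (*segfaulted)
    goto end;                       /* second fault: jump past everything */

  *segfaulted = 1;
  steps.ran_diagnostics = true;     /* stands in for the elided section */

  if (flags & DEMO_CORE_ON_SIGNAL)
    steps.called_write_core = true;

end:
  steps.reraised_default = true;    /* reached on both paths */
  return steps;
}

/* Walks through a first fault followed by a second entry into the handler. */
void demo_double_fault(void)
{
  int segfaulted = 0;
  struct crash_steps first = model_fatal_signal(&segfaulted, DEMO_CORE_ON_SIGNAL);
  struct crash_steps second = model_fatal_signal(&segfaulted, DEMO_CORE_ON_SIGNAL);

  assert(first.called_write_core && first.reraised_default);
  assert(!second.called_write_core && second.reraised_default);
}
```

In this model the second entry never reaches the my_write_core() step even with the flag set, but it still resets the signal to its default action and re-sends it, as the code after the end: label does.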

Comment by Rick Pizzi [ 2020-07-16 ]

Following the code, it appears that the crash is indeed inside my_print_backtrace() and the actual memory dump is from the system library and not from MariaDB. We will try to run with the option you have mentioned. Thanks.

Comment by Daniel Black [ 2020-07-16 ]

You're welcome. It did look like your other MDEV crashed in my_print_backtrace. The option doesn't quite eliminate the entire signal handler; however, as you've seen, it's quite minimal with --skip-stack-trace enabled. Best wishes resolving the initial crash.

Comment by Marko Mäkelä [ 2021-03-19 ]

I suspect that MDEV-14229 increased the probability of this happening, due to invoking more code in the signal handler.
