[MDEV-23186] mysqld doesn't create core dump if crashing while backtracing or dumping memory - Jira

Details

Type: Bug
Status: Closed
Priority: Critical
Resolution: Not a Bug
Affects Version/s: 10.4.13
Fix Version/s: N/A
Component/s: Server
Labels: None

Description

When a second crash happens inside the crash-handling routine, the core dump function never gets called. As a result, no core file is produced and the crash can't be analyzed.

Let's add a config file or command line option that goes straight to core dumping when a crash occurs.

Issue Links

relates to

MDEV-14229 Stack trace is not resolved for shared objects (Closed)

MDEV-23101 SIGSEGV in lock_rec_unlock() when Galera is enabled (Closed)

Activity

Daniel Black added a comment - 2020-07-16 03:36

To skip the stack trace you can use --skip-stack-trace on the command line or as a config option.

https://mariadb.com/kb/en/mysqld-options/#-stack-trace

Rick Pizzi (Inactive) added a comment - 2020-07-16 06:27

danblack will this also skip memory dump?

Rick Pizzi (Inactive) added a comment - 2020-07-16 07:09 - edited

Perusing the source code, I see that in case of a double segfault, the core dump is skipped by default, regardless of the value of that command line option. This explains why we aren't getting a core file.

extern "C" sig_handler handle_fatal_signal(int sig)
{
  time_t curr_time;
  struct tm tm;
#ifdef HAVE_STACKTRACE
  THD *thd;
  /*
     This flag remembers if the query pointer was found invalid.
     We will try and print the query at the end of the signal handler, in case
     we're wrong.
  */
  bool print_invalid_query_pointer= false;
#endif

  if (segfaulted)
  {
    my_safe_printf_stderr("Fatal " SIGNAL_FMT " while backtracing\n", sig);
    goto end;
  }
  segfaulted = 1;

[  ... ]

#ifdef HAVE_WRITE_CORE
  if (test_flags & TEST_CORE_ON_SIGNAL)
  {
    my_write_core(sig);
  }
#endif

end:
#ifndef __WIN__
  /*
     Quit, without running destructors (etc.)
     Use a signal, because the parent (systemd) can check that with WIFSIGNALED
     On Windows, do not terminate, but pass control to exception filter.
  */
  signal(sig, SIG_DFL);
  kill(getpid(), sig);
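
The control flow in the quoted handler can be modelled without raising real signals. The sketch below is a simplified, hypothetical model: the `CrashHandlerModel` type, its fields, and the step names are invented for illustration and are not MariaDB code. It only records which steps would run. The `skip_stack_trace` field rests on an assumption taken from the comments in this issue, namely that --skip-stack-trace bypasses the backtrace step.

```cpp
#include <string>
#include <vector>

// Hypothetical model of the quoted control flow; nothing here is MariaDB code.
struct CrashHandlerModel {
  bool segfaulted = false;         // mirrors the `segfaulted` flag in the quoted code
  bool core_on_signal = true;      // stands in for (test_flags & TEST_CORE_ON_SIGNAL)
  bool skip_stack_trace = false;   // assumption: models --skip-stack-trace
  bool backtrace_crashes = false;  // assumption: simulate a fault inside the backtrace

  std::vector<std::string> steps;  // steps that would have run, in order

  void handle_fatal_signal() {
    if (segfaulted) {
      // Second entry: report and jump to the end, so core writing is skipped.
      steps.push_back("report-nested-fault");
      finish();
      return;
    }
    segfaulted = true;

    if (!skip_stack_trace) {
      steps.push_back("backtrace");
      if (backtrace_crashes) {
        handle_fatal_signal();  // the nested fault re-enters the handler
        return;                 // the first invocation never resumes
      }
    }

    if (core_on_signal) steps.push_back("my_write_core");
    finish();
  }

  // Corresponds to the `end:` label: restore the default action and re-raise.
  void finish() { steps.push_back("reset-to-SIG_DFL-and-reraise"); }
};
```

With `backtrace_crashes` set, the recorded steps contain no `my_write_core` entry, which matches the behaviour reported here. With `skip_stack_trace` also set, the nested fault cannot occur in the model and the core step runs.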

Rick Pizzi (Inactive) added a comment - 2020-07-16 07:18

Following the code, it appears that the crash is indeed inside my_print_backtrace() and the actual memory dump is from the system library and not from MariaDB. We will try to run with the option you have mentioned. Thanks.

Daniel Black added a comment - 2020-07-16 07:32

You're welcome. It did look like your other MDEV crashed in my_print_backtrace. The option doesn't eliminate the entire signal handler; however, as you've seen, the handler is quite minimal with --skip-stack-trace enabled. Best wishes resolving the initial crash.
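
The tail of the quoted handler resets the signal to its default action and re-sends it. A minimal POSIX sketch of that step alone follows, assuming a single-threaded child process. `fatal_handler` and `child_termination_signal` are hypothetical names, not MariaDB functions. It illustrates why a parent process can detect the crash with WIFSIGNALED.

```cpp
#include <csignal>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical minimal handler: restore the default action and re-send the
// signal, so the process terminates "by signal" instead of exiting normally.
extern "C" void fatal_handler(int sig) {
  signal(sig, SIG_DFL);
  kill(getpid(), sig);
}

// Forks a child that raises SIGSEGV under the handler above.
// Returns the signal number that terminated the child, or -1 otherwise.
int child_termination_signal() {
  pid_t pid = fork();
  if (pid < 0) return -1;

  if (pid == 0) {
    signal(SIGSEGV, fatal_handler);
    raise(SIGSEGV);
    _exit(0);  // not reached if the re-raised signal terminates the child
  }

  int status = 0;
  if (waitpid(pid, &status, 0) != pid) return -1;
  return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

Whether the default action also leaves a core file depends on system settings such as the core size limit, which this sketch does not inspect.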

Marko Mäkelä added a comment - 2021-03-19 07:05

I suspect that MDEV-14229 increased the probability of this happening, due to invoking more code in the signal handler.

People

Assignee: Unassigned

Reporter: Rick Pizzi (Inactive)

Votes: 1

Watchers: 5

Dates

Created: 2020-07-15 20:05

Updated: 2024-07-07 21:58

Resolved: 2020-07-16 07:18

MariaDB Server