[MDEV-8540] Crash on server shutdown since 10.0.16 Created: 2015-07-24  Updated: 2018-07-02  Resolved: 2018-06-27

Status: Closed
Project: MariaDB Server
Component/s: Server
Affects Version/s: 10.0
Fix Version/s: 5.5.61, 10.0.36, 10.1.35, 10.2.17, 10.3.8

Type: Bug Priority: Minor
Reporter: Hoang Xuan Phu Assignee: Sergey Vojtovich
Resolution: Fixed Votes: 0
Labels: None

Attachments: File my.cnf    

 Description   

This problem was first found in 10.0.20, and I bisected it back to 10.0.16. Every time the server is shut down, it crashes. I am able to reproduce this with an empty data directory.

150724 13:08:00 [Note] /nix/store/w2f13ydy5nyp53vsi17k6y0igkw99d7l-mariadb-10.0.16/bin/mysqld: Normal shutdown
 
150724 13:08:00 [Note] Event Scheduler: Purging the queue. 0 events
150724 13:08:00 [ERROR] mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
 
To report this bug, see http://kb.askmonty.org/en/reporting-bugs
 
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
 
Server version: 10.0.16-MariaDB-log
key_buffer_size=134217728
read_buffer_size=131072
max_used_connections=1
max_threads=1026
thread_count=0
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 2384868 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
 
Thread pointer: 0x0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x0 thread_stack 0x48000
/nix/store/w2f13ydy5nyp53vsi17k6y0igkw99d7l-mariadb-10.0.16/bin/mysqld(my_print_stacktrace+0x29)[0xb75919]
/nix/store/w2f13ydy5nyp53vsi17k6y0igkw99d7l-mariadb-10.0.16/bin/mysqld(handle_fatal_signal+0x398)[0x721868]
/nix/store/la5imi1602jxhpds9675n2n2d0683lbq-glibc-2.20/lib/libpthread.so.0(+0x10020)[0x7f8852c59020]
/nix/store/w2f13ydy5nyp53vsi17k6y0igkw99d7l-mariadb-10.0.16/bin/mysqld(_ZN13MYSQL_BIN_LOG5closeEj+0x2cf)[0x7da30f]
/nix/store/w2f13ydy5nyp53vsi17k6y0igkw99d7l-mariadb-10.0.16/bin/mysqld(_ZN13MYSQL_BIN_LOG7cleanupEv+0x52)[0x7da422]
/nix/store/w2f13ydy5nyp53vsi17k6y0igkw99d7l-mariadb-10.0.16/bin/mysqld[0x53402d]
/nix/store/w2f13ydy5nyp53vsi17k6y0igkw99d7l-mariadb-10.0.16/bin/mysqld(_Z10unireg_endv+0x2d)[0x535fed]
/nix/store/w2f13ydy5nyp53vsi17k6y0igkw99d7l-mariadb-10.0.16/bin/mysqld[0x538ab9]
/nix/store/w2f13ydy5nyp53vsi17k6y0igkw99d7l-mariadb-10.0.16/bin/mysqld(kill_server_thread+0xe)[0x538b8e]
/nix/store/la5imi1602jxhpds9675n2n2d0683lbq-glibc-2.20/lib/libpthread.so.0(+0x8374)[0x7f8852c51374]
/nix/store/la5imi1602jxhpds9675n2n2d0683lbq-glibc-2.20/lib/libc.so.6(clone+0x6d)[0x7f88514c9f9d]
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.

I attached the config file in use. MariaDB is built from source. The OS is NixOS.



 Comments   
Comment by Elena Stepanova [ 2015-07-25 ]

phunehehe,

Please specify your environment and the packages you use. Thanks.

Comment by Hoang Xuan Phu [ 2015-07-25 ]

elenst, I updated with the information. Thanks for looking.

Comment by Elena Stepanova [ 2015-07-30 ]

Hi,

If you are building from source anyway, could you maybe build a debug server and try to reproduce the crash there? (And attach the error log with the debug stack trace).

I don't have NixOS handy, and I'm not getting the crash on Wheezy.

Comment by Hoang Xuan Phu [ 2015-07-30 ]

elenst, I intended to do that but got carried away. Will try to get that. Is this the right way to do it? https://mariadb.com/kb/en/mariadb/compiling-mariadb-for-debugging/

Comment by Elena Stepanova [ 2015-07-30 ]

phunehehe, yes, it's fine. Of course, you need to look at the "MariaDB starting with 5.5" section, the one with cmake. Thanks.

Comment by Hoang Xuan Phu [ 2015-08-06 ]

elenst: the situation is a bit more complicated than I first thought. The crash doesn't occur when mariadb is started in a standalone process and stopped with kill. We have a systemd service that launches an angel process that in turn launches mariadb (yes, we use a supervisor to supervise another supervisor ). The crash happens when mariadb is killed when running under this hierarchy. I sent mariadb a TERM signal to test this, so it's probably not some bad killing by systemd and angel, as I have seen before.

Here is the crash log when running a debug build.

150806  8:12:50 [Note] /nix/store/d7ay6fpavkyk4blghvsyj35yis5jr6x6-mariadb-10.0.16/bin/mysqld: Normal shutdown
 
150806  8:12:50 [Note] Event Scheduler: Purging the queue. 0 events
150806  8:12:50 [ERROR] mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
 
To report this bug, see http://kb.askmonty.org/en/reporting-bugs
 
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
 
Server version: 10.0.16-MariaDB-debug-log
key_buffer_size=134217728
read_buffer_size=131072
max_used_connections=2
max_threads=1026
thread_count=0
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 2385309 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
 
Thread pointer: 0x0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x0 thread_stack 0x48000
Warning: 2048 bytes lost at 0x7fbd3c04f730, allocated by T@0 at 0xdeb80c, 0xdf0f3d, 0xe1c12c, 0xe1c55d, 0xe1866b, 0xe192c4, 0x593588, 0x593d7d
... a lot more bytes lost ...
Warning:  512 bytes lost at 0x3b184a0, allocated by T@0 at 0xdeb80c, 0xdf0f3d, 0x7c2a59, 0x75d881, 0x75d8b6, 0xe9fd4d, 0x7fbdb36e3ba5, 0x590c49
Memory lost: 546866944 bytes in 1944 chunks
/nix/store/d7ay6fpavkyk4blghvsyj35yis5jr6x6-mariadb-10.0.16/bin/mysqld(my_print_stacktrace+0x35)[0xe17931]
/nix/store/d7ay6fpavkyk4blghvsyj35yis5jr6x6-mariadb-10.0.16/bin/mysqld(handle_fatal_signal+0x33a)[0x83afe2]
/nix/store/la5imi1602jxhpds9675n2n2d0683lbq-glibc-2.20/lib/libpthread.so.0(+0x10020)[0x7fbdb4f36020]
/nix/store/d7ay6fpavkyk4blghvsyj35yis5jr6x6-mariadb-10.0.16/bin/mysqld(_ZN13MYSQL_BIN_LOG5closeEj+0x36b)[0x916e09]
/nix/store/d7ay6fpavkyk4blghvsyj35yis5jr6x6-mariadb-10.0.16/bin/mysqld(_ZN13MYSQL_BIN_LOG7cleanupEv+0x111)[0x90be2b]
/nix/store/d7ay6fpavkyk4blghvsyj35yis5jr6x6-mariadb-10.0.16/bin/mysqld[0x59405f]
/nix/store/d7ay6fpavkyk4blghvsyj35yis5jr6x6-mariadb-10.0.16/bin/mysqld(_Z10unireg_endv+0xe)[0x593e76]
/nix/store/d7ay6fpavkyk4blghvsyj35yis5jr6x6-mariadb-10.0.16/bin/mysqld[0x593d98]
/nix/store/d7ay6fpavkyk4blghvsyj35yis5jr6x6-mariadb-10.0.16/bin/mysqld(kill_server_thread+0x1b)[0x593db5]
/nix/store/la5imi1602jxhpds9675n2n2d0683lbq-glibc-2.20/lib/libpthread.so.0(+0x8374)[0x7fbdb4f2e374]
/nix/store/la5imi1602jxhpds9675n2n2d0683lbq-glibc-2.20/lib/libc.so.6(clone+0x6d)[0x7fbdb37a6f9d]
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.

I'm not sure if the "bytes lost" makes any sense. Strangely, the trace doesn't look much different from previously. I'm pretty sure it's a debug build though.

2015-08-06 08:40:13 7ff5e3a9c740 InnoDB: !!!!!!!! UNIV_DEBUG switched on !!!!!!!!!
2015-08-06 08:40:13 7ff5e3a9c740 InnoDB: !!!!!!!! UNIV_SYNC_DEBUG switched on !!!!!!!!!

Comment by Elena Stepanova [ 2015-08-06 ]

phunehehe, is there any specifics in how the angel process starts mysqld? Can you quote the command line as it's seen in the ps -ef | grep mysqld output?
Did you try to run it with the same exact command line and see if you can get the crash this way without all the supervisors?

Comment by Hoang Xuan Phu [ 2015-08-07 ]

elenst, here is the process as shown by ps -ef:

mysql     1429   610 11 07:07 ?        00:00:05 /nix/store/fy4jrzks49da0mwn5fyd4firvpli3y6x-mariadb-10.0.16/bin/mysqld --defaults-file=/nix/store/csvxzqqmrlkm5v34h2s3rlzp4v9z4rcx-my.cnf

I did run mariadb with the exact same command, but that way it's not crashing. I'm not really sure how else to replicate.

Comment by Elena Stepanova [ 2015-08-07 ]

phunehehe, did you run it as mysql user?

Comment by Hoang Xuan Phu [ 2015-08-07 ]

elenst: yes I did, using sudo. To be specific

sudo -u mysql /nix/store/fy4jrzks49da0mwn5fyd4firvpli3y6x-mariadb-10.0.16/bin/mysqld --defaults-file=/nix/store/csvxzqqmrlkm5v34h2s3rlzp4v9z4rcx-my.cnf

Comment by Elena Stepanova [ 2015-09-05 ]

I don't have a NixOS machine with development environment, and even if I did, it's very unlikely everything will be installed and configured exactly the same way as on yours, which might be important in order to reproduce the problem.

Is it your production system? If it's a test server, would you be able to provide us a temporary access to it, so we could investigate it at the spot?

Comment by Hoang Xuan Phu [ 2015-09-08 ]

elenst I can replicate this in a development system. It should be possible to deploy to a temporary system where you can investigate. Can I have your public SSH key?

Comment by Elena Stepanova [ 2015-09-08 ]

phunehehe,

Here is my public key: https://launchpad.net/~elenst/+sshkeys

Comment by Hoang Xuan Phu [ 2015-09-10 ]

(I'm preparing the temporary system, but it's taking some time because I want to place it in an isolated network)

Comment by Hoang Xuan Phu [ 2015-09-17 ]

elenst, I have sent an email to the address in your profile with details of the system. Thanks.

Comment by Elena Stepanova [ 2015-09-21 ]

phunehehe, thanks a lot for providing the environment, I got what I needed, you can remove my keys now.

The problem is only remotely related to the angel process or other nested watchdogs/wrappers. One is important though. Among other things, it does

if [ ! -t 1 ]; then
  exec 0>&-
...
fi

That is, it closes the input.
On some reason, together with binary log enabled, it causes the crash.

Note: In fact, there are at least three problems caused by this stdin redirection: if binary log is not enabled, either InnoDB or Aria throws errors about bad file descriptors. Aria does it on shutdown, while InnoDB might do it on startup and fail to start. Because of that, I am not sure how important it is to fix the issue with binary logging only, although it would be nice if it did something more elegant than crash.

To reproduce, start the server as

# Assuming BASEDIR is already set to a correct value
0>&- $BASEDIR/sql/mysqld --no-defaults --basedir=$BASEDIR --datadir=$BASEDIR/data  --loose-lc-messages-dir=$BASEDIR/sql/share --log-bin

wait until it starts and then shut it down.

#3  <signal handler called>
#4  0x000000000094ae73 in MYSQL_BIN_LOG::close (this=0x18a2bc0, exiting=5) at 10.0/sql/log.cc:7843
#5  0x000000000093f95f in MYSQL_BIN_LOG::cleanup (this=0x18a2bc0) at 10.0/sql/log.cc:3100
#6  0x00000000005bf4aa in clean_up (print_message=true) at 10.0/sql/mysqld.cc:2007
#7  0x00000000005bf2c2 in unireg_end () at 10.0/sql/mysqld.cc:1936
#8  0x00000000005bf1e4 in kill_server (sig_ptr=0x0) at 10.0/sql/mysqld.cc:1864
#9  0x00000000005bf201 in kill_server_thread (arg=0x7f30b5e56e78) at 10.0/sql/mysqld.cc:1887
#10 0x00007f30b5a90b50 in start_thread (arg=<optimized out>) at pthread_create.c:304
#11 0x00007f30b3d4695d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112

Could only reproduce the crash on 10.0.

Comment by Hoang Xuan Phu [ 2015-09-21 ]

elenst that was fantastic!

Comment by Sergey Vojtovich [ 2018-06-27 ]

There're two problems here.

Crash itself happens because we're trying to access current_thd->variables.lc_messages->errmsgs->errmsgs, whereas current_thd is NULL, when we attempt to report an error. This affects 5.5+.

But the fact that we come across this error is another bug. Since 10.0 we do fclose(stdin) after binlog index file has been opened. If we were started with stdin closed, binlog index file gets 0 fd and then it gets closed. On shutdown we attempt to close binlog index file again and second close fails.

Comment by Sergey Vojtovich [ 2018-06-27 ]

In 10.1 things are got even more complicated with the following patch:

commit 9f00950d2ad9be5533e66a5fa87a795a9574585f
Author: Nirbhay Choubey <nirbhay@mariadb.com>
Date:   Wed Jun 24 23:25:22 2015 -0400
 
    MDEV-7631 : Invalid WSREP_SST rows appear in mysqld-bin.index file
 
    Try not to release 0 (STDIN_FILENO) file descriptor as it can
    incorrectly get reused by streams opened later.
 
diff --git a/sql/mysqld.cc b/sql/mysqld.cc
index c297367..a2fe4ad 100644
--- a/sql/mysqld.cc
+++ b/sql/mysqld.cc
@@ -6278,7 +6278,14 @@ int mysqld_main(int argc, char **argv)
                          (char*) "" : mysqld_unix_port),
                          mysqld_port,
                          MYSQL_COMPILATION_COMMENT);
-  fclose(stdin);
+
+  // try to keep fd=0 busy
+  if (!freopen(IF_WIN("NUL","/dev/null"), "r", stdin))
+  {
+    // fall back on failure
+    fclose(stdin);
+  }
+
 #if defined(_WIN32) && !defined(EMBEDDED_LIBRARY)
   Service.SetRunning();
 #endif

Looks like an incorrect attempt to workaround this problem.

Comment by Sergey Vojtovich [ 2018-06-27 ]

sql_print_error() bug fixed in 5.5+.

fclose() bug fixed in 10.0+.

Comment by Hoang Xuan Phu [ 2018-06-28 ]

<3

Generated at Thu Feb 08 07:27:55 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.