[MDEV-20115] 10.4 crash after upgrade from 10.3 ( /usr/sbin/mysqld: Thread 11 (user : '') did not exit) Created: 2019-07-22  Updated: 2022-03-17  Resolved: 2022-03-17

Status: Closed
Project: MariaDB Server
Component/s: Galera, Server
Affects Version/s: 10.4.6
Fix Version/s: N/A

Type: Bug Priority: Critical
Reporter: Luke Alexander Assignee: Jan Lindström (Inactive)
Resolution: Incomplete Votes: 8
Labels: None
Environment:

Ubuntu 16.04 4.15.0-55-generic #60~16.04.2-Ubuntu


Attachments: Text File gdb_out.txt    
Issue Links:
Duplicate
duplicates MDEV-20319 Unable to stop or restart mariadb.ser... Closed

 Description   

We have just upgraded from 10.3 to 10.4 on our slave database server to try to move away from a SEGV(11) issue we were seeing on our master server running 10.3 (no stack trace available for that issue).

The upgrade seemed to go OK, but I needed to reboot the server after a kernel update; since the reboot, the MariaDB instance on the slave will not start (the master is still running 10.3).

The error log shows:

Version: '10.4.6-MariaDB-1:10.4.6+maria~xenial-log'  socket: '/var/run/mysqld/mysqld.sock'  port: 3306  mariadb.org binary distribution
2019-07-22 16:05:20 0 [Note] /usr/sbin/mysqld (initiated by: unknown): Normal shutdown
2019-07-22 16:05:20 0 [Note] Event Scheduler: Purging the queue. 0 events
2019-07-22 16:05:20 0 [Note] InnoDB: FTS optimize thread exiting.
2019-07-22 16:05:40 0 [Warning] /usr/sbin/mysqld: Thread 11 (user : '') did not exit
 
2019-07-22 16:05:40 0 [Warning] /usr/sbin/mysqld: Thread 19 (user : '') did not exit
 
2019-07-22 16:05:40 0 [Warning] /usr/sbin/mysqld: Thread 18 (user : '') did not exit
 
2019-07-22 16:05:40 0 [Warning] /usr/sbin/mysqld: Thread 17 (user : '') did not exit
 
2019-07-22 16:05:40 0 [Warning] /usr/sbin/mysqld: Thread 16 (user : '') did not exit
 
2019-07-22 16:05:40 0 [Warning] /usr/sbin/mysqld: Thread 15 (user : '') did not exit
 
2019-07-22 16:05:40 0 [Warning] /usr/sbin/mysqld: Thread 14 (user : '') did not exit
 
2019-07-22 16:05:40 0 [Warning] /usr/sbin/mysqld: Thread 13 (user : '') did not exit
 
2019-07-22 16:05:40 0 [Warning] /usr/sbin/mysqld: Thread 12 (user : '') did not exit
 
2019-07-22 16:05:40 0 [Warning] /usr/sbin/mysqld: Thread 10 (user : '') did not exit



 Comments   
Comment by Luke Alexander [ 2019-07-22 ]

It seems that something wrote this zero-byte file `/var/lib/mysql/debian-10.4.flag`

On removing that file, the server was able to start up again!

Comment by Luke Alexander [ 2019-07-23 ]

Actually, the crash we had on 10.3 appears to have come back; all we have in syslog is:

Jul 23 16:33:07 db3 systemd[1]: mariadb.service: Main process exited, code=killed, status=11/SEGV
Jul 23 16:33:07 db3 systemd[1]: mariadb.service: Unit entered failed state.
Jul 23 16:33:07 db3 systemd[1]: mariadb.service: Failed with result 'signal'.
Jul 23 16:33:12 db3 systemd[1]: mariadb.service: Service hold-off time over, scheduling restart.
Jul 23 16:33:12 db3 systemd[1]: Stopped MariaDB 10.4.6 database server.
Jul 23 16:33:12 db3 systemd[1]: Starting MariaDB 10.4.6 database server...

Comment by Eugene Kosov (Inactive) [ 2019-08-01 ]

Hi.

I see you have 2 issues: crash and something from Debian packaging. I'm not an expert in the latter but it seems to be somehow related to binary format compatibility: https://github.com/MariaDB/server/blob/b428b09997d172f29fc201b9ab05c160ef4cbc39/debian/po/templates.pot#L30
Maybe your tables were created a long time ago on some pretty old MariaDB/MySQL?

Now about the crash: I can't help without additional info. Why don't you have a stack trace? Did MariaDB print one for you?

One more thing: bugs are fixed in the oldest affected version first, and the fix is then merged into the more recent branches. That means that to get a fix you can often just upgrade the minor version of the server. Of course, upgrading a major version could help too, but that could mean the bug disappeared by accident.

Comment by Luke Alexander [ 2019-08-08 ]

That is what I find strange - there is no crash dump, no core file, nothing in the logs as to why MariaDB crashes - just the SEGV message in syslog. If you are able to advise me how to enable crash dump reporting, that would be great.

Comment by Luke Alexander [ 2019-08-08 ]

Also of note about the start/restart issue: the server appears to restart normally, then just when you would expect connections to resume, it dies with the errors below:

2019-08-08  8:20:01 0 [Note] /usr/sbin/mysqld: ready for connections.
Version: '10.4.6-MariaDB-1:10.4.6+maria~xenial-log'  socket: '/var/run/mysqld/mysqld.sock'  port: 3306  mariadb.org binary distribution
2019-08-08  8:20:01 0 [Note] /usr/sbin/mysqld (initiated by: unknown): Normal shutdown
2019-08-08  8:20:01 0 [Note] Event Scheduler: Purging the queue. 0 events
2019-08-08  8:20:01 0 [Note] InnoDB: FTS optimize thread exiting.
2019-08-08  8:20:21 0 [Warning] /usr/sbin/mysqld: Thread 11 (user : '') did not exit
 
2019-08-08  8:20:21 0 [Warning] /usr/sbin/mysqld: Thread 19 (user : '') did not exit
 
2019-08-08  8:20:21 0 [Warning] /usr/sbin/mysqld: Thread 18 (user : '') did not exit
 
2019-08-08  8:20:21 0 [Warning] /usr/sbin/mysqld: Thread 17 (user : '') did not exit
 
2019-08-08  8:20:21 0 [Warning] /usr/sbin/mysqld: Thread 16 (user : '') did not exit
 
2019-08-08  8:20:21 0 [Warning] /usr/sbin/mysqld: Thread 15 (user : '') did not exit
 
2019-08-08  8:20:21 0 [Warning] /usr/sbin/mysqld: Thread 14 (user : '') did not exit
 
2019-08-08  8:20:21 0 [Warning] /usr/sbin/mysqld: Thread 13 (user : '') did not exit
 
2019-08-08  8:20:21 0 [Warning] /usr/sbin/mysqld: Thread 12 (user : '') did not exit
 
2019-08-08  8:20:21 0 [Warning] /usr/sbin/mysqld: Thread 10 (user : '') did not exit

The process is still running (and consuming resources):

28516 mysql     20   0  0.426t 0.032t  20976 S  53.3  6.6   2:46.78 /usr/sbin/mysqld

But the mysql.sock file has disappeared:

# ls -la /var/run/mysqld/
total 0
drwxr-xr-x  2 mysql root   40 Aug  8 08:20 .
drwxr-xr-x 31 root  root 1220 Aug  8 08:03 ..

The pid file exists:

# ls -la /var/lib/mysql/
total 12
drwxr-xr-x  2 mysql mysql 4096 Jul 23 18:19 .
drwxr-xr-x 56 root  root  4096 Jul 24 16:03 ..
-rw-rw----  1 mysql mysql    6 Aug  8 08:20 mysql.pid

The process has many threads in operation still:

# ps -T -p  28516
  PID  SPID TTY          TIME CMD
28516 28516 ?        00:00:23 mysqld
28516 31771 ?        00:00:00 mysqld
28516 31772 ?        00:00:00 mysqld
28516 23411 ?        00:00:00 mysqld
28516 23412 ?        00:00:00 mysqld
28516 23413 ?        00:00:01 mysqld
28516 23414 ?        00:00:01 mysqld
28516 23415 ?        00:00:01 mysqld
28516 23416 ?        00:00:01 mysqld
28516 23417 ?        00:00:00 mysqld
28516 23418 ?        00:00:00 mysqld
28516 23419 ?        00:00:00 mysqld
28516 23420 ?        00:00:00 mysqld
28516 23421 ?        00:00:08 mysqld
28516 23422 ?        00:00:04 mysqld
28516 23423 ?        00:00:04 mysqld
28516 23424 ?        00:00:04 mysqld
28516  7077 ?        00:00:00 mysqld
28516  7078 ?        00:00:00 mysqld
28516  7079 ?        00:00:00 mysqld
28516  7091 ?        00:00:02 mysqld
28516  7317 ?        00:00:15 mysqld
28516  7318 ?        00:00:00 mysqld
28516  7319 ?        00:00:00 mysqld
28516 17278 ?        00:00:00 mysqld
28516 23352 ?        00:00:28 mysqld
28516 23353 ?        00:00:05 mysqld
28516 23354 ?        00:00:12 mysqld
28516 23355 ?        00:00:38 mysqld
28516 23356 ?        00:00:12 mysqld
28516 23357 ?        00:00:12 mysqld
28516 23358 ?        00:00:12 mysqld
28516 23359 ?        00:00:12 mysqld
28516 23360 ?        00:00:12 mysqld
28516 23361 ?        00:00:12 mysqld

Sometimes if I kill the process with a hammer (kill -9) it will then restart OK; other times I have seen it restart on its own; other times I will spend a long time repeating the same operation until eventually it starts responding and the socket file comes back...

Comment by Luke Alexander [ 2019-08-08 ]

Could this be related to the SEGV 11 we are experiencing? https://jira.mariadb.org/browse/MDEV-20108

Comment by Eugene Kosov (Inactive) [ 2019-08-08 ]

Could you try adding `--core-file` to your daemon? Maybe this could help us get more info about the crash.

One other thing is to run `gdb -p $YOUR_PID` and then, from gdb, run `thread apply all bt`. This is for the case when the daemon hangs.
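The suggestion above can be sketched as a small script. This is only a sketch: the command-file path and the `pidof` lookup are assumptions, and since attaching to the live daemon requires root, that step is left as a comment.

```shell
# Write a gdb command script that dumps backtraces of all threads.
cat > /tmp/all-threads-bt.gdb <<'EOF'
set pagination off
thread apply all bt
EOF

# Attach to the hung daemon in batch mode (run as root; replace the
# pidof lookup with the actual mysqld PID if several are running):
#   gdb -p "$(pidof mysqld)" -batch -x /tmp/all-threads-bt.gdb \
#       > /tmp/mysqld_backtrace.txt 2>&1
```

The resulting text file can then be attached to the ticket.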

As for MDEV-20108, I have no idea yet what could be happening in your case.

Comment by Luke Alexander [ 2019-08-12 ]

I've attached a gdb file to this ticket; I don't know if that will help debug this or not...

Comment by acsfer [ 2019-08-13 ]

Similar bug here (after an upgrade from 10.3, with the same symptoms): https://jira.mariadb.org/browse/MDEV-20319

Comment by acsfer [ 2019-08-15 ]

In my case, if I issue a `STOP SLAVE;` before `systemctl stop|restart mariadb`, I have no problems. (I'm only seeing this exact issue - same log output, process still running... - when the server needs to be stopped or restarted.)

Comment by Luke Alexander [ 2019-08-21 ]

We have some more data on this one. For some reason stack traces were not enabled; they are now. This is from syslog:

Aug 21 10:03:28 db4 kernel: [2480495.302728] mysqld[32716]: segfault at 7f4e0ce77232 ip 00005616602bf6b5 sp 00007ee4ccddea90 error 7 in mysqld[56165f4f6000+1409000]
Aug 21 10:04:00 db4 systemd[1]: mariadb.service: Main process exited, code=killed, status=11/SEGV
Aug 21 10:04:00 db4 systemd[1]: mariadb.service: Unit entered failed state.
Aug 21 10:04:00 db4 systemd[1]: mariadb.service: Failed with result 'signal'.
Aug 21 10:04:05 db4 systemd[1]: mariadb.service: Service hold-off time over, scheduling restart.
Aug 21 10:04:05 db4 systemd[1]: Stopped MariaDB 10.4.6 database server.
Aug 21 10:04:05 db4 systemd[1]: Starting MariaDB 10.4.6 database server...
Aug 21 10:04:06 db4 mysqld[3708]: 2019-08-21 10:04:06 0 [Note] /usr/sbin/mysqld (mysqld 10.4.6-MariaDB-1:10.4.6+maria~xenial-log) starting as process 3708 ...

This is the trace in mysql error log

2019-08-21 10:01:24 22516128 [Note] InnoDB: Number of pools: 9
2019-08-21 10:03:03 22585028 [Note] InnoDB: Number of pools: 10
190821 10:03:03 [ERROR] mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
 
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
 
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
 
Server version: 10.4.6-MariaDB-1:10.4.6+maria~xenial-log
key_buffer_size=33554432
read_buffer_size=131072
max_used_connections=8928
max_threads=10002
thread_count=8935
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 22042691 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
 
Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x0 thread_stack 0x49000
/usr/sbin/mysqld(my_print_stacktrace+0x2e)[0x5616602c258e]
/usr/sbin/mysqld(handle_fatal_signal+0x307)[0x56165fd57387]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x11390)[0x7f4e0daa8390]
/lib/x86_64-linux-gnu/libpthread.so.0(+0xfff0)[0x7f4e0daa6ff0]
/lib/x86_64-linux-gnu/libgcc_s.so.1(+0x10032)[0x7f4e0c30b032]
/lib/x86_64-linux-gnu/libgcc_s.so.1(_Unwind_ForcedUnwind+0x64)[0x7f4e0c30b3b4]
/lib/x86_64-linux-gnu/libpthread.so.0(__pthread_unwind+0x40)[0x7f4e0daa7070]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x101b5)[0x7f4e0daa71b5]
/lib/x86_64-linux-gnu/libpthread.so.0(recv+0x64)[0x7f4e0daa7864]
/usr/sbin/mysqld(vio_read+0x78)[0x561660324ea8]
/usr/sbin/mysqld(+0x96774e)[0x56165fe5d74e]
/usr/sbin/mysqld(my_net_read_packet_reallen+0x210)[0x56165fe5e9a0]
/usr/sbin/mysqld(my_net_read_packet+0x1d)[0x56165fe5ea9d]
/usr/sbin/mysqld(_Z10do_commandP3THD+0x90)[0x56165fb4e460]
/usr/sbin/mysqld(_Z24do_handle_one_connectionP7CONNECT+0x25a)[0x56165fc2a71a]
/usr/sbin/mysqld(handle_one_connection+0x3d)[0x56165fc2a85d]
/usr/sbin/mysqld(+0xa0f9a1)[0x56165ff059a1]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba)[0x7f4e0da9e6ba]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f4e0cf4541d]
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
Writing a core file...
Working directory at /mysqldata/mysql/data

Comment by Eugene Kosov (Inactive) [ 2019-08-29 ]

lukealexander I think in Ubuntu you also have to install a package with debug symbols. I do not know exactly what it's called. That is probably the reason for the ?? in your gdb stack traces: it's hard to say anything from them. You probably have MDEV-20247

Comment by Eugene Kosov (Inactive) [ 2019-08-29 ]

The last crash is something unrelated. Could you disassemble vio_read() like this: `gdb -batch -ex 'file mysqld' -ex 'disassemble vio_read'`?

Comment by Chris [ 2019-09-13 ]

I also experience this same issue on Debian Buster using 10.4.7-MariaDB-1:10.4.7+maria~buster. As per previous notes, if I stop the slave I can stop or restart the server; otherwise it will get stuck waiting.

As a workaround, I drop in a systemd configuration file with an ExecStop action to stop the slave. To do this on Debian (and probably Ubuntu):

1. Create the /etc/systemd/system/mariadb.service.d directory if it does not already exist
2. Create the file /etc/systemd/system/mariadb.service.d/execstop-slave.conf with the following content:

# Fix for stopping MariaDB - If the slave is running during stop/restart it will hang.
# Bug reported: https://jira.mariadb.org/browse/MDEV-20115
[Service]
ExecStop=-/usr/bin/mysqladmin stop-slave

3. Reload the systemd daemon: `systemctl daemon-reload`
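Steps 1-3 above can be sketched as commands. This is only a sketch: it stages the drop-in in a local directory first so it can be reviewed (the staging path is an assumption), and leaves the privileged copy and reload as comments.

```shell
# Stage the drop-in locally (same content as step 2 above).
mkdir -p mariadb.service.d
cat > mariadb.service.d/execstop-slave.conf <<'EOF'
# Fix for stopping MariaDB - if the slave is running during stop/restart it will hang.
# Bug reported: https://jira.mariadb.org/browse/MDEV-20115
[Service]
ExecStop=-/usr/bin/mysqladmin stop-slave
EOF

# Then, as root:
#   cp -r mariadb.service.d /etc/systemd/system/
#   systemctl daemon-reload
```

Note the leading `-` in `ExecStop=-...`: it tells systemd to ignore a failure of this command, so shutdown still proceeds when the slave is not running.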

Comment by Stefan Reger [ 2019-10-06 ]

I have a similar problem with Debian Buster running 10.4.8+maria~buster with parallel replication enabled:

On "systemctl stop mariadb" my log shows:
2019-10-06 15:17:58 132 [Note] Error reading relay log event: slave SQL thread was killed
Then the innodb shutdown is not initiated and the mysqld process hangs indefinitely.

Comment by acsfer [ 2019-11-13 ]

In the same vein: trying to update the server from APT, the server can't be stopped (same reasons), so timeouts occur; the only solution is kill -9 PID.

Nov 13 17:05:05 srv-fal systemd[1]: Starting MariaDB 10.4.10 database server...
Nov 13 17:05:05 srv-fal systemd[1]: mariadb.service: Found left-over process 1995 (mysqld) in control group while starting unit. Ignoring.
Nov 13 17:05:05 srv-fal systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Nov 13 17:05:05 srv-fal mysqld[12100]: 2019-11-13 17:05:05 0 [Note] /usr/sbin/mysqld (mysqld 10.4.10-MariaDB-1:10.4.10+maria~bionic-log) starting as p
Nov 13 17:06:35 srv-fal systemd[1]: mariadb.service: Start operation timed out. Terminating.
Nov 13 17:07:17 srv-fal systemd[1]: mariadb.service: Main process exited, code=exited, status=1/FAILURE
Nov 13 17:08:47 srv-fal systemd[1]: mariadb.service: State 'stop-final-sigterm' timed out. Skipping SIGKILL. Entering failed mode.
Nov 13 17:08:47 srv-fal systemd[1]: mariadb.service: Failed with result 'timeout'.
Nov 13 17:08:47 srv-fal systemd[1]: Failed to start MariaDB 10.4.10 database server.

So:

STOP ALL SLAVES;
systemctl stop mariadb
apt upgrade
START ALL SLAVES;

Otherwise, it crashes.
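The sequence above can be written out as a script. This is only a sketch: it assumes the mysql client can authenticate as root (e.g. via unix_socket), and the script is only syntax-checked here since a real run needs root.

```shell
# Capture the workaround sequence as a script.
cat > safe-upgrade.sh <<'EOF'
#!/bin/sh
set -e
mysql -e 'STOP ALL SLAVES;'    # stop replication first so shutdown cannot hang
systemctl stop mariadb
apt upgrade                    # mariadb restarts as part of the package upgrade
mysql -e 'START ALL SLAVES;'
EOF

sh -n safe-upgrade.sh          # syntax check only; the real run requires root
```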

Comment by Christian Rishøj [ 2019-12-12 ]

This is still an issue (with 10.4.10).

Comment by Ranjan Ghosh [ 2019-12-18 ]

Hm. Something is seriously wrong here IMO. APT upgrade has also never worked for me. Perhaps there is a connection to #15748? I just did an "apt upgrade" on the first node of a 2-node cluster and the process got stuck:

preparing to unpack .../0-mariadb-server-10.4_1%3a10.4.11+maria~eoan_amd64.deb ...
Unpacking mariadb-server-10.4 (1:10.4.11+maria~eoan) over (1:10.4.10+maria~eoan) ...

and with journalctl I get this:

Dec 18 18:58:30 yak1 mysqld[11321]: 2019-12-18 18:58:30 0 [Note] /usr/sbin/mysqld (initiated by: unknown): Normal shutdown
Dec 18 18:58:30 yak1 mysqld[11321]: 2019-12-18 18:58:30 0 [Note] WSREP: Shutdown replication
Dec 18 18:58:30 yak1 mysqld[11321]: 2019-12-18 18:58:30 0 [Note] WSREP: Server status change synced -> disconnecting
Dec 18 18:58:30 yak1 mysqld[11321]: 2019-12-18 18:58:30 0 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
Dec 18 18:58:30 yak1 mysqld[11321]: 2019-12-18 18:58:30 0 [Note] WSREP: Closing send monitor...
Dec 18 18:58:30 yak1 mysqld[11321]: 2019-12-18 18:58:30 0 [Note] WSREP: Closed send monitor.
Dec 18 18:58:30 yak1 mysqld[11321]: 2019-12-18 18:58:30 0 [Note] WSREP: gcomm: terminating thread
Dec 18 18:58:30 yak1 mysqld[11321]: 2019-12-18 18:58:30 0 [Note] WSREP: gcomm: joining thread
Dec 18 18:58:30 yak1 mysqld[11321]: 2019-12-18 18:58:30 0 [Note] WSREP: gcomm: closing backend
Dec 18 18:58:31 yak1 mysqld[11321]: 2019-12-18 18:58:31 0 [Note] WSREP: view(view_id(NON_PRIM,33db1e59,10) memb {
Dec 18 18:58:31 yak1 mysqld[11321]:         33db1e59,0
Dec 18 18:58:31 yak1 mysqld[11321]: } joined {
Dec 18 18:58:31 yak1 mysqld[11321]: } left {
Dec 18 18:58:31 yak1 mysqld[11321]: } partitioned {
Dec 18 18:58:31 yak1 mysqld[11321]:         5edf6fdc,0
Dec 18 18:58:31 yak1 mysqld[11321]: })
Dec 18 18:58:31 yak1 mysqld[11321]: 2019-12-18 18:58:31 0 [Note] WSREP: PC protocol downgrade 1 -> 0
Dec 18 18:58:31 yak1 mysqld[11321]: 2019-12-18 18:58:31 0 [Note] WSREP: view((empty))
Dec 18 18:58:31 yak1 mysqld[11321]: 2019-12-18 18:58:31 0 [Note] WSREP: gcomm: closed
Dec 18 18:58:31 yak1 mysqld[11321]: 2019-12-18 18:58:31 0 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
Dec 18 18:58:31 yak1 mysqld[11321]: 2019-12-18 18:58:31 0 [Note] WSREP: Flow-control interval: [16, 16]
Dec 18 18:58:31 yak1 mysqld[11321]: 2019-12-18 18:58:31 0 [Note] WSREP: Received NON-PRIMARY.
Dec 18 18:58:31 yak1 mysqld[11321]: 2019-12-18 18:58:31 0 [Note] WSREP: Shifting SYNCED -> OPEN (TO: 96646317)
Dec 18 18:58:31 yak1 mysqld[11321]: 2019-12-18 18:58:31 0 [Note] WSREP: New SELF-LEAVE.
Dec 18 18:58:31 yak1 mysqld[11321]: 2019-12-18 18:58:31 0 [Note] WSREP: Flow-control interval: [0, 0]
Dec 18 18:58:31 yak1 mysqld[11321]: 2019-12-18 18:58:31 0 [Note] WSREP: Received SELF-LEAVE. Closing connection.
Dec 18 18:58:31 yak1 mysqld[11321]: 2019-12-18 18:58:31 0 [Note] WSREP: Shifting OPEN -> CLOSED (TO: -1)
Dec 18 18:58:31 yak1 mysqld[11321]: 2019-12-18 18:58:31 0 [Note] WSREP: RECV thread exiting 0: Success
Dec 18 18:58:31 yak1 mysqld[11321]: 2019-12-18 18:58:31 2 [Note] WSREP: ####### processing CC -1, local, ordered
Dec 18 18:58:31 yak1 mysqld[11321]: 2019-12-18 18:58:31 2 [Note] WSREP: ####### My UUID: 33db1e59-1d12-11ea-964a-b6007aaaa7a1
Dec 18 18:58:31 yak1 mysqld[11321]: 2019-12-18 18:58:31 2 [Note] WSREP: ####### ST not required
Dec 18 18:58:31 yak1 mysqld[11321]: 2019-12-18 18:58:31 0 [Note] WSREP: recv_thread() joined.
Dec 18 18:58:31 yak1 mysqld[11321]: 2019-12-18 18:58:31 0 [Note] WSREP: Closing replication queue.
Dec 18 18:58:31 yak1 mysqld[11321]: 2019-12-18 18:58:31 0 [Note] WSREP: Closing slave action queue.
Dec 18 18:58:31 yak1 mysqld[11321]: 2019-12-18 18:58:31 2 [Note] WSREP: ================================================
Dec 18 18:58:31 yak1 mysqld[11321]: View:
Dec 18 18:58:31 yak1 mysqld[11321]:   id: a6a91495-1b7f-11ea-a0be-7f78e01c49de:-1
Dec 18 18:58:31 yak1 mysqld[11321]:   status: non-primary
Dec 18 18:58:31 yak1 mysqld[11321]:   protocol_version: 4
Dec 18 18:58:31 yak1 mysqld[11321]:   capabilities: MULTI-MASTER, CERTIFICATION, PARALLEL_APPLYING, REPLAY, ISOLATION, PAUSE, CAUSAL_READ, INCREMENTAL_WS, UNORDERED, PREORDERED, STREAMING, NBO
Dec 18 18:58:31 yak1 mysqld[11321]:   final: no
Dec 18 18:58:31 yak1 mysqld[11321]:   own_index: 0
Dec 18 18:58:31 yak1 mysqld[11321]:   members(1):
Dec 18 18:58:31 yak1 mysqld[11321]:         0: 33db1e59-1d12-11ea-964a-b6007aaaa7a1, yak1
Dec 18 18:58:31 yak1 mysqld[11321]: =================================================
Dec 18 18:58:31 yak1 mysqld[11321]: 2019-12-18 18:58:31 2 [Note] WSREP: Non-primary view
Dec 18 18:58:31 yak1 mysqld[11321]: 2019-12-18 18:58:31 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
Dec 18 18:58:31 yak1 mysqld[11321]: 2019-12-18 18:58:31 2 [Note] WSREP: ####### processing CC -1, local, ordered
Dec 18 18:58:31 yak1 mysqld[11321]: 2019-12-18 18:58:31 2 [Note] WSREP: ####### My UUID: 33db1e59-1d12-11ea-964a-b6007aaaa7a1
Dec 18 18:58:31 yak1 mysqld[11321]: 2019-12-18 18:58:31 2 [Note] WSREP: ####### ST not required
Dec 18 18:58:31 yak1 mysqld[11321]: 2019-12-18 18:58:31 2 [Note] WSREP: ================================================
Dec 18 18:58:31 yak1 mysqld[11321]: View:
Dec 18 18:58:31 yak1 mysqld[11321]:   id: a6a91495-1b7f-11ea-a0be-7f78e01c49de:-1
Dec 18 18:58:31 yak1 mysqld[11321]:   status: non-primary
Dec 18 18:58:31 yak1 mysqld[11321]:   protocol_version: 4
Dec 18 18:58:31 yak1 mysqld[11321]:   capabilities: MULTI-MASTER, CERTIFICATION, PARALLEL_APPLYING, REPLAY, ISOLATION, PAUSE, CAUSAL_READ, INCREMENTAL_WS, UNORDERED, PREORDERED, STREAMING, NBO
Dec 18 18:58:31 yak1 mysqld[11321]:   final: yes
Dec 18 18:58:31 yak1 mysqld[11321]:   own_index: -1
Dec 18 18:58:31 yak1 mysqld[11321]:   members(0):
Dec 18 18:58:31 yak1 mysqld[11321]: =================================================
Dec 18 18:58:31 yak1 mysqld[11321]: 2019-12-18 18:58:31 2 [Note] WSREP: Non-primary view
Dec 18 18:58:31 yak1 mysqld[11321]: 2019-12-18 18:58:31 2 [Note] WSREP: Server status change disconnecting -> disconnected
Dec 18 18:58:31 yak1 mysqld[11321]: 2019-12-18 18:58:31 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
Dec 18 18:58:31 yak1 mysqld[11321]: 2019-12-18 18:58:31 2 [Note] WSREP: Applier thread exiting ret: 0 thd: 2
Dec 18 18:58:31 yak1 mysqld[11321]: 2019-12-18 18:58:31 0 [Note] WSREP: killing local connection: 2872598
Dec 18 18:58:31 yak1 mysqld[11321]: 2019-12-18 18:58:31 2 [Warning] Aborted connection 2 to db: 'unconnected' user: 'unauthenticated' host: '' (This connection closed normally without authentication)

and that's about it. No new log entries now for 10 minutes. Nothing. I will kill mysqld now, but I also wanted to chime in because it's not really a great upgrading experience. This should be easier.

Comment by Ranjan Ghosh [ 2019-12-18 ]

BTW: In my case it wasn't even a 10.3 => 10.4 upgrade but only a 10.4.10 to 10.4.11...

Comment by Geoff Montee (Inactive) [ 2019-12-18 ]

rgpublic,

It sounds like the server process may have been hung. If you are able to reproduce it, it might help to get a gdb backtrace of all threads while the process is hung. See here:

https://mariadb.com/kb/en/library/how-to-produce-a-full-stack-trace-for-mysqld/#getting-full-backtraces-for-all-threads-from-a-running-mysqld-process

Comment by Ranjan Ghosh [ 2019-12-18 ]

@GeoffMontee

Thanks for the info. Currently I cannot reproduce it, because it was a live server where I obviously don't want to force that situation again. I'm glad that it's running now. I simply did an "apt upgrade": before it was 10.4.10, after 10.4.11. APT tried to stop the server process and then got stuck. (On an admittedly somewhat off-topic side note, I still find it somewhat disconcerting how "crashy" Galera is in general on restarts. If our 2-node cluster is running and you don't touch it: great. But every other time I try to restart one of the nodes for some reason, all hell breaks loose. More often than not, I find that I can only really get out of the situation if I wipe /var/lib/mysql on one of the nodes and let it resync everything. If the wise MariaDB folks are reading this: please, please consider adding more tests on these kinds of restarts - it's terribly buggy IMHO.)

Comment by Christian Rishøj [ 2020-01-29 ]

While similar to MDEV-20821, this issue still persists.

Comment by Jeremy Haozhe Luo [ 2020-07-19 ]

I found my way here from MDEV-20319.

I have the systemd stop/restart mariadb.service hang issue with a newly installed MariaDB 10.5.4 on CentOS Linux release 7.6.1810 (Core); it is not an upgrade case.

I edited the systemd mariadb.service, and then it works fine.

This is what I did:

sudo cp -a /usr/lib/systemd/system/mariadb.service /etc/systemd/system/mariadb.service
sudo vim /etc/systemd/system/mariadb.service

then add one line in the [Service] section of /etc/systemd/system/mariadb.service:

[Service]
PIDFile=/var/lib/mysql/mariadb105.pid

The value is my MariaDB instance's pid file path; put yours there.

systemctl daemon-reload

If it has already hung after a stop/restart, I try

kill -9 $current_pid


or maybe

mysqladmin shutdown

can do the same thing.

Then systemd stop/restart mariadb.service works for me.

Comment by Daniel Black [ 2020-07-20 ]

jeremy.l, for minor systemd service file changes I recommend either `systemctl edit mariadb.service` to add a particular line, or a drop-in like gbe0 did above. This way more substantial changes in package upgrades don't get ignored, and your desired effect still applies.

The systemd service file is written not to SIGKILL MariaDB (i.e. SendSIGKILL=no) but to let it shut down smoothly, because the recovery can be just as expensive in time, so solutions built around SIGKILL don't help us understand the problem. More recent systemd versions (v242+) will not start a new MariaDB instance while the previous one is shutting down (preventing the duplicate running instances seen in the MDEV-20319 logs).

Pidfiles were left out of the MariaDB-packaged systemd files from the start because, when systemd has a reference to the child process, a pidfile isn't needed.

Having said all that, identifying the cause of the hang/lack of completion during shutdown is an issue that still needs resolving.

jeremy.l, if your hang is related to async/classical master/replica replication, then please get the gdb backtrace and include it as an attachment, along with the MariaDB error log and the output of `journalctl -n 40 -u mariadb.service`. If in doubt as to whether this is replication related, please create a new bug report.

Comment by Sergey Dushenkov [ 2022-01-14 ]

Hi All,
I ran into a very similar issue, and I think I have managed to figure out what went wrong.
I had zero issues with MariaDB startup/shutdown until today, when I tried to reboot a server and it would not shut down in time. Later I happened to change the MariaDB config and tried to restart its service... and surprise, it was not shutting down, with the same log output as above.

I wondered what had happened and what changes I had made to my system recently... and the ONLY change I made several days ago was a hostname change!
I guess the previous (original) hostname used during MySQL configuration got somehow tied to the accounts, and the shutdown script was looking for a 'root' user under the 'new' hostname!

So I just changed my hostname back to the old one, and voilà - the issue is gone.

Now comes the question: where should I look to update my config properly to match the new hostname?
I guess I need to somehow migrate root@old_hostname to root@new_hostname
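If a stale account host really is the cause, the migration guessed at above would be a `RENAME USER`. This is only a sketch with hypothetical hostnames; run it through a client that can still connect (e.g. root via unix_socket on localhost).

```shell
# Hypothetical old/new hostnames; adjust to the real values.
cat > rename-root-host.sql <<'EOF'
RENAME USER 'root'@'old_hostname' TO 'root'@'new_hostname';
FLUSH PRIVILEGES;
EOF

# Apply with a client that still authenticates, e.g.:
#   mysql -u root < rename-root-host.sql
```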

Comment by Marko Mäkelä [ 2022-02-21 ]

I see that the Galera cluster has been mentioned at least in a few comments.

MariaDB Server 10.3 used Galera library version 3, while 10.4 and later use Galera 4. Maybe something in the Galera upgrade is not working correctly.

Comment by Jan Lindström (Inactive) [ 2022-03-15 ]

What is clear is that a rolling upgrade from 10.3 to 10.4 is not supported. You need to cleanly shut down all your nodes, do the upgrade, and restart the nodes. 10.4.6 is quite old, so I recommend using a more recent version of the MariaDB server and Galera library. If these steps do not help, please send more information, e.g. full error logs.

Comment by Luke Alexander [ 2022-03-15 ]

Hi,

We upgraded a few times, and somewhere along the way this error disappeared/got fixed; we are now on 10.4.21, so from my point of view the original issue is resolved.

Thanks

Generated at Thu Feb 08 08:56:56 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.