Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Cannot Reproduce
Affects Version/s: 10.4.10, 10.4.12
Fix Version/s: None
Environment: Linux 4.19.102-gentoo x86_64 AMD EPYC 7451
Description
Attempts to stop mysqld with signal 15 (SIGTERM) fail on members of a Galera cluster.
Configuration: 3 identical servers running Galera nodes, all with similar configs (attached).
When signal 15 is sent to the mysqld process, it writes /var/lib/mysql/grastate.dat with all fields filled in, for example:
# GALERA saved state
version: 2.1
uuid: 0b58193e-****-****-****-b2ddbb52b5f6
seqno: 37261
safe_to_bootstrap: 1
It then refuses all connection attempts but stays resident in memory and does not exit:
# ps aux|grep 'sbin/mysqld'
mysql 50683 0.5 0.6 68765724 24362700 ? Sl 13:57 0:24 /usr/sbin/mysqld --defaults-file=/etc/mysql/my.cnf

# strace -p 50683|head -n 20
strace: Process 50683 attached
select(0, NULL, NULL, NULL, {tv_sec=0, tv_usec=0}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {tv_sec=0, tv_usec=1000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {tv_sec=0, tv_usec=1000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {tv_sec=0, tv_usec=1000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {tv_sec=0, tv_usec=1000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {tv_sec=0, tv_usec=1000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {tv_sec=0, tv_usec=1000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {tv_sec=0, tv_usec=1000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {tv_sec=0, tv_usec=1000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {tv_sec=0, tv_usec=1000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {tv_sec=0, tv_usec=1000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {tv_sec=0, tv_usec=1000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {tv_sec=0, tv_usec=1000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {tv_sec=0, tv_usec=1000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {tv_sec=0, tv_usec=1000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {tv_sec=0, tv_usec=1000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {tv_sec=0, tv_usec=1000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {tv_sec=0, tv_usec=1000}) = 0 (Timeout)
This can last for days. mysqld ignores signal 15, and if it is finally killed with signal 9, it fails to recover from the binary log on restart, with errors such as:
[ERROR] InnoDB: Your database may be corrupt or you may have copied the InnoDB tablespace but not the InnoDB log files. Please refer to https://mariadb.com/kb/en/library/innodb-recovery-modes/ for information about forcing recovery.
This always ends with an SST (state snapshot transfer).
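The grastate.dat fields quoted earlier can be read back programmatically, for instance to confirm that a node recorded a position before it is killed with signal 9. This is a minimal sketch, assuming the simple `key: value` layout shown in the example; the function names `parse_grastate` and `state_was_saved` are hypothetical helpers, not part of MariaDB or Galera.

```python
def parse_grastate(text):
    """Parse the 'key: value' lines of a grastate.dat file into a dict of strings."""
    state = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the '# GALERA saved state' comment header.
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if sep:
            state[key.strip()] = value.strip()
    return state


def state_was_saved(state):
    """Return True if a position was recorded (Galera writes seqno -1 otherwise)."""
    return state.get("seqno", "-1") != "-1"


if __name__ == "__main__":
    # Path taken from the report; adjust for other installations.
    with open("/var/lib/mysql/grastate.dat") as fh:
        state = parse_grastate(fh.read())
    print(state, "saved" if state_was_saved(state) else "not saved")
```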
In this hung state, mysqld performs no I/O and its memory usage stays constant. The process appears to be stuck in an infinite loop, yet it uses no CPU either.
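The "no I/O, no CPU" observation can be checked by sampling the Linux /proc counters for the process twice and comparing them. The sketch below is an illustration under stated assumptions, not the method used in this report: it assumes a Linux /proc filesystem, read access to /proc/<pid>/io (same user or root), and the helper names `parse_proc_io`, `cpu_ticks` and `looks_idle` are hypothetical.

```python
import time


def parse_proc_io(text):
    """Parse the content of /proc/<pid>/io into {counter_name: int}."""
    counters = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            counters[key.strip()] = int(value)
    return counters


def cpu_ticks(stat_text):
    """Return utime + stime (in clock ticks) from the content of /proc/<pid>/stat."""
    # The command name (field 2) is wrapped in parentheses and may itself
    # contain spaces or ')', so split after the LAST closing parenthesis.
    fields = stat_text.rsplit(")", 1)[1].split()
    # fields[0] is field 3 (state); utime is field 14 and stime is field 15.
    return int(fields[11]) + int(fields[12])


def _read(path):
    with open(path) as fh:
        return fh.read()


def looks_idle(pid, interval=5.0):
    """Return True if I/O counters and CPU ticks did not change over `interval` seconds."""
    io_before = parse_proc_io(_read(f"/proc/{pid}/io"))
    cpu_before = cpu_ticks(_read(f"/proc/{pid}/stat"))
    time.sleep(interval)
    io_after = parse_proc_io(_read(f"/proc/{pid}/io"))
    cpu_after = cpu_ticks(_read(f"/proc/{pid}/stat"))
    return io_before == io_after and cpu_before == cpu_after
```

For the process in the report this would be called as `looks_idle(50683)`. A process that polls with 1 ms timeouts, as in the strace output, may still accumulate a few ticks over a long interval, so the result is a hint rather than proof.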
This happens on a cluster with gigabytes of data, but it was also observed on a freshly installed cluster with no data other than the default database.
Expected behavior: mysqld exits within a reasonable time after signal 15, first flushing cached data and closing files.
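The expected behavior can be expressed as a small check: send signal 15, then wait a bounded time for the process to disappear. This is a minimal sketch for reproducing or monitoring the hang, assuming a POSIX system; `terminate_and_wait` and `process_exists` are hypothetical helper names, and the 60-second default is an arbitrary choice, not a value from the report.

```python
import os
import signal
import time


def process_exists(pid):
    """Return True if a process with this PID exists (signal 0 only probes)."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but belongs to another user.
        return True
    # Note: an exited child that its parent has not yet reaped (a zombie)
    # still counts as existing here.
    return True


def terminate_and_wait(pid, timeout=60.0, poll_interval=0.5):
    """Send SIGTERM and return True if the process exits within `timeout` seconds."""
    os.kill(pid, signal.SIGTERM)
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if not process_exists(pid):
            return True
        time.sleep(poll_interval)
    return not process_exists(pid)
```

A return value of False corresponds to the state described in this report: the signal was delivered but the process is still present after the timeout.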
Attachments
Issue Links
- relates to MDEV-22116 Not able to shutdown MariaDB after upgrade. Happening randomly. (Closed)