[MDEV-21707] Mariadb doesn't exit correctly Created: 2020-02-11 Updated: 2023-04-20 Resolved: 2023-04-20 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Galera, Storage Engine - InnoDB |
| Affects Version/s: | 10.4.10, 10.4.12 |
| Fix Version/s: | N/A |
| Type: | Bug | Priority: | Major |
| Reporter: | Eugene | Assignee: | Jan Lindström |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | None |
| Environment: |
Linux 4.19.102-gentoo x86_64 AMD EPYC 7451 |
| Description |
|
Attempts to stop mysqld with signal 15 (SIGTERM) fail for members of a Galera cluster. When signal 15 is sent to the mysqld process, it writes /var/lib/mysql/grastate.dat with all fields filled in, for example:
It then refuses any connection attempts but remains in memory and does not exit:
This can last for days. Mysqld does not react to signal 15, and if it is finally killed with signal 9, it fails to recover from the binary log, like:
This always ends with SST. In the hung state mysqld performs no I/O and its memory usage stays constant. The process seems to be in some infinite loop, yet CPU is not used either. This happens on a cluster with gigabytes of data, but it was also seen on a newly installed cluster with no data except the default database. Expected behavior: mysqld exits on signal 15 within a reasonable time, after first flushing cached data and closing files. |
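The expected behavior (exit within a reasonable time after signal 15) can be checked with a small script. This is a minimal sketch and not part of the original report: the function name `terminate_and_wait`, the 60-second default timeout and the polling interval are illustrative assumptions.

```python
import os
import signal
import time


def terminate_and_wait(pid, timeout=60.0, poll_interval=0.5):
    """Send SIGTERM to `pid` and wait up to `timeout` seconds for it to exit.

    Returns True if the process disappeared in time, False if it is still
    present when the timeout expires (the hung state described here).
    Note: an unreaped child of the calling process (a zombie) still counts
    as present, so this is meant for processes started by someone else.
    """
    os.kill(pid, signal.SIGTERM)
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            os.kill(pid, 0)  # signal 0 delivers nothing, it only probes the PID
        except ProcessLookupError:
            return True  # no such process any more
        time.sleep(poll_interval)
    return False
```

A call such as `terminate_and_wait(mysqld_pid, timeout=300)` (with `mysqld_pid` obtained by the caller) would return `False` for a node stuck in the state described above.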
| Comments |
| Comment by Eugene [ 2020-02-11 ] |
|
Another example: node wrote /var/lib/mysql/grastate.dat file:
and mysqld process doesn't exit:
and so on... |
| Comment by Eugene [ 2020-03-17 ] |
|
Please find some additional details.
a bit later
5 mins later
and this socket connection never gets closed. |
| Comment by Eugene [ 2020-03-17 ] |
|
Next attempt: shutdown of another node with the very same configuration (hanging in the same state, with an open socket connection and no socket file name listed):
ls -lha /proc/PID/fd looks as follows:
|
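The same check as `ls -lha /proc/PID/fd` can be scripted to pick out only the socket descriptors of a hung process. This is a rough sketch, assuming Linux with `/proc` mounted and permission to read the target's fd directory; the function name `list_open_sockets` is made up for illustration.

```python
import os


def list_open_sockets(pid):
    """Return {fd: link_target} for descriptors of `pid` that are sockets.

    Reads the same symlinks that `ls -lha /proc/PID/fd` displays; on Linux
    a socket appears as a link target of the form "socket:[inode]".
    """
    fd_dir = f"/proc/{pid}/fd"
    sockets = {}
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            continue  # descriptor was closed while we were iterating
        if target.startswith("socket:"):
            sockets[int(name)] = target
    return sockets
```

Running it repeatedly against the stuck process would show whether the same socket inode stays open over time.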
| Comment by Eugene [ 2020-03-17 ] |
|
Please also find shutdown log attached. shutdown-mysqld.err
After this nothing happens, even if you leave it for several hours, so in the end you have to kill the process with SIGKILL. |
| Comment by Eugene [ 2020-04-01 ] |
|
It seems that the issue is caused by the thread handling mechanism.
The shutdown issue was present regardless of the number of threads in the pool (one cluster used 7000, later reduced to 2000, while the second used 400). However, the problem disappeared as soon as the mentioned thread_handling line was removed, which in effect reverted the server to "one thread per connection". Shutdown was clean with 2000 permitted connections (over 1000 in use). |
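To see quickly whether a node still has the thread pool enabled, the option file can be scanned for the `thread_handling` setting. The sketch below is an assumption-laden helper, not an official tool: `thread_handling_setting` is a hypothetical name, the sample text is invented, and the scan looks at one file only (it ignores section headers and `!include` directives). The value `pool-of-threads` is the one named later in this thread.

```python
def thread_handling_setting(config_text):
    """Return the last thread_handling value found in the text, or None."""
    value = None
    for raw in config_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop '#' comments
        if line.startswith(";") or "=" not in line:
            continue
        key, _, val = line.partition("=")
        # MariaDB accepts '-' and '_' interchangeably in option names.
        if key.strip().replace("-", "_") == "thread_handling":
            value = val.strip()
    return value


# Invented sample, only to show the call.
SAMPLE = """
[mysqld]
thread_handling = pool-of-threads
"""
print(thread_handling_setting(SAMPLE))  # prints "pool-of-threads"
```

A result of `None` means the file does not set the option, so the server falls back to its default for that build.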
| Comment by Jan Lindström [ 2023-04-11 ] |
|
euglorg Can you try with a more recent version of MariaDB and the Galera library? If you can reproduce this issue, please provide the full error log and the output of SHOW PROCESSLIST, and, if you can, kill the server and provide a full stack trace. |
| Comment by Eugene [ 2023-04-11 ] |
|
Hello Jan, |
| Comment by Jan Lindström [ 2023-04-12 ] |
|
euglorg It seems that because this issue's priority was low, it never got to the work queue. There have been so many fixes and improvements that it's impossible for me to say whether this issue is fixed. If you have not seen it in 10.6, that sounds good. |
| Comment by Eugene [ 2023-04-12 ] |
|
We haven't seen it since the day we disabled `thread_handling = pool-of-threads` in the configuration of what was then 10.4. So in fact it's not clear whether it still affects newer versions of MariaDB. |
| Comment by Jan Lindström [ 2023-04-20 ] |
|
I used 10.4.29 with a 3-node cluster, and in my tests the server shut down normally with killall -TERM mysqld. |