[MDEV-22116] Not able to shutdown MariaDB after upgrade. Happening randomly. Created: 2020-04-02 Updated: 2020-08-04 Resolved: 2020-07-28 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Server |
| Affects Version/s: | 10.4.12 |
| Fix Version/s: | N/A |
| Type: | Bug | Priority: | Major |
| Reporter: | prasad | Assignee: | Vladislav Vaintroub |
| Resolution: | Incomplete | Votes: | 1 |
| Labels: | shutdown | ||
| Environment: | CentOS 6.10 |
| Description |
|
We observed that mysql.server stop does not work every time; it seems to get stuck in a loop. We upgraded MariaDB from 10.3.15 to 10.4.12 and have been observing this issue since. Here are some more details: We also ran strace on the mysqld PID.
|
| Comments |
| Comment by prasad [ 2020-04-02 ] |
|
It seems relevant to Also note that, before shutdown, I checked all tables for corruption with mysqlcheck to make sure they were clean. |
| Comment by Elena Stepanova [ 2020-04-05 ] |
|
Do you also have thread_handling = pool-of-threads (as in |
| Comment by prasad [ 2020-04-06 ] |
|
We have thread_handling=one-thread-per-connection (the default) and have never changed it. It also seems to happen only on the master, not on the slaves. When we upgrade the master, we also enable skip-slave-start and skip-networking to completely cut off communication from the slaves before the upgrade. Is there anything you want me to try? Are there any other configuration settings that could be preventing a clean master shutdown? |
| Comment by prasad [ 2020-04-12 ] |
|
I have also tried other options: "mysqladmin shutdown" and 'shutdown wait for all slaves'. I still see the same issue. For some reason, the master is unable to shut down cleanly. |
| Comment by Elena Stepanova [ 2020-04-13 ] |
|
Can you get all threads' stack trace from the hanging server (mysqld process)? |
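One common way to collect such a trace is to attach gdb to the running process in batch mode and ask for a backtrace of every thread. The following is a minimal sketch, not a procedure prescribed in this ticket: the helper names gdb_backtrace_command and dump_all_thread_stacks are hypothetical, and it assumes gdb is installed and that the caller has permission to attach to the mysqld process.

```python
import subprocess
from typing import List


def gdb_backtrace_command(pid: int) -> List[str]:
    """Build a gdb command line that prints a backtrace of every thread in `pid`."""
    if pid <= 0:
        raise ValueError("pid must be a positive integer")
    return [
        "gdb",
        "--batch",                    # run the -ex commands, then exit
        "-p", str(pid),               # attach to the already running process
        "-ex", "set pagination off",  # do not pause on long output
        "-ex", "thread apply all bt", # backtrace for all threads
    ]


def dump_all_thread_stacks(pid: int, out_path: str) -> int:
    """Run gdb against `pid`, write its output to `out_path`, return gdb's exit code."""
    cmd = gdb_backtrace_command(pid)
    with open(out_path, "w") as out:
        return subprocess.run(cmd, stdout=out, stderr=subprocess.STDOUT).returncode
```

For example, dump_all_thread_stacks(12345, "mysqld_stacks.txt") would write the trace for a (hypothetical) PID 12345 to a file that can be attached to the issue. Frames show up as ?? unless debug symbols for the binary are available.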
| Comment by prasad [ 2020-04-15 ] |
|
Please find the logs attached. |
| Comment by Elena Stepanova [ 2020-04-16 ] |
|
Thanks. |
| Comment by Andrei Elkin [ 2020-04-17 ] |
|
There's nothing specific to replication threads in the stack trace. We can see that the main thread is still around, polling file descriptors. I can't explain that, but I suggest considering whether, on this CentOS 6.0, the server is built with DONT_USE_THR_ALARM, so that this otherwise "default" part of SIGTERM handling
| Comment by Elena Stepanova [ 2020-04-17 ] |
|
I guess it belongs to runtime then. I'll assign it to sanja as the runtime team lead, but maybe wlad could pick it up. |
| Comment by Vladislav Vaintroub [ 2020-04-17 ] |
|
vrpprasad, next time can you try running strace and then "kill pid-of-mysqld"? To me it does not look like the server was informed about the shutdown at all. |
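To make such a test repeatable, the "kill and see whether it exits" step can be scripted. The sketch below is illustrative only: the function names and the 60-second default timeout are assumptions, not anything specified in this ticket. It sends SIGTERM (the signal that kill sends by default) and then polls until the process is gone or the timeout expires, which separates a slow shutdown from a hang.

```python
import os
import signal
import time


def process_exists(pid: int) -> bool:
    """Return True if a process with this PID currently exists."""
    try:
        os.kill(pid, 0)  # signal 0 only performs the existence/permission check
    except ProcessLookupError:
        return False
    except PermissionError:
        return True      # the process exists but belongs to another user
    return True


def wait_for_exit(pid: int, timeout: float, poll_interval: float = 0.5) -> bool:
    """Poll until `pid` disappears; return False if it is still there at timeout."""
    deadline = time.monotonic() + timeout
    while True:
        if not process_exists(pid):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)


def request_shutdown(pid: int, timeout: float = 60.0) -> bool:
    """Send SIGTERM to `pid` and report whether it exited within `timeout` seconds."""
    try:
        os.kill(pid, signal.SIGTERM)
    except ProcessLookupError:
        return True      # already gone
    return wait_for_exit(pid, timeout)
```

If request_shutdown returns False, the process is still present after the timeout, which is the moment to capture the strace output and a gdb stack trace. Note that an exited child of the calling script that has not been reaped still counts as existing; this does not apply to a mysqld started elsewhere.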
| Comment by Sergey Vojtovich [ 2020-04-17 ] |
|
vrpprasad, how was this binary compiled? |
| Comment by prasad [ 2020-04-20 ] |
|
@sergey: We didn't compile it ourselves; we downloaded the binaries from https://downloads.mariadb.org/MariaDB/. |
| Comment by prasad [ 2020-04-21 ] |
|
@vladislav: I ran "strace -p <mysqldpid>" and killed mysql. Please find the strace log attached. |
| Comment by Vladislav Vaintroub [ 2020-04-21 ] |
|
vrpprasad, did the server hang during this attempt? Do you have the corresponding gdb output if it did not shut down? |
| Comment by Andrei Elkin [ 2020-04-21 ] |
|
vrpprasad Thanks for the new upload! Sorry, we did not specify how much of the error log the snippet should contain. Sadly, two lines are too little. Many thanks! Andrei |
| Comment by prasad [ 2020-04-22 ] |
|
@andrei and @Vladislav. |
| Comment by Andrei Elkin [ 2020-04-27 ] |
|
vrpprasad The stack trace details are much harder to read because of unresolved (??) symbols. You should find a way to get the symbols back. Secondly, I believe that the error log is taken from the server run Could you please take that under control and provide us with the SELECT @@global.log_warnings output, which should clear up any doubts about why error log messages are missing? So could you please repeat it once more with resolved symbols and an actual @@global.log_warnings = 3? Thank you. Andrei |
| Comment by Andrei Elkin [ 2020-04-28 ] |
|
vrpprasad, Hi. Thanks for bringing more data for analysis! Yet the latest error log and stack trace 2020-04-28 17:58:39 9 [Note] Start binlog_dump to slave_server(3), pos(master-bin.000026, 343) which must be there already with @@global.log_warnings=2, but the stack trace has the binlog dump thread's stack. Also, you could try out Regards, Andrei |
| Comment by Sergey Vojtovich [ 2020-04-30 ] |
|
I believe this issue is worth investigating as long as there's interest on both sides. |
| Comment by prasad [ 2020-05-05 ] |
|
Hi @Sergey and @Andrei, |
| Comment by Vladislav Vaintroub [ 2020-07-28 ] |
|
Not sure if there is anything more to do. |