[MDEV-4136] Maria Galera cluster DB 5.5.28a does not stop on /etc/init.d/mysql stop Created: 2013-02-05 Updated: 2013-02-15 Resolved: 2013-02-15 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | None |
| Affects Version/s: | 5.5.28a-galera |
| Fix Version/s: | 5.5.29-galera |
| Type: | Bug | Priority: | Critical |
| Reporter: | Aleksey Sanin (Inactive) | Assignee: | Seppo Jaakola |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | galera | ||
| Environment: |
CentOS 5.8 x64 |
||
| Description |
|
/etc/init.d/mysql stop hangs forever.
Version: '5.5.28a-MariaDB' socket: '/var/lib/mysql/mysql.sock' port: 53306 MariaDB Server, wsrep_23.7rc1.rXXXX
130205 6:29:22 [Note] /usr/sbin/mysqld: Normal shutdown
130205 6:29:22 [Note] WSREP: Stop replication |
| Comments |
| Comment by Elena Stepanova [ 2013-02-05 ] |
|
Hi Aleksey, Does it happen every time, or does it depend on the workload / number of connections during the shutdown? At first glance it looks like the upstream bug https://bugs.launchpad.net/codership-mysql/+bug/1099742 , but I'm not 100% sure, since their error log seems much shorter (or they didn't quote the whole thing) and we don't have anything else to compare. Would you be able to run strace and/or gdb bt so we can see if it stops at the same place as in the other bug? Thanks |
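For reference, one common way to capture the requested backtraces from a hung mysqld is to attach gdb in batch mode. The sketch below is only an illustration: the helper names `gdb_backtrace_command` and `collect_backtraces` are hypothetical, and running it assumes gdb is installed and the caller is allowed to attach to the process.

```python
import subprocess


def gdb_backtrace_command(pid):
    """Build a gdb command line that dumps a backtrace of every thread in `pid`."""
    return [
        "gdb",
        "-p", str(pid),                # attach to the running process
        "-batch",                      # run the given commands, then exit
        "-ex", "thread apply all bt",  # print a backtrace for each thread
    ]


def collect_backtraces(pid):
    """Run gdb against `pid` and return whatever it printed to stdout."""
    result = subprocess.run(
        gdb_backtrace_command(pid),
        capture_output=True,
        text=True,
        check=False,  # gdb may exit non-zero; the caller inspects the output
    )
    return result.stdout
```

Calling `collect_backtraces(pid)` with the mysqld process ID while the stop command is hanging should yield a per-thread listing like the one posted in the next comments.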
| Comment by Aleksey Sanin (Inactive) [ 2013-02-05 ] |
|
This happens 100% of the time. Server is doing nothing. Yes, I can try to get gdb tomorrow. |
| Comment by Aleksey Sanin (Inactive) [ 2013-02-05 ] |
|
Forgot to mention that this is definitely related to WSREP/Galera: the same server with the same configs, but with the wsrep_XXX options commented out in my.cnf, stops and restarts without problems. |
| Comment by Elena Stepanova [ 2013-02-05 ] |
|
I've set up a 5.5.28a node on CentOS 5.8 and shutdown works for me, but the sequence of events in the log is somewhat different, so apparently there might be some race conditions involved. |
| Comment by Aleksey Sanin (Inactive) [ 2013-02-06 ] |
|
GDB stacktraces:
[root@devdb02 ~]# ps -ef | grep mysql
warning: no loadable sections found in added symbol-file system-supplied DSO at 0x7fff2f7fd000
Thread 23 (Thread 0x40c91940 (LWP 18259)):
Thread 22 (Thread 0x41713940 (LWP 18262)):
Thread 21 (Thread 0x43015940 (LWP 18264)):
Thread 20 (Thread 0x43a16940 (LWP 18265)):
Thread 19 (Thread 0x44417940 (LWP 18266)):
Thread 18 (Thread 0x44e18940 (LWP 18267)):
Thread 17 (Thread 0x45819940 (LWP 18268)):
Thread 16 (Thread 0x4621a940 (LWP 18269)):
Thread 15 (Thread 0x46c1b940 (LWP 18270)):
Thread 14 (Thread 0x4761c940 (LWP 18271)):
Thread 13 (Thread 0x4801d940 (LWP 18272)):
Thread 12 (Thread 0x48a1e940 (LWP 18273)):
Thread 11 (Thread 0x4941f940 (LWP 18274)):
Thread 10 (Thread 0x49e20940 (LWP 18276)):
Thread 9 (Thread 0x4a821940 (LWP 18277)):
Thread 8 (Thread 0x4b222940 (LWP 18278)):
Thread 7 (Thread 0x4bc23940 (LWP 18279)):
Thread 6 (Thread 0x4c624940 (LWP 18280)):
Thread 5 (Thread 0x4d025940 (LWP 18281)):
Thread 4 (Thread 0x4da26940 (LWP 18282)):
Thread 3 (Thread 0x41bcb940 (LWP 18283)):
Thread 2 (Thread 0x41815940 (LWP 18298)):
Thread 1 (Thread 0x2b566fee76a0 (LWP 18257)): |
| Comment by Aleksey Sanin (Inactive) [ 2013-02-06 ] |
|
So it does indeed look similar to the upstream bug. However, it is MariaDB-specific too: I tried the stock MySQL 5.5.28a with Galera patches and the problem went away. In the upstream bug the issue is 100% reproducible if one sets wsrep_on to false before shutdown. I believe this is what is happening here as well: somehow wsrep_on is set to false before the wsrep shutdown process starts. It smells like a race condition in the shutdown process, which clears either this flag directly or the whole global_system_variables structure. |
| Comment by Elena Stepanova [ 2013-02-06 ] |
|
Hi Aleksey, Thank you. Assigning to Seppo to check why the behavior is specific to MariaDB-Galera. I will also add a comment to the Codership bug report. Actually, the stack looks somewhat different. In our case it's waiting in
#1 0x0000000000508cdf in wsrep_wait_appliers_close(THD*) ()
while in the LP bug it's in
#1 0x000000000051ee84 in inline_mysql_cond_wait (sig_ptr=0x0) at /builddir/build/BUILD/mysql-5.5.28/mysql-5.5.28/include/mysql/psi/mysql_thread.h:980
I'll leave it to Seppo to decide whether it's a duplicate or a separate problem. |
| Comment by Aleksey Sanin (Inactive) [ 2013-02-12 ] |
|
Output with wsrep_debug = 1:
130212 3:42:48 [Note] WSREP: Shifting OPEN -> CLOSED (TO: 1745947) |
| Comment by Aleksey Sanin (Inactive) [ 2013-02-12 ] |
|
I think I figured it out. The problem is in the "thread_handling=pool-of-threads" setting. If I remove it, the shutdown goes just fine. With "thread_handling=pool-of-threads", unlink_thd() is not called (probably because one_thread_per_connection_scheduler() is not called). So yes, it was MariaDB-specific in the end. |
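The hypothesis above can be illustrated with a toy model. This is not MariaDB code: the class `ShutdownModel`, the function `run`, and the timeout are all assumptions made for illustration, loosely named after wsrep_wait_appliers_close() and unlink_thd() from this report. The shutdown path waits until a thread counter drops to zero, and the counter only drops when the per-thread cleanup hook runs; if the scheduler never calls that hook, the wait never completes.

```python
import threading


class ShutdownModel:
    """Toy model: shutdown waits until every applier thread has been unlinked."""

    def __init__(self, appliers):
        self.cond = threading.Condition()
        self.appliers = appliers

    def unlink_thread(self):
        # Stand-in for the per-thread cleanup hook: decrement and wake the waiter.
        with self.cond:
            self.appliers -= 1
            self.cond.notify_all()

    def wait_appliers_close(self, timeout):
        # True if all appliers closed; False if the wait timed out.
        with self.cond:
            return self.cond.wait_for(lambda: self.appliers == 0, timeout)


def run(scheduler_calls_unlink, appliers=2, timeout=0.2):
    """Simulate shutdown with a scheduler that does or does not call the hook."""
    model = ShutdownModel(appliers)
    workers = []
    for _ in range(appliers):
        # A scheduler that skips the hook leaves the counter untouched.
        target = model.unlink_thread if scheduler_calls_unlink else (lambda: None)
        worker = threading.Thread(target=target)
        worker.start()
        workers.append(worker)
    closed = model.wait_appliers_close(timeout)
    for worker in workers:
        worker.join()
    return closed
```

In this model `run(True)` completes once both workers have unlinked, while `run(False)` only returns because of the artificial timeout, which stands in for the indefinite hang described in this report.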
| Comment by Elena Stepanova [ 2013-02-12 ] |
|
Thank you, Aleksey. It is totally reproducible with thread pool indeed. |
| Comment by Elena Stepanova [ 2013-02-12 ] |
|
Reproducible on current maria-5.5-galera tree as well (revno 3378). To reproduce, it's enough to start server with wait till it starts, then try to shut it down. Seppo, Wlad can consult from the thread pool side, please contact him if needed. |
| Comment by Seppo Jaakola [ 2013-02-15 ] |
|
Committed a simple fix that enables graceful shutdown of the wsrep replicator when the thread pool scheduler is in use. |
| Comment by Seppo Jaakola [ 2013-02-15 ] |
|
Fix was committed in revision: http://bazaar.launchpad.net/~maria-captains/maria/maria-5.5-galera/revision/3379 |