[MXS-1045] Defunct processes after maxscale have executed script during failover Created: 2016-12-01 Updated: 2016-12-05 Resolved: 2016-12-05 |
|
| Status: | Closed |
| Project: | MariaDB MaxScale |
| Component/s: | Core, mmmon |
| Affects Version/s: | 1.4.3, 2.0.2 |
| Fix Version/s: | 2.0.3 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Richard Stracke | Assignee: | markus makela |
| Resolution: | Fixed | Votes: | 1 |
| Labels: | replication, script | ||
| Environment: |
reproduced on Ubuntu Xenial with Master - Master. |
||
| Attachments: |
|
| Description |
|
Steps to reproduce:
Create a Master <-> Master replication.
With docker:
docker run \
docker run \
Execute on both servers
Execute on master101
Execute on master102
Install MaxScale 2.0.2 with the attached configuration file (copy to /etc)
Copy test.sh to /var/lib/maxscale
Execute docker stop master101;
Result:
root 21165 0.1 0.0 332652 8736 ? Ssl 12:32 0:06 /usr/bin/maxscale -d
Output from failover.log from test.sh:
2016.12.01 14:09:21 : --event 'master_down' --initiator '127.0.0.1:32810' --nodelist '127.0.0.1:32811' –
MaxScale log with info and debug enabled:
2016-12-01 14:13:43 error : Monitor was unable to connect to server 127.0.0.1:32810 : "Lost connection to MySQL server at 'handshake: reading inital communication packet', system error: 115"
Richard Stracke |
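To check for the symptom without scanning `ps` output by eye, the defunct children of a process can be listed from `/proc`. The following is a minimal sketch, assuming a Linux `/proc` filesystem; `defunct_children` is a hypothetical helper written for illustration and is not part of MaxScale.

```python
import os


def defunct_children(parent_pid):
    """Return the sorted PIDs of zombie (state 'Z') children of parent_pid.

    Assumption: Linux /proc is mounted. Illustrative helper only.
    """
    zombies = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(f"/proc/{entry}/stat") as stat_file:
                stat = stat_file.read()
        except (FileNotFoundError, ProcessLookupError, PermissionError):
            continue  # process vanished or is unreadable while scanning
        # Layout is "pid (comm) state ppid ..."; comm may contain ')',
        # so split on the last closing parenthesis.
        fields = stat.rsplit(")", 1)[1].split()
        state, ppid = fields[0], int(fields[1])
        if state == "Z" and ppid == parent_pid:
            zombies.append(int(entry))
    return sorted(zombies)
```

For example, `defunct_children(21165)` would inspect the MaxScale PID shown in the `ps` line above; an empty list means no defunct script processes are attached to it.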
| Comments |
| Comment by Richard Stracke [ 2016-12-01 ] | ||
|
One additional comment: the defunct processes vanish after sudo service maxscale stop or restart. Richard | ||
| Comment by markus makela [ 2016-12-01 ] | ||
|
I tested this quickly on Fedora 25 with the exact script and both the mysqlmon and mmmon modules but I was unable to reproduce it. I'll continue the investigation on Ubuntu Xenial. | ||
| Comment by markus makela [ 2016-12-05 ] | ||
|
I've managed to reproduce it and it seems to happen even with 2.0.2. | ||
| Comment by markus makela [ 2016-12-05 ] | ||
|
For some reason, the processes aren't sending the SIGCHLD signal to the parent process.
Normally, we'd see a log message about a SIGCHLD handler being called. | ||
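For reference, the normal path described in this comment can be sketched as follows: a SIGCHLD handler logs that it was called and reaps exited children with a non-blocking `waitpid`. This is an illustrative Python sketch, not MaxScale's C handler; `on_sigchld`, `run_script`, `reaped` and the log text are hypothetical names chosen for the example.

```python
import logging
import os
import signal
import time

log = logging.getLogger("sigchld-demo")
reaped = {}  # pid -> exit code (None if killed by a signal)


def on_sigchld(signum, frame):
    """Log the call, then reap every child that has already exited."""
    log.info("SIGCHLD handler called")
    while True:
        try:
            pid, status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            return  # no children at all
        if pid == 0:
            return  # children remain, but none has exited yet
        reaped[pid] = os.WEXITSTATUS(status) if os.WIFEXITED(status) else None


def run_script(argv, timeout=5.0):
    """Fork and exec argv, then wait until the handler has reaped it.

    Returns the exit code recorded by the handler, or None on timeout.
    """
    previous = signal.signal(signal.SIGCHLD, on_sigchld)
    try:
        pid = os.fork()
        if pid == 0:
            try:
                os.execvp(argv[0], argv)
            finally:
                os._exit(127)  # only reached if exec failed
        deadline = time.monotonic() + timeout
        while pid not in reaped and time.monotonic() < deadline:
            time.sleep(0.01)
        return reaped.get(pid)
    finally:
        signal.signal(signal.SIGCHLD, previous)  # restore the old handler
```

When the handler runs, the child is collected and never lingers as a defunct process. A `None` result after the timeout would correspond to the situation reported here, where the handler is never invoked.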
| Comment by markus makela [ 2016-12-05 ] | ||
|
This seems to be caused by the fact that the SIGCHLD signal is not deleted from the original process's signal list. Removing the SIGCHLD handler for the parent process of the daemon process seems to fix this. This also never happens when MaxScale is run directly from the terminal with the -d flag. | ||
| Comment by markus makela [ 2016-12-05 ] | ||
|
The child process signals were ignored by the daemon process. Deleting the signal from the original parent's signal list fixes this. |
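The effect can be reproduced in isolation. The sketch below assumes that "signal list" refers to the process's blocked signal mask, which is an interpretation and is not confirmed by the ticket. It shows that while SIGCHLD is in the blocked set the handler is never run and the signal stays pending, and that removing it from the set delivers the signal. It is written in Python for brevity, assumes a single-threaded process, and all function names are hypothetical.

```python
import os
import signal
import time

handler_calls = []  # one entry per handler invocation


def on_sigchld(signum, frame):
    handler_calls.append(signum)


def wait_until(predicate, timeout=5.0):
    """Poll predicate until it is true or the timeout expires."""
    deadline = time.monotonic() + timeout
    while not predicate() and time.monotonic() < deadline:
        time.sleep(0.01)
    return predicate()


def demo():
    """Return (calls while blocked, pending while blocked, calls after unblock)."""
    handler_calls.clear()
    previous = signal.signal(signal.SIGCHLD, on_sigchld)
    # Put SIGCHLD in the blocked mask, as if it had been inherited that way.
    old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGCHLD})
    try:
        pid = os.fork()
        if pid == 0:
            os._exit(0)  # child exits immediately
        # SIGCHLD is raised but stays pending; the handler is not run.
        pending = wait_until(lambda: signal.SIGCHLD in signal.sigpending())
        calls_blocked = len(handler_calls)
        # Delete SIGCHLD from the blocked set: the pending signal is delivered.
        signal.pthread_sigmask(signal.SIG_UNBLOCK, {signal.SIGCHLD})
        wait_until(lambda: len(handler_calls) > 0)
        calls_unblocked = len(handler_calls)
        os.waitpid(pid, 0)  # reap the child so no zombie is left behind
        return calls_blocked, pending, calls_unblocked
    finally:
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)
        signal.signal(signal.SIGCHLD, previous)
```

Calling `demo()` returns the handler call count before and after unblocking, together with whether SIGCHLD was pending in between. In C the equivalent step would be a `sigdelset` on the set passed to `sigprocmask`, though whether that matches the actual MaxScale change is an assumption.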