[MXS-1045] Defunct processes after MaxScale has executed a script during failover Created: 2016-12-01  Updated: 2016-12-05  Resolved: 2016-12-05

Status: Closed
Project: MariaDB MaxScale
Component/s: Core, mmmon
Affects Version/s: 1.4.3, 2.0.2
Fix Version/s: 2.0.3

Type: Bug
Priority: Minor
Reporter: Richard Stracke
Assignee: markus makela
Resolution: Fixed
Votes: 1
Labels: replication, script
Environment:

Reproduced on Ubuntu Xenial with Master <-> Master replication.
Reproduced with MariaDB 10.1 and MySQL 5.6, both as Master <-> Master.


Attachments: File maxscale.cnf     File test.sh    

 Description   

Steps to reproduce:

Create a Master <-> Master replication.

With docker:

docker run \
--name master101 \
-d \
-p 32810:3306 \
-e MYSQL_ROOT_PASSWORD=maria2016 \
mariadb:10.1 \
--server-id=7 \
--log-bin

docker run \
--name master102 \
-d \
-p 32811:3306 \
-e MYSQL_ROOT_PASSWORD=maria2016 \
mariadb:10.1 \
--server-id=8 \
--log-bin

Execute on both servers:
GRANT REPLICATION SLAVE ON *.* TO 'repl'@'%'
IDENTIFIED BY 'slave2016';

Execute on master101:
SHOW MASTER STATUS;
CHANGE MASTER TO
MASTER_HOST='127.0.0.1',
MASTER_PORT=32811,
MASTER_USER='repl',
MASTER_PASSWORD='slave2016';

Execute on master102:
SHOW MASTER STATUS;
CHANGE MASTER TO
MASTER_HOST='127.0.0.1',
MASTER_PORT=32810,
MASTER_USER='repl',
MASTER_PASSWORD='slave2016';

Execute on both servers:
start slave;
CREATE USER 'maxscale'@'%' IDENTIFIED BY 'maxscale';
GRANT EXECUTE, PROCESS, SELECT, SHOW DATABASES, SHOW VIEW, ALTER, ALTER ROUTINE, CREATE, CREATE ROUTINE, CREATE TABLESPACE, CREATE TEMPORARY TABLES, CREATE VIEW, DELETE, DROP, EVENT, INDEX, INSERT, REFERENCES, TRIGGER, UPDATE, CREATE USER, FILE, LOCK TABLES, RELOAD, REPLICATION CLIENT, REPLICATION SLAVE, SHUTDOWN, SUPER ON *.* TO 'mm'@'%';

Install MaxScale 2.0.2 with the attached configuration file (copy it to /etc).

Copy test.sh to /var/lib/maxscale.

Execute:
sudo service maxscale start;

docker stop master101;

Result:

root 21165 0.1 0.0 332652 8736 ? Ssl 12:32 0:06 /usr/bin/maxscale -d
maxscale 23707 0.1 0.0 287900 9304 ? Ssl 14:05 0:00 /usr/bin/maxscale --user=maxscale
maxscale 23741 0.0 0.0 0 0 ? Z 14:09 0:00 [test.sh] <defunct>
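
A quick way to spot such defunct processes, together with the parent that has not reaped them, is to filter ps output on the STAT column. The following is a minimal sketch, assuming a ps that accepts -eo pid,ppid,stat,comm (as procps on Ubuntu does); the function names filter_zombies and list_zombies are illustrative and not part of MaxScale.

```shell
#!/bin/sh
# Keep only rows whose STAT column (3rd field) starts with "Z" (zombie/defunct)
# and print PID, parent PID and command name.
filter_zombies() {
    awk '$3 ~ /^Z/ { print $1, $2, $4 }'
}

# List every defunct process on the host together with its parent PID.
# The header row is dropped by the filter because "STAT" does not start with "Z".
list_zombies() {
    ps -eo pid,ppid,stat,comm | filter_zombies
}
```

Calling list_zombies on the affected host would be expected to show test.sh with the PID of the MaxScale daemon as its parent.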

Output written to failover.log by test.sh:

2016.12.01 14:09:21 : --event 'master_down' --initiator '127.0.0.1:32810' --nodelist '127.0.0.1:32811' --
###############
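
The attached test.sh is not reproduced in this report. As a rough, hypothetical stand-in, a monitor script only needs to record the arguments MaxScale passes and then exit. The sketch below assumes a log path of /var/lib/maxscale/failover.log (overridable through a made-up FAILOVER_LOG variable) and does not reproduce the exact quoting seen in the log line above.

```shell
#!/bin/sh
# Hypothetical stand-in for test.sh: append a timestamp and the received
# arguments to a log file. FAILOVER_LOG and log_event are illustrative names.
FAILOVER_LOG="${FAILOVER_LOG:-/var/lib/maxscale/failover.log}"

log_event() {
    # "$*" joins all arguments with single spaces.
    printf '%s : %s\n' "$(date '+%Y.%m.%d %H:%M:%S')" "$*" >> "$FAILOVER_LOG"
}

# When MaxScale runs the script, the arguments are the expanded
# --event/--initiator/--nodelist options from the monitor configuration.
if [ "$#" -gt 0 ]; then
    log_event "$@"
fi
```

A script this short exits almost immediately, so any defunct entry it leaves behind points at the parent not reaping it rather than at the script itself.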

MaxScale log with info and debug enabled:

2016-12-01 14:13:43 error : Monitor was unable to connect to server 127.0.0.1:32810 : "Lost connection to MySQL server at 'handshake: reading inital communication packet', system error: 115"
2016-12-01 14:13:43 debug : Backend server 127.0.0.1:32810 state : DOWN
2016-12-01 14:13:43 notice : Server changed state: server1[127.0.0.1:32810]: master_down. [Master, Running] -> [Down]
2016-12-01 14:13:43 debug : [monitor_exec_cmd] Forked child process 23938 : /var/lib/maxscale/test.sh.
2016-12-01 14:13:43 notice : Executed monitor script '/var/lib/maxscale/test.sh --event=$EVENT --initiator=$INITIATOR --nodelist=$NODELIST' on event 'master_down'.
2016-12-01 14:13:43 debug : 139833263933184 [dcb_hangup_foreach]

Richard Stracke



 Comments   
Comment by Richard Stracke [ 2016-12-01 ]

One additional comment.

The defunct processes vanish after sudo service maxscale stop or restart.

Richard

Comment by markus makela [ 2016-12-01 ]

I tested this quickly on Fedora 25 with the exact script and both the mysqlmon and mmmon modules but I was unable to reproduce it. I'll continue the investigation on Ubuntu Xenial.

Comment by markus makela [ 2016-12-05 ]

I've managed to reproduce it and it seems to happen even with 2.0.2.

Comment by markus makela [ 2016-12-05 ]

For some reason, the parent process isn't receiving the SIGCHLD signal when the child processes exit.

2016-12-05 13:42:29   notice : [monitor_exec_cmd] Forked child process 17504 : /var/lib/maxscale/test.sh.
2016-12-05 13:42:29   notice : Executed monitor script '/var/lib/maxscale/test.sh --event=$EVENT --initiator=$INITIATOR --nodelist=$NODELIST' on event 'master_down'.

Normally, we'd see a log message about a SIGCHLD handler being called.

Comment by markus makela [ 2016-12-05 ]

This seems to be caused by the SIGCHLD signal not being deleted from the original process's signal list. Removing the SIGCHLD handler for the parent process of the daemon process seems to fix this.

This also never happens when MaxScale is run directly from the terminal with the -d flag.
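
On Linux, whether SIGCHLD is blocked or ignored in a running process can be read from the SigBlk and SigIgn bitmasks in /proc/<pid>/status, which makes it possible to compare a MaxScale started by the service with one started from the terminal. This is a minimal sketch, assuming SIGCHLD is signal 17 (true on x86 and ARM Linux, not on every architecture); the helper names are illustrative.

```shell
#!/bin/sh
# Succeeds (exit 0) if the SIGCHLD bit is set in a hexadecimal signal mask
# as printed in /proc/<pid>/status. Assumes SIGCHLD is signal 17, i.e.
# bit 16 of the mask (0x10000).
mask_has_sigchld() {
    # Keep only the low 32 bits (last 8 hex digits) so the arithmetic
    # cannot overflow in shells with signed 64-bit integers.
    low=$(printf '%s' "$1" | tail -c 8)
    low=${low:-0}
    [ $(( 0x$low & 0x10000 )) -ne 0 ]
}

# Report whether SIGCHLD is blocked, ignored or caught by a given process.
sigchld_state() {
    pid=$1
    for field in SigBlk SigIgn SigCgt; do
        mask=$(awk -v f="$field:" '$1 == f { print $2 }' "/proc/$pid/status")
        if mask_has_sigchld "$mask"; then
            echo "$field: SIGCHLD set"
        else
            echo "$field: SIGCHLD clear"
        fi
    done
}
```

For example, sigchld_state 23707 would inspect the daemon from the process listing in the description. SIGCHLD showing as set in SigBlk or SigIgn would be consistent with the handler never being called.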

Comment by markus makela [ 2016-12-05 ]

The child process signals were ignored by the daemon process. Deleting the signal from the original parent's signal list fixes this.
