Details
Type: Bug
Status: Closed
Priority: Critical
Resolution: Duplicate
1.4.3
None
Description
Hi,
I use mmmon. I am setting the priority to Critical because I have a strong feeling that mrm (MariaDB Replication Manager) has issues with this as well. Perhaps tanj can say whether this is actually the case.
I have a failover script that puts the failing server into maintenance mode to avoid failbacks. Somehow, with this setup, the server_down script is executed ~8 times.
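To illustrate the kind of guard that would make repeated invocations harmless, here is a minimal sketch of an atomic lock based on `mkdir`. It is an assumption for illustration only: the `LOCK_DIR` path and both function names are hypothetical, and this is not the workaround used in the script below.

```shell
#!/bin/bash
# Hypothetical guard so that only the first of several invocations proceeds.
# LOCK_DIR is an assumed path; MaxScale does not provide it.
LOCK_DIR="${LOCK_DIR:-/tmp/failover-master.lock}"

acquire_failover_lock() {
    # mkdir either creates the directory or fails, atomically,
    # so exactly one caller can succeed while the directory exists.
    mkdir "$LOCK_DIR" 2>/dev/null
}

release_failover_lock() {
    # Remove the lock directory once the failover work is finished.
    rmdir "$LOCK_DIR" 2>/dev/null
}

# Intended use at the top of a failover script:
#   acquire_failover_lock || { echo "Another failover is already running."; exit 1; }
#   trap release_failover_lock EXIT
```

Whether the lock should be released on exit or kept until an operator clears it depends on whether later invocations for the same event should be allowed to run at all.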
My config file:
# MaxScale documentation on GitHub:
# https://github.com/mariadb-corporation/MaxScale/blob/master/Documentation/Documentation-Contents.md

# Global parameters
#
# Complete list of configuration options:
# https://github.com/mariadb-corporation/MaxScale/blob/master/Documentation/Getting-Started/Configuration-Guide.md

[maxscale]
#threads=1
threads=4
#log_debug=1

# Server definitions
#
# Set the address of the server to the network
# address of a MySQL server.
#

[core01]
type=server
address=customer-prod-db-core01
port=3306
protocol=MySQLBackend
masterweight=1

[core11]
type=server
address=customer-prod-db-core11
port=3306
protocol=MySQLBackend
masterweight=0

[history01]
type=server
address=customer-prod-db-history01
port=3306
protocol=MySQLBackend

[history11]
type=server
address=customer-prod-db-history11
port=3306
protocol=MySQLBackend

#### MASTER - MASTER - WRITE ####
[Core11 master slave Monitor]
type=monitor
module=mmmon
servers=core01,core11
user=maxscale
passwd=***
script=/root/replication-scripts/failover-master.sh --event=$EVENT --initiator=$INITIATOR --nodelist=$NODELIST
events=master_down,server_down
monitor_interval=500
# replication_lag_monitor=1 ## Does not work yet in mmmon (or multimaster in mysqlmon) --michael@MariaDB 2016-08-27
# max_slave_replication_lag=5 ## https://jira.mariadb.org/browse/MXS-839

[Core01 Master read-write Service]
type=service
router=readconnroute
servers=core01,core11
user=maxscale
passwd=***
router_options=master

[Core01 Master read-write Listener]
type=listener
service=Core01 Master read-write Service
protocol=MySQLClient
port=3310

##### READ ONLY #####

[History01 Read-Only Service]
type=service
router=readconnroute
servers=history01, history11
user=maxscale
passwd=***
# Impossible to use router_option slave because mmmon does not monitor these.
# mysqlmon cannot monitor it because there is a multi master setup causing no master to be selected by mysqlmon and the cluster of 2 slaves getting 'slave from external master' state. --michael@mariadb 2016-08-26
# router_options=slave
#filters=MyRegexFilter

[History01 Read-Only Listener]
type=listener
service=History01 Read-Only Service
protocol=MySQLClient
port=3317

##


[MaxAdmin Service]
type=service
router=cli

[MaxAdmin Listener]
type=listener
service=MaxAdmin Service
protocol=maxscaled
port=6603
My script:
#!/bin/bash
# failover_master.sh

# Parse the long options passed in by the MaxScale monitor.
ARGS=$(getopt -o '' --long 'event:,initiator:,nodelist:' -- "$@")
eval set -- "$ARGS"

while true; do
    case "$1" in
        --event)
            shift;
            event=$1
            shift;
            ;;
        --initiator)
            shift;
            initiator=$1
            shift;
            ;;
        --nodelist)
            shift;
            nodelist=$1
            shift;
            ;;
        --)
            shift;
            break;
            ;;
    esac
done

# Strip the port from the node list and derive the MaxScale server name from the initiator.
candidate=`echo "$nodelist" | awk -F':' '{print $1}'`
maxscale_host=`echo "$initiator" | awk -F'-' '{print $5}'`
maxscale_host=`echo "$maxscale_host" | awk -F':' '{print $1}'`

if [ -z "$candidate" ]; then
    echo "ERROR!!! NO candidate master found when failing over $initiator! The system might be down."|wall
    echo "ERROR!!! NO candidate master found! The system might be down."
    exit 0
fi

# WORK AROUND for race condition, see https://jira.mariadb.org/browse/MXS-845
currently_in_maintenance=`maxadmin -pmariadb list servers|grep Maintenance|grep "$maxscale_host"|wc -l`
if [ "$currently_in_maintenance" -eq 0 ]; then
    maxadmin -pmariadb set server "$maxscale_host" maintenance
    maxadmin -pmariadb clear server "$maxscale_host" running
else
    echo "This script is not the first one, exiting."|wall
    exit 1
fi

# loosen (ACI)D to speedup any lag.
mysql -u maxscale -p'***' --host="$candidate" -e "set global sync_binlog=0; set global innodb_flush_log_at_trx_commit=0;set global innodb_io_capacity=50000;"

while true; do
    echo "Waiting until all transactions have been applied on candidate master $candidate..."
    sleep 1
    SLAVESTAT=$(mysql -umaxscale -p'***' --host="$candidate" -e "show slave status\G");
    exec_master_pos=`echo "$SLAVESTAT" | grep -w 'Exec_Master_Log_Pos:' | awk '{print $2}';`
    read_master_pos=`echo "$SLAVESTAT" | grep -w 'Read_Master_Log_Pos:' | awk '{print $2}';`
    # If the read position moved since the previous iteration, the old master is still sending events.
    if [ -n "$old_read_master_pos" ] && [ "$old_read_master_pos" != "$read_master_pos" ]; then
        echo "ERROR!!! Old master $initiator still receives transactions after putting it into maintenance! Manual intervention required to make sure the old master is really down."
        echo "ERROR!!! Old master $initiator still receives transactions after putting it into maintenance! Manual intervention required to make sure the old master is really down."|wall
        exit 0
    fi
    # Remember the current read position for the next iteration.
    old_read_master_pos=$read_master_pos
    count=`expr "$read_master_pos" - "$exec_master_pos"`

    if [ "$count" -eq 0 ]; then
        mysql -umaxscale -p'***' --host="$candidate" -e "set global read_only=OFF; set global sync_binlog=1; set global innodb_flush_log_at_trx_commit=1;SET GLOBAL innodb_io_capacity=200"
        break;
    fi
done
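The wait loop above compares Read_Master_Log_Pos with Exec_Master_Log_Pos. As a minimal sketch, assuming the `show slave status\G` output is piped in on stdin, both values and their difference could be extracted in one pass. The function name `slave_positions` is hypothetical and not part of the script above.

```shell
#!/bin/bash
# Hypothetical helper: read `SHOW SLAVE STATUS\G` text on stdin and print
# "<read_pos> <exec_pos> <difference>". Returns non-zero if a field is missing.
slave_positions() {
    awk '
        $1 == "Read_Master_Log_Pos:" { read_pos = $2 }
        $1 == "Exec_Master_Log_Pos:" { exec_pos = $2 }
        END {
            # Fail instead of reporting a bogus difference when a field is absent.
            if (read_pos == "" || exec_pos == "") exit 1
            print read_pos, exec_pos, read_pos - exec_pos
        }'
}

# Intended use inside the wait loop (connection options as in the script above):
#   read -r read_pos exec_pos lag < <(mysql ... -e 'show slave status\G' | slave_positions)
```

Note that the two positions are only directly comparable when Master_Log_File and Relay_Master_Log_File name the same binary log, so a stricter check would compare those fields as well.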