  1. MariaDB MaxScale
  2. MXS-847

server_down event is executed 8 times due to putting server into maintenance mode

Details

    • Bug
    • Status: Closed
    • Critical
    • Resolution: Duplicate
    • 1.4.3
    • 2.0.1
    • Core
    • None

    Description

      Hi,

      I use mmmon. I am setting the priority to Critical because I have a strong feeling that mrm (MariaDB Replication Manager) has issues with this. Perhaps tanj can say whether this is actually the case.

      I have a failover script that puts the failing server into maintenance mode to avoid failbacks. Somehow, with this setup, the server_down script is executed ~8 times.

      My config file:

      # MaxScale documentation on GitHub:
      # https://github.com/mariadb-corporation/MaxScale/blob/master/Documentation/Documentation-Contents.md
       
      # Global parameters
      #
      # Complete list of configuration options:
      # https://github.com/mariadb-corporation/MaxScale/blob/master/Documentation/Getting-Started/Configuration-Guide.md
       
      [maxscale]
      #threads=1
      threads=4
      #log_debug=1
       
      # Server definitions
      #
      # Set the address of the server to the network
      # address of a MySQL server.
      #
       
      [core01]
      type=server
      address=customer-prod-db-core01
      port=3306
      protocol=MySQLBackend
      masterweight=1
       
      [core11]
      type=server
      address=customer-prod-db-core11
      port=3306
      protocol=MySQLBackend
      masterweight=0
       
      [history01]
      type=server
      address=customer-prod-db-history01
      port=3306
      protocol=MySQLBackend
       
      [history11]
      type=server
      address=customer-prod-db-history11
      port=3306
      protocol=MySQLBackend
       
      #### MASTER - MASTER - WRITE ####
      [Core11 master slave  Monitor]
      type=monitor
      module=mmmon
      servers=core01,core11
      user=maxscale
      passwd=***
      script=/root/replication-scripts/failover-master.sh --event=$EVENT --initiator=$INITIATOR --nodelist=$NODELIST
      events=master_down,server_down
      monitor_interval=500
      # replication_lag_monitor=1 ## Does not work yet in mmmon (or multimaster in mysqlmon) --michael@MariaDB 2016-08-27
      # max_slave_replication_lag=5 ## https://jira.mariadb.org/browse/MXS-839
       
      [Core01 Master read-write Service]
      type=service
      router=readconnroute
      servers=core01,core11
      user=maxscale
      passwd=***
      router_options=master
       
      [Core01 Master read-write Listener]
      type=listener
      service=Core01 Master read-write Service
      protocol=MySQLClient
      port=3310
       
      ##### READ ONLY #####
       
      [History01 Read-Only Service]
      type=service
      router=readconnroute
      servers=history01, history11
      user=maxscale
      passwd=***
      # Impossible to use router_option slave because mmmon does not monitor these.
      # mysqlmon cannot monitor it because there is a multi master setup causing no master to be selected by mysqlmon and the cluster of 2 slaves getting 'slave from external master' state. --michael@mariadb 2016-08-26
      # router_options=slave 
      #filters=MyRegexFilter
       
      [History01 Read-Only Listener]
      type=listener
      service=History01 Read-Only Service
      protocol=MySQLClient
      port=3317
       
      ##
       
       
      [MaxAdmin Service]
      type=service
      router=cli
       
      [MaxAdmin Listener]
      type=listener
      service=MaxAdmin Service
      protocol=maxscaled
      port=6603
      

      My script:

      #!/bin/bash
      # failover_master.sh
       
      ARGS=$(getopt -o '' --long 'event:,initiator:,nodelist:' -- "$@")
      eval set -- "$ARGS"
       
      while true; do
          case "$1" in
              --event)
                  shift;
                  event=$1
                  shift;
              ;;
              --initiator)
                  shift;
                  initiator=$1
                  shift;
              ;;
              --nodelist)
                  shift;
                  nodelist=$1
                  shift;
              ;;
              --)
                  shift;
                  break;
              ;;
          esac
      done
       
      candidate=`echo "$nodelist" | awk -F':' '{print $1}'`
      maxscale_host=`echo "$initiator" | awk -F'-' '{print $5}'`
      maxscale_host=`echo "$maxscale_host" | awk -F':' '{print $1}'`
       
      if [ -z "$candidate" ]; then
         echo "ERROR!!! NO candidate master found when failing over $initiator! The system might be down."|wall
         echo "ERROR!!! NO candidate master found! The system might be down."
         exit 0
      fi
       
      # WORK AROUND for race condition, see https://jira.mariadb.org/browse/MXS-845
      currently_in_maintenance=`maxadmin -pmariadb list servers|grep Maintenance|grep $maxscale_host|wc -l`
      if [ $currently_in_maintenance =  "0" ]; then
         maxadmin -pmariadb set server $maxscale_host maintenance
         maxadmin -pmariadb clear server $maxscale_host running
      else
         echo "This script is not the first one, exiting."|wall
         exit 1
      fi
       
      # loosen (ACI)D to speedup any lag. 
      mysql -u maxscale -p'***' --host=$candidate -e "set global sync_binlog=0; set global innodb_flush_log_at_trx_commit=0;set global innodb_io_capacity=50000;"
       
      while true; do
         echo "Waiting until all transactions have been applied on candidate master $candidate..."
         sleep 1
         SLAVESTAT=$(mysql -umaxscale -p'***' --host=$candidate -e "show slave status\G");
         exec_master_pos=`echo "$SLAVESTAT" | grep -w 'Exec_Master_Log_Pos:' | awk '{print $2}';`
         read_master_pos=`echo "$SLAVESTAT" | grep -w 'Read_Master_Log_Pos:' | awk '{print $2}';`
         if [ -n "$old_read_master_pos" ] && [ "$old_read_master_pos" != "$read_master_pos" ]; then
            echo "ERROR!!! Old master $initiator still receives transactions after putting it into maintenance! Manual intervention required to make sure the old master is really down."
            echo "ERROR!!! Old master $initiator still receives transactions after putting it into maintenance! Manual intervention required to make sure the old master is really down."|wall
            exit 0
         fi
         old_read_master_pos=$read_master_pos
         count=`expr $read_master_pos - $exec_master_pos`
       
         if [ $count -eq 0 ]; then
            mysql -umaxscale -p'***' --host=$candidate -e "set global read_only=OFF; set global sync_binlog=1; set global innodb_flush_log_at_trx_commit=1;SET GLOBAL innodb_io_capacity=200" 
            break;
         fi
      done
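
The maintenance-mode check in the script above still leaves a window between reading `list servers` and running `set server`, so several concurrent invocations can all see "0" and proceed. A minimal sketch of a stricter guard follows, assuming the script can write to a local directory. `LOCK_DIR` and the three function names are hypothetical and not part of MaxScale or maxadmin. It relies only on `mkdir` failing when the directory already exists.

```shell
#!/bin/bash
# Hypothetical lock location; override with LOCK_DIR in the environment.
LOCK_DIR="${LOCK_DIR:-/tmp/failover-master.lock}"

# Succeeds for exactly one caller: mkdir creates the directory atomically
# and fails if it already exists.
acquire_failover_lock() {
    mkdir "$LOCK_DIR" 2>/dev/null
}

# Removes the lock so that a later failover can run again.
release_failover_lock() {
    rmdir "$LOCK_DIR" 2>/dev/null
}

# Runs the given command only if this invocation obtained the lock.
# Returns 1 without running anything when another invocation holds it.
run_failover_once() {
    if ! acquire_failover_lock; then
        echo "Another invocation already holds $LOCK_DIR, skipping." >&2
        return 1
    fi
    "$@"
}
```

With this in place, the body of the failover could be wrapped in a function and started as `run_failover_once do_failover`, where `do_failover` is again a placeholder name. The lock is deliberately not released automatically, so repeated server_down events for the same outage are ignored until an operator calls `release_failover_lock`.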
      

            People

              markus makela
              michaeldg Michaël de groot