[MXS-1862] Failover does not happen with Maxscale in Master->Slave replication setup Created: 2018-05-08  Updated: 2018-08-17  Resolved: 2018-08-17

Status: Closed
Project: MariaDB MaxScale
Component/s: failover
Affects Version/s: 2.2.5
Fix Version/s: 2.2.5

Type: Task Priority: Major
Reporter: Venkatarao Beru Assignee: Esa Korhonen
Resolution: Not a Bug Votes: 1
Labels: maxscale
Environment:

Non production environment



 Description   

We have 2 servers installed Mariadb and setup Master->Slave replication. Installed and setup Maxscale 2.2.5 on SERVER3

MariaDB versions on 2 servers

SERVER1 Server version: 10.2.12-MariaDB-log MariaDB Server
SERVER2 Server version: 10.2.12-MariaDB-log MariaDB Server

Maxscale version on SERVER3
========================

[maxscale@SERVER3 ~]$ maxscale --version
MaxScale 2.2.5

/etc/maxscale.cnf file contents
=======================

[MaxScale]
threads=auto
admin_auth=http
 
[SplitterService]
type=service
router=readwritesplit
servers=dbserv1,dbserv2
user=maxscale
passwd=*************
 
[SplitterListener]
type=listener
service=SplitterService
protocol=MariaDBClient
port=3301
#socket=/tmp/ClusterMaster
 
[dbserv1]
type=server
address=SERVER1
port=3307
protocol=MariaDBBackend
 
[dbserv2]
type=server
address=SERVER2
port=3307
protocol=MariaDBBackend
 
[ReplicationMonitor]
type=monitor
module=mariadbmon
servers=dbserv1,dbserv2
user=maxscale
passwd=*************
replication_user=mysql_repl
replication_password=***********
#monitor_interval=1000
monitor_interval=60
auto_failover=1
failover_timeout=5
auto_rejoin=true
failcount=5
master_failure_timeout=2
verify_master_failure=true
switchover_timeout=90
detect_standalone_master=true
 
[CLI]
type=service
router=cli
 
[CLIListener]
type=listener
service=CLI
protocol=maxscaled
address=localhost
port=6600

MaxScale> list servers
Servers.
-------------------+-----------------+-------+-------------+--------------------
Server             | Address         | Port  | Connections | Status
-------------------+-----------------+-------+-------------+--------------------
dbserv1            | SERVER1 |  3307 |           0 | Master, Running
dbserv2            | SERVER2 |  3307 |           0 | Slave, Running
-------------------+-----------------+-------+-------------+--------------------

When we do failover by stopping mariadb instance on SERVER1 then the failover was not happend. You can see the following information

/mysql/maxscale/var/log/maxscale/maxscale.log
=====================================

2018-05-08 12:19:16   error  : Monitor was unable to connect to server [dbcct3vmyd01003.idm.sgdcelab.sabre.com]:3307 : "Can't connect to MySQL server on 'dbcct3vmyd01003.idm.sgdcelab.sabre.com' (115)"
2018-05-08 12:19:16   notice : [mariadbmon] Server [dbcct3vmyd01003.idm.sgdcelab.sabre.com]:3307 lost the master status.
2018-05-08 12:19:16   notice : Server changed state: dbserv1[dbcct3vmyd01003.idm.sgdcelab.sabre.com:3307]: master_down. [Master, Running] -> [Down]
2018-05-08 12:19:23   error  : [readwritesplit] Couldn't find suitable Master from 2 candidates.
2018-05-08 12:19:23   error  : Failed to create new router session for service 'SplitterService'. See previous errors for more details.
2018-05-08 12:19:39   error  : [readwritesplit] Couldn't find suitable Master from 2 candidates.
2018-05-08 12:19:39   error  : Failed to create new router session for service 'SplitterService'. See previous errors for more details.

MaxScale> list servers
Servers.
-------------------+-----------------+-------+-------------+--------------------
Server             | Address         | Port  | Connections | Status
-------------------+-----------------+-------+-------------+--------------------
dbserv1            | dbcct3vmyd01003.idm.sgdcelab.sabre.com |  3307 |           0 | Down
dbserv2            | dbcct3vmyd01004.idm.sgdcelab.sabre.com |  3307 |           0 | Slave, Running
-------------------+-----------------+-------+-------------+--------------------

Privileges for maxscale and mysql_repl users
===================================

MariaDB [(none)]> show grants for 'maxscale_admin';
+---------------------------------------------------------------------+
| Grants for maxscale_admin                                           |
+---------------------------------------------------------------------+
| GRANT INSERT, RELOAD, REPLICATION CLIENT ON *.* TO 'maxscale_admin' |
| GRANT SELECT ON `mysql`.`tables_priv` TO 'maxscale_admin'           |
| GRANT SELECT ON `mysql`.`user` TO 'maxscale_admin'                  |
| GRANT SELECT ON `mysql`.`db` TO 'maxscale_admin'                    |
+---------------------------------------------------------------------+
4 rows in set (0.00 sec)
 
MariaDB [(none)]> show grants for 'mysql_repl';
+-----------------------------------------------------------------------------------+
| Grants for mysql_repl                                                             |
+-----------------------------------------------------------------------------------+
| GRANT RELOAD, SUPER, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'mysql_repl' |
+-----------------------------------------------------------------------------------+
1 row in set (0.00 sec)
 
MariaDB [(none)]> show grants for 'maxscale'@SERVER3;
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Grants for maxscale@SERVER3                                                                                                                                          |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| GRANT maxscale_admin TO 'maxscale'@'SERVER3'                                                                                                                         |
| GRANT SHOW DATABASES, SUPER, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'maxscale'@'SERVER3' IDENTIFIED BY PASSWORD '***********************' |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
2 rows in set (0.01 sec)

MariaDB [(none)]> show grants for 'mysql_repl'@'SERVER1';
+------------------------------------------------------------------------------------------------------------------------------------------------+
| Grants for mysql_repl@SERVER1                                                                                   |
+------------------------------------------------------------------------------------------------------------------------------------------------+
| GRANT mysql_repl TO 'mysql_repl'@'SERVER1'                                                                      |
| GRANT USAGE ON *.* TO 'mysql_repl'@'SERVER1' IDENTIFIED BY PASSWORD '*****************' |
+------------------------------------------------------------------------------------------------------------------------------------------------+
2 rows in set (0.00 sec)
 
MariaDB [(none)]> show grants for 'mysql_repl'@'SERVER2';
+------------------------------------------------------------------------------------------------------------------------------------------------+
| Grants for mysql_repl@SERVER2                                                                                   |
+------------------------------------------------------------------------------------------------------------------------------------------------+
| GRANT mysql_repl TO 'mysql_repl'@'SERVER2'                                                                      |
| GRANT USAGE ON *.* TO 'mysql_repl'@'SERVER2' IDENTIFIED BY PASSWORD '***************' |
+------------------------------------------------------------------------------------------------------------------------------------------------+
2 rows in set (0.00 sec)
 
MariaDB [(none)]> show grants for 'mysql_repl'@'SERVER3';
+------------------------------------------------------------------------------------------------------------------------------------------------+
| Grants for mysql_repl@SERVER3                                                                                   |
+------------------------------------------------------------------------------------------------------------------------------------------------+
| GRANT mysql_repl TO 'mysql_repl'@'SERVER3'                                                                      |
| GRANT USAGE ON *.* TO 'mysql_repl'@'SERVER3' IDENTIFIED BY PASSWORD '********' |
+------------------------------------------------------------------------------------------------------------------------------------------------+
2 rows in set (0.00 sec)

Could you please suggest me is there any thing wrong in respect to maxscale configuration and privileges or any other issue?

Thanks,
Venkat Beru



 Comments   
Comment by Venkatarao Beru [ 2018-05-08 ]

Also would like to know, Is slave can be in readonly mode?

Thanks,
Venkat Beru

Comment by Venkatarao Beru [ 2018-05-09 ]

Waiting for your updates/suggestions...

Thanks,
Venkat Beru

Comment by Esa Korhonen [ 2018-05-10 ]

I don't see any clear reason for the failover not happening. Could you try running "show monitors" in MaxAdmin and paste the output here? The monitor disables automatic failover if it (or switchover) has failed previously. Also, try running manual failover and check the error message. Manual failover is started with the MaxAdmin command "call command mariadbmon failover ReplicationMonitor".

Having slave servers in read-only mode is ok and even recommended.

Regards,
Esa

Comment by Venkatarao Beru [ 2018-05-10 ]

Hi Esa,

Thank you for looking into this issue. We followed the following document exactly and configured maxscale according to that.

https://mariadb.com/resources/blog/mariadb-maxscale-22-introducing-failover-switchover-and-automatic-rejoin

Yesterday we tested the same failover and switchover by including the following configuration into /etc/maxscale.cnf under "[ReplicationMonitor]" section then both switchover and failover happened properly.

detect_standalone_master=true
events=master_down

[ReplicationMonitor]
type=monitor
module=mariadbmon
servers=dbserv1,dbserv2
user=maxscale
passwd=B3D6615B8ACD02EC5D9F2C2D2BBC72C5
replication_user=mysql_repl
replication_password=E3BBC2414C10E0163C8C452EDC67B143
#monitor_interval=1000
monitor_interval=60
auto_failover=1
failover_timeout=5
auto_rejoin=true
failcount=5
master_failure_timeout=2
verify_master_failure=true
switchover_timeout=90
detect_standalone_master=true
events=master_down

Manual steps have been tested also and went fine after including the above 2 parameters by using the following commands
#####################Switch over##################

[maxscale@dbcct3vmyd01006 ~]$ maxctrl call command mariadbmon switchover ReplicationMonitor dbserv2 dbserv1
OK

2018-05-09 02:03:50 notice : [mariadbmon] Stopped the monitor ReplicationMonitor for the duration of switchover.
2018-05-09 02:03:50 warning: [mariadbmon] Slave 'dbserv2' has gtid_strict_mode disabled. Enabling this setting is recommended. For more information, see https://mariadb.com/kb/en/library/gtid/#gtid_strict_mode
2018-05-09 02:03:50 notice : [mariadbmon] Demoting server 'dbserv1'.
2018-05-09 02:03:50 notice : [mariadbmon] Promoting server 'dbserv2' to master.
2018-05-09 02:03:50 notice : [mariadbmon] Old master 'dbserv1' starting replication from 'dbserv2'.
2018-05-09 02:03:50 notice : [mariadbmon] Redirecting slaves to new master.
2018-05-09 02:03:50 notice : [mariadbmon] Switchover dbserv1 -> dbserv2 performed.
2018-05-09 02:03:50 notice : Loaded server states from journal file: /mysql/maxscale/var/lib/maxscale/ReplicationMonitor/monitor.dat
2018-05-09 02:03:50 notice : Server changed state: dbserv1[dbcct3vmyd01003.idm.sgdcelab.sabre.com:3307]: new_slave. [Master, Running] -> [Slave, Running]
2018-05-09 02:03:50 notice : Server changed state: dbserv2[dbcct3vmyd01004.idm.sgdcelab.sabre.com:3307]: new_master. [Slave, Running] -> [Master, Running]
2018-05-09 02:03:50 notice : [mariadbmon] Gtid domain id of master has changed: 1 -> 2.

[maxscale@dbcct3vmyd01006 ~]$ maxctrl call command mariadbmon switchover ReplicationMonitor dbserv1 dbserv2
OK

2018-05-09 02:08:05 notice : [mariadbmon] Stopped the monitor ReplicationMonitor for the duration of switchover.
2018-05-09 02:08:05 warning: [mariadbmon] Slave 'dbserv1' has gtid_strict_mode disabled. Enabling this setting is recommended. For more information, see https://mariadb.com/kb/en/library/gtid/#gtid_strict_mode
2018-05-09 02:08:05 notice : [mariadbmon] Demoting server 'dbserv2'.
2018-05-09 02:08:05 notice : [mariadbmon] Promoting server 'dbserv1' to master.
2018-05-09 02:08:05 notice : [mariadbmon] Old master 'dbserv2' starting replication from 'dbserv1'.
2018-05-09 02:08:05 notice : [mariadbmon] Redirecting slaves to new master.
2018-05-09 02:08:05 notice : [mariadbmon] Switchover dbserv2 -> dbserv1 performed.
2018-05-09 02:08:05 notice : Loaded server states from journal file: /mysql/maxscale/var/lib/maxscale/ReplicationMonitor/monitor.dat
2018-05-09 02:08:05 notice : Server changed state: dbserv1[dbcct3vmyd01003.idm.sgdcelab.sabre.com:3307]: new_master. [Slave, Running] -> [Master, Running]
2018-05-09 02:08:05 notice : Server changed state: dbserv2[dbcct3vmyd01004.idm.sgdcelab.sabre.com:3307]: new_slave. [Master, Running] -> [Slave, Running]
2018-05-09 02:08:05 notice : [mariadbmon] Gtid domain id of master has changed: 2 -> 1.

###################Failover###################

maxctrl call command mariadbmon failover ReplicationMonitor

2018-05-08 13:49:49 error : Monitor was unable to connect to server [dbcct3vmyd01003.idm.sgdcelab.sabre.com]:3307 : "Can't connect to MySQL server on 'dbcct3vmyd01003.idm.sgdcelab.sabre.com' (115)"
2018-05-08 13:49:49 notice : [mariadbmon] Server [dbcct3vmyd01003.idm.sgdcelab.sabre.com]:3307 lost the master status.
2018-05-08 13:49:49 notice : Server changed state: dbserv1[dbcct3vmyd01003.idm.sgdcelab.sabre.com:3307]: master_down. [Master, Running] -> [Down]
2018-05-08 13:49:49 warning: [mariadbmon] Master has failed. If master status does not change in 4 monitor passes, failover begins.
2018-05-08 13:49:50 notice : [mariadbmon] Performing automatic failover to replace failed master 'dbserv1'.
2018-05-08 13:49:50 warning: [mariadbmon] Slave 'dbserv2' has gtid_strict_mode disabled. Enabling this setting is recommended. For more information, see https://mariadb.com/kb/en/library/gtid/#gtid_strict_mode
2018-05-08 13:49:50 notice : [mariadbmon] Promoting server 'dbserv2' to master.
2018-05-08 13:49:50 notice : [mariadbmon] Redirecting slaves to new master.
2018-05-08 13:49:50 warning: [mariadbmon] Setting standalone master, server 'dbserv2' is now the master.
2018-05-08 13:49:50 notice : Server changed state: dbserv2[dbcct3vmyd01004.idm.sgdcelab.sabre.com:3307]: new_master. [Slave, Running] -> [Master, Running]
2018-05-08 13:49:50 notice : [mariadbmon] Gtid domain id of master has changed: 1 -> 2.
2018-05-08 13:51:23 notice : Server changed state: dbserv1[dbcct3vmyd01003.idm.sgdcelab.sabre.com:3307]: server_up. [Down] -> [Running]
2018-05-08 13:51:23 notice : [mariadbmon] Directing standalone server 'dbserv1' to replicate from 'dbserv2'.
2018-05-08 13:51:23 notice : [mariadbmon] Standalone server 'dbserv1' starting replication.
2018-05-08 13:51:23 notice : [mariadbmon] 1 server(s) redirected or rejoined the cluster.
2018-05-08 13:54:03 notice : Server changed state: dbserv1[dbcct3vmyd01003.idm.sgdcelab.sabre.com:3307]: new_slave. [Running] -> [Slave, Running]
2018-05-08 13:54:03 notice : Server changed state: dbserv1[dbcct3vmyd01003.idm.sgdcelab.sabre.com:3307]: lost_slave. [Slave, Running] -> [Running]
2018-05-08 13:55:15 notice : Server changed state: dbserv1[dbcct3vmyd01003.idm.sgdcelab.sabre.com:3307]: new_slave. [Running] -> [Slave, Running]

#####################

Please let us know, is the following parameters manadatory to add into /etc/maxscale.cnf file? After adding the following parameters only failover and switchover works fine.

detect_standalone_master=true
events=master_down

Thank,
Venkat

Comment by Venkatarao Beru [ 2018-05-10 ]

If slave is in "ReadOnly" mode , how the write operations can perform once this slave becomes master after failover/switchover. Is there any thing manually we need to change the mode from ReadOnly to ReadWrite mode or is there any configuration parameter needs to be included in /etc/maxscale.cnf file?

Thanks,
Venkat Beru

Comment by Esa Korhonen [ 2018-05-11 ]

Looks like it is now working correctly. Please ask if there are more problems.

Regarding the settings:

"detect_standalone_master=true" is a good setting if you predict a situation where there is only one server left and that should be the master. Without this setting, MaxScale won't consider a lone server as "Master". This setting is on by default.

"events=master_down" is not required by failover. Leave it on if you wish to run your own scripts when a master goes down.

The monitor manages the "read_only"-flag: enabling it when a server is demoted to slave and disabling it when a slave is promoted to master.

The automatic failover/rejoin are disabled if the operation fails. Restarting MaxScale enables them again (or maxadmin command "alter monitor ReplicationMonitor auto_failover=1"). The maxadmin command "show monitors" shows the current status of the features.

The log shows messages such as "Gtid domain id of master has changed: 1 -> 2." This means that different servers have different domain id values. Unless the events created by the servers are meant to be in different replication streams, it's usually best to have the same gtid domain in the cluster. The server id:s are enough to identify servers.

Comment by Venkatarao Beru [ 2018-05-17 ]

Thank you Esa for your updates.

Thanks,
Venkat Beru

Generated at Thu Feb 08 04:09:57 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.