[MXS-1610] Changing masters does not close readwritesplit session immediately Created: 2018-01-16  Updated: 2018-04-17  Resolved: 2018-04-17

Status: Closed
Project: MariaDB MaxScale
Component/s: readwritesplit
Affects Version/s: 2.1.13
Fix Version/s: 2.3.0

Type: Bug
Priority: Minor
Reporter: Petrov Aleksey
Assignee: markus makela
Resolution: Fixed
Votes: 0
Labels: None
Environment: Ubuntu 14.04



 Description   

With applications that hold long-lived sessions, such as Zabbix, the readwritesplit service correctly switches to another backend server when the master goes down. But when the original master comes back, the readwritesplit service does not switch back to it, and messages like the following appear in the log:

error  : (2831) [readwritesplit] Could not find master among the backend servers. Previous master's state : RUNNING SLAVE

Example

Start server:

2018-01-16 12:05:43   notice : MaxScale started with 4 server threads.

Shutting down master:

2018-01-16 12:14:23   error  : Monitor was unable to connect to server [MASTER_IP_ADDRESS]:3306 : "Can't connect to MySQL server on 'MASTER_IP_ADDRESS' (107)"
2018-01-16 12:14:23   notice : Server changed state: MASTER_SERVER[MASTER_IP_ADDRESS:3306]: master_down. [Master, Synced, Running] -> [Down]
2018-01-16 12:14:23   notice : Server changed state: SLAVE_SERVER[SLAVE_IP_ADDRESS:3306]: new_master. [Slave, Synced, Running] -> [Master, Synced, Running]

Master changed, no errors.

Start master:

2018-01-16 12:15:26   notice : Server changed state: MASTER_SERVER[MASTER_IP_ADDRESS:3306]: master_up. [Down] -> [Master, Synced, Running]
2018-01-16 12:15:26   notice : Server changed state: SLAVE_SERVER[SLAVE_IP_ADDRESS:3306]: new_slave. [Master, Synced, Running] -> [Slave, Synced, Running]

Errors appear in the log:

2018-01-16 12:15:26   error  : (2831) [readwritesplit] Could not find master among the backend servers. Previous master's state : RUNNING SLAVE
2018-01-16 12:15:26   error  : (2831) [readwritesplit] Could not find master among the backend servers. Previous master's state : RUNNING SLAVE
2018-01-16 12:15:26   error  : (2831) [readwritesplit] Could not find master among the backend servers. Previous master's state : RUNNING SLAVE
2018-01-16 12:15:26   error  : (2831) [readwritesplit] Could not find master among the backend servers. Previous master's state : RUNNING SLAVE
2018-01-16 12:15:26   error  : (2831) [readwritesplit] Could not find master among the backend servers. Previous master's state : RUNNING SLAVE
2018-01-16 12:15:26   error  : (2831) [readwritesplit] Could not find master among the backend servers. Previous master's state : RUNNING SLAVE
2018-01-16 12:15:26   error  : (2831) [readwritesplit] Could not find master among the backend servers. Previous master's state : RUNNING SLAVE
2018-01-16 12:15:26   error  : (2831) [readwritesplit] Could not find master among the backend servers. Previous master's state : RUNNING SLAVE
2018-01-16 12:15:26   error  : (2831) [readwritesplit] Could not find master among the backend servers. Previous master's state : RUNNING SLAVE
2018-01-16 12:15:26   error  : (2831) [readwritesplit] Could not find master among the backend servers. Previous master's state : RUNNING SLAVE (subsequent similar messages suppressed for 10000 milliseconds)
2018-01-16 12:15:27   warning: (2595) [readwritesplit] [Splitter Service] Write query received from zabbix@::ffff:xxx.xxx.xxx.xxx. Could not find a valid master connection. Closing client connection.
2018-01-16 12:15:27   error  : (2595) [MySQLClient] Routing the query failed. Session will be closed.
2018-01-16 12:15:30   warning: (2606) [readwritesplit] [Splitter Service] Write query received from zabbix@::ffff:xxx.xxx.xxx.xxx. Could not find a valid master connection. Closing client connection.
2018-01-16 12:15:30   error  : (2606) [MySQLClient] Routing the query failed. Session will be closed.
2018-01-16 12:15:33   warning: (2599) [readwritesplit] [Splitter Service] Write query received from zabbix@::ffff:xxx.xxx.xxx.xxx. Could not find a valid master connection. Closing client connection.
2018-01-16 12:15:33   error  : (2599) [MySQLClient] Routing the query failed. Session will be closed.
2018-01-16 12:15:38   error  : (2675) [readwritesplit] Could not find master among the backend servers. Previous master's state : RUNNING SLAVE
2018-01-16 12:15:39   error  : (2573) [readwritesplit] Could not find master among the backend servers. Previous master's state : RUNNING SLAVE
2018-01-16 12:15:39   error  : (2573) [readwritesplit] Could not find master among the backend servers. Previous master's state : RUNNING SLAVE
2018-01-16 12:15:39   error  : (2600) [readwritesplit] Could not find master among the backend servers. Previous master's state : RUNNING SLAVE
2018-01-16 12:15:39   error  : (2600) [readwritesplit] Could not find master among the backend servers. Previous master's state : RUNNING SLAVE
2018-01-16 12:15:39   error  : (2585) [readwritesplit] Could not find master among the backend servers. Previous master's state : RUNNING SLAVE
2018-01-16 12:15:39   error  : (2585) [readwritesplit] Could not find master among the backend servers. Previous master's state : RUNNING SLAVE
2018-01-16 12:15:39   error  : (2585) [readwritesplit] Could not find master among the backend servers. Previous master's state : RUNNING SLAVE
2018-01-16 12:15:40   error  : (2605) [readwritesplit] Could not find master among the backend servers. Previous master's state : RUNNING SLAVE
2018-01-16 12:15:40   error  : (2571) [readwritesplit] Could not find master among the backend servers. Previous master's state : RUNNING SLAVE
2018-01-16 12:15:40   error  : (2571) [readwritesplit] Could not find master among the backend servers. Previous master's state : RUNNING SLAVE
2018-01-16 12:15:41   error  : (2648) [readwritesplit] Could not find master among the backend servers. Previous master's state : RUNNING SLAVE
2018-01-16 12:15:41   error  : (2626) [readwritesplit] Could not find master among the backend servers. Previous master's state : RUNNING SLAVE
2018-01-16 12:15:41   error  : (2648) [readwritesplit] Could not find master among the backend servers. Previous master's state : RUNNING SLAVE
2018-01-16 12:15:41   error  : (2626) [readwritesplit] Could not find master among the backend servers. Previous master's state : RUNNING SLAVE
2018-01-16 12:15:41   error  : (2648) [readwritesplit] Could not find master among the backend servers. Previous master's state : RUNNING SLAVE
2018-01-16 12:15:41   error  : (2626) [readwritesplit] Could not find master among the backend servers. Previous master's state : RUNNING SLAVE
2018-01-16 12:15:41   error  : (2648) [readwritesplit] Could not find master among the backend servers. Previous master's state : RUNNING SLAVE
2018-01-16 12:15:41   error  : (2626) [readwritesplit] Could not find master among the backend servers. Previous master's state : RUNNING SLAVE (subsequent similar messages suppressed for 10000 milliseconds)
2018-01-16 12:15:52   error  : (2594) [readwritesplit] Could not find master among the backend servers. Previous master's state : RUNNING SLAVE
2018-01-16 12:15:52   error  : (2594) [readwritesplit] Could not find master among the backend servers. Previous master's state : RUNNING SLAVE
2018-01-16 12:15:52   error  : (2571) [readwritesplit] Could not find master among the backend servers. Previous master's state : RUNNING SLAVE
2018-01-16 12:15:52   error  : (2576) [readwritesplit] Could not find master among the backend servers. Previous master's state : RUNNING SLAVE
2018-01-16 12:15:52   error  : (2571) [readwritesplit] Could not find master among the backend servers. Previous master's state : RUNNING SLAVE
2018-01-16 12:15:52   error  : (2576) [readwritesplit] Could not find master among the backend servers. Previous master's state : RUNNING SLAVE
2018-01-16 12:15:52   error  : (2600) [readwritesplit] Could not find master among the backend servers. Previous master's state : RUNNING SLAVE
2018-01-16 12:15:52   error  : (2600) [readwritesplit] Could not find master among the backend servers. Previous master's state : RUNNING SLAVE
2018-01-16 12:15:53   error  : (2600) [readwritesplit] Could not find master among the backend servers. Previous master's state : RUNNING SLAVE

These errors continue to appear until the Zabbix service is restarted.
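To see how many distinct sessions are affected, the excerpt above can be summarized with a short script. This is a minimal sketch, not a MaxScale tool: the regular expression is inferred from the lines shown here, the number in parentheses is assumed to be the session ID, and `sessions_missing_master` is a hypothetical helper name.

```python
import re
from collections import Counter

# Assumed layout of a log line, inferred from the excerpt in this report:
# "<date> <time>   <level>: (<session id>) [<module>] <message>"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+"
    r"(?P<level>\w+)\s*:\s+"
    r"\((?P<session>\d+)\)\s+"
    r"\[(?P<module>[^\]]+)\]\s+"
    r"(?P<message>.*)$"
)

MISSING_MASTER = "Could not find master among the backend servers"


def sessions_missing_master(lines):
    """Count 'Could not find master' errors per session ID (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        # Lines without a "(session id)" part, such as monitor notices, do not match.
        if match and MISSING_MASTER in match.group("message"):
            counts[int(match.group("session"))] += 1
    return counts
```

Passing the lines of the MaxScale log file to `sessions_missing_master` returns a `Counter` keyed by session ID, which makes it easier to check whether the errors come from a few long-lived sessions or from many.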

MaxScale config:

[maxscale]
threads=4
 
[MASTER]
type=server
address=xxx.xxx.xxx.xxx
port=3306
protocol=MySQLBackend
priority=1
 
[SLAVE]
type=server
address=xxx.xxx.xxx.xxx
port=3306
protocol=MySQLBackend
priority=2
 
[SLAVE2]
type=server
address=xxx.xxx.xxx.xxx
port=3306
protocol=MySQLBackend
priority=3
 
[Galera Monitor]
type=monitor
module=galeramon
servers=MASTER,SLAVE,SLAVE2
user=xxx
passwd=yyy
monitor_interval=10000
available_when_donor=true
use_priority=1
 
[Splitter Service]
type=service
router=readwritesplit
servers=MASTER, SLAVE2, SLAVE
user=xxx
passwd=yyy
max_slave_connections=1
router_options=master_accept_reads=true,slave_selection_criteria=LEAST_GLOBAL_CONNECTIONS
use_sql_variables_in=master
 
[Splitter Listener]
type=listener
service=Splitter Service
protocol=MySQLClient
port=3306



 Comments   
Comment by markus makela [ 2018-01-16 ]

Currently this is expected behavior. The connection to the master server is not recreated. Please see the MXS-1501 epic for planned features.

Comment by Petrov Aleksey [ 2018-01-16 ]

Why are the sessions not automatically destroyed after the master server changes?

Comment by markus makela [ 2018-01-16 ]

The sessions should be closed automatically when the master changes. Depending on the point at which the client connects, it can connect to either the first server (the original master) or the second one. In both cases, if the master has changed, the connection is closed when the next query is received.

Edit: Actually, this might be a curious side effect of how the servers are handled. If the master is brought down and a new connection is made, the client will use the second server as its master. Then, when the original master is brought back up, the servers simply change state. This state change, which is detected by the monitors, is not propagated to the routers. Fixing this would require a mechanism that notifies routers about state changes that are non-destructive from the network connections' point of view.
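As an illustration of the kind of notification mechanism meant here, a minimal observer-style sketch in Python follows. It is not MaxScale code: the `Monitor` and `Router` classes, their methods, and the plain-string states are all hypothetical and only model the idea that a monitor pushes state changes to routers.

```python
from typing import Callable, Dict, List, Optional

# Listener signature: (server name, old state, new state). Hypothetical.
StateListener = Callable[[str, str, str], None]


class Monitor:
    """Toy monitor that tracks server states and notifies subscribers."""

    def __init__(self) -> None:
        self._states: Dict[str, str] = {}
        self._listeners: List[StateListener] = []

    def subscribe(self, listener: StateListener) -> None:
        self._listeners.append(listener)

    def set_state(self, server: str, new_state: str) -> None:
        old_state = self._states.get(server, "Down")
        self._states[server] = new_state
        # Notify on every change, including ones that break no network connection.
        if old_state != new_state:
            for listener in self._listeners:
                listener(server, old_state, new_state)


class Router:
    """Toy router that keeps track of which server is currently the master."""

    def __init__(self) -> None:
        self.current_master: Optional[str] = None

    def on_state_change(self, server: str, old_state: str, new_state: str) -> None:
        if new_state == "Master":
            self.current_master = server
        elif server == self.current_master:
            # The server we treated as master lost that role.
            self.current_master = None


monitor = Monitor()
router = Router()
monitor.subscribe(router.on_state_change)

monitor.set_state("MASTER", "Master")
monitor.set_state("MASTER", "Down")     # master_down
monitor.set_state("SLAVE", "Master")    # new_master
monitor.set_state("MASTER", "Master")   # master_up
monitor.set_state("SLAVE", "Slave")     # new_slave
```

With such a callback in place, the router in this sketch learns about the `master_up` and `new_slave` transitions as soon as the monitor sees them, instead of discovering them only when a session next routes a query.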

Comment by Petrov Aleksey [ 2018-01-16 ]

Very strange. Zabbix connections spawned these errors for hours after switching masters.

Comment by markus makela [ 2018-01-16 ]

I have to revise my original analysis of this problem. The behavior is still expected, but the cause is slightly different. Zabbix receives errors hours after the master switch because no queries were sent after the switch was made, so readwritesplit was not able to detect that the master server the session started with is no longer a master.
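The lazy detection described here can be modeled with a small Python sketch. It is an illustration under stated assumptions, not the readwritesplit implementation: `Session`, `SessionClosed`, `route_write`, and the shared `roles` mapping are hypothetical names.

```python
class SessionClosed(Exception):
    """Raised when a session can no longer route writes (hypothetical)."""


class Session:
    """Toy session that remembers the master it started with."""

    def __init__(self, session_id, master, roles):
        self.session_id = session_id
        self.master = master   # master chosen when the session was created
        self._roles = roles    # shared mapping, updated by the monitor
        self.closed = False

    def route_write(self, query):
        if self.closed:
            raise SessionClosed(f"({self.session_id}) session already closed")
        # The role is checked only here, when a query actually arrives.
        if self._roles.get(self.master) != "Master":
            self.closed = True
            raise SessionClosed(
                f"({self.session_id}) Could not find a valid master connection"
            )
        return self.master


# Session created while the original master was down, so SLAVE is its master.
roles = {"MASTER": "Down", "SLAVE": "Master"}
session = Session(2831, "SLAVE", roles)

# The original master comes back; the monitor swaps the roles.
roles["MASTER"] = "Master"
roles["SLAVE"] = "Slave"

# Nothing has happened to the idle session yet.
still_open_while_idle = not session.closed

try:
    session.route_write("UPDATE t SET x = 1")
    write_failed = False
except SessionClosed:
    write_failed = True
```

In this model an idle session stays open for as long as it sends nothing, and the stale master is noticed only on the first write after the switch, which matches errors showing up hours later.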

Comment by markus makela [ 2018-04-17 ]

Fixed in 2.3.0 with the improved connection creation.
