[MXS-2700] MaxScale needs to check how up-to-date the slave is before moving traffic to it. Created: 2019-09-25  Updated: 2020-08-26  Resolved: 2020-08-26

Status: Closed
Project: MariaDB MaxScale
Component/s: mariadbmon
Affects Version/s: None
Fix Version/s: 2.5.0

Type: New Feature Priority: Major
Reporter: Nilnandan Joshi Assignee: Todd Stoffel (Inactive)
Resolution: Fixed Votes: 0
Labels: None


 Description   

If a slave (node1) somehow can't connect to the master and keeps retrying,
we'll see the following state in SHOW SLAVE STATUS:

Slave_IO_Running: Connecting
Slave_SQL_Running: Yes
..
Seconds_Behind_Master: NULL

In MaxScale's 'list servers' output, node1's state is still 'Slave, Running', and MaxScale keeps routing connections to a slave whose replication is broken.

┌────────┬─────────────┬──────┬─────────────┬─────────────────┬─────────────┐
│ Server │ Address     │ Port │ Connections │ State           │ GTID        │
├────────┼─────────────┼──────┼─────────────┼─────────────────┼─────────────┤
│ node1  │ 10.66.21.38 │ 6603 │ 17          │ Slave, Running  │ 1-2-3312917 │
├────────┼─────────────┼──────┼─────────────┼─────────────────┼─────────────┤
│ node2  │ 10.66.21.37 │ 6603 │ 49          │ Master, Running │ 1-2-3319973 │
└────────┴─────────────┴──────┴─────────────┴─────────────────┴─────────────┘

Since replication on node1 is already broken, MaxScale should report this node as only 'Running' and route no connections to it; the current behaviour needs to be changed. In the long run, we probably need to implement some kind of grading system, where the monitor or router checks how "up-to-date" the slave is.
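The check described above can be sketched as follows. This is a hedged illustration, not MaxScale's actual implementation: the function name is invented, and the inputs mirror the SHOW SLAVE STATUS fields quoted earlier.

```python
def slave_is_usable(io_running: str, sql_running: str,
                    seconds_behind_master) -> bool:
    """Return True only if both replication threads run and lag is known."""
    if io_running != "Yes" or sql_running != "Yes":
        return False          # e.g. Slave_IO_Running: Connecting
    if seconds_behind_master is None:
        return False          # NULL lag means replication is not working
    return True

# The broken node1 from the description: IO thread stuck in "Connecting".
print(slave_is_usable("Connecting", "Yes", None))   # False
# A healthy replica:
print(slave_is_usable("Yes", "Yes", 0))             # True
```

With a check like this, node1 in the table above would lose its 'Slave' label and receive no new connections.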



 Comments   
Comment by Johan Wikman [ 2019-09-25 ]

niljoshi Shouldn't this be considered a bug and not a new feature?

Comment by Nilnandan Joshi [ 2019-09-26 ]

Hi johan.wikman, as per my discussion with esa.korhonen, the current behaviour is the
default. But we can improve it to cover the above scenario, so I filed it as a new feature.
I can change it to Bug if everybody thinks that is the right type.

Comment by markus makela [ 2019-10-03 ]

Wouldn't this be fixed by changing the value of master-retry-count on the server?

Comment by Nilnandan Joshi [ 2019-10-04 ]

Hi markus makela, isn't that a workaround rather than the solution? Btw, we didn't get any update on whether master-retry-count fixed the problem.

Comment by markus makela [ 2019-10-07 ]

One option would be to detect that a slave is trying to connect for a long time and then treat it as a failed server if it can never connect to the master. In the end, you still have to define a timeout somewhere and this can already be done with master-retry-count.

Comment by markus makela [ 2020-08-26 ]

Fixed with the new 2.5 parameter: https://mariadb.com/kb/en/mariadb-maxscale-25-mariadb-monitor/#slave_conditions
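A minimal monitor configuration using the new parameter might look like the fragment below. The section name, server list, and credentials are placeholders; per the linked KB page, `slave_conditions=linked_master` requires a slave's own replication connection to the master to be working before it is given the Slave status.

```ini
[Replication-Monitor]
type=monitor
module=mariadbmon
servers=node1,node2
user=maxuser
password=maxpwd
# A replica stuck in "Connecting" no longer counts as a valid slave,
# so the routers stop sending traffic to it.
slave_conditions=linked_master
```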

Generated at Thu Feb 08 04:16:03 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.