Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Fixed
- Version: 2.3.7
Description
I'm using MaxScale 2.3 with three MariaDB servers: one is the master and the other two are slaves. I set max_slave_replication_lag to 5 seconds:
[Read-Write-Service]
type=service
router=readwritesplit
servers=server1,server3,server2
max_slave_replication_lag=5
master_failure_mode=fail_on_write
After that I stopped the slaves for a period of time and kept the master running, so all queries were redirected to the master. When the slaves came back online, they were hours behind the master. When the monitor checks the slave status, SHOW SLAVE STATUS sometimes returns:
Slave_IO_Running Preparing
Seconds_Behind_Master NULL
The monitor then decides that the slave is up to date, and some queries are redirected to that slave, even though the slave is hours behind the master!
I took a look at the source code and, IMHO, this block of code is the reason:
static inline bool rpl_lag_is_ok(SRWBackend& backend, int max_rlag)
{
    return max_rlag == MXS_RLAG_UNDEFINED || backend->server()->rlag <= max_rlag;
}
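To illustrate how a NULL Seconds_Behind_Master could slip through a check of this shape, here is a minimal standalone sketch. It is not MaxScale source: Backend and kRlagUndefined are hypothetical stand-ins for SRWBackend and MXS_RLAG_UNDEFINED, and the sentinel value -1 is an assumption. If an unknown lag is stored as a negative sentinel, it compares as smaller than any configured limit; the second function shows one possible stricter variant.

```cpp
// Standalone sketch, not MaxScale source. kRlagUndefined and Backend are
// hypothetical stand-ins for MXS_RLAG_UNDEFINED and SRWBackend.
constexpr int kRlagUndefined = -1;   // assumed sentinel meaning "lag unknown"

struct Backend
{
    int rlag;   // replication lag in seconds, or kRlagUndefined
};

// Same shape as the check quoted above: a negative sentinel lag is
// always <= max_rlag, so a slave with unknown lag is accepted.
inline bool rpl_lag_is_ok_current(const Backend& b, int max_rlag)
{
    return max_rlag == kRlagUndefined || b.rlag <= max_rlag;
}

// Stricter variant: when a limit is configured, an unknown lag is rejected.
inline bool rpl_lag_is_ok_strict(const Backend& b, int max_rlag)
{
    if (max_rlag == kRlagUndefined)
    {
        return true;    // no limit configured, any slave is acceptable
    }
    return b.rlag != kRlagUndefined && b.rlag <= max_rlag;
}
```

With max_rlag set to 5, the first function accepts a backend whose lag is the sentinel, while the second rejects it and still accepts a backend with a measured lag of 3 seconds.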
So maybe removing (max_rlag == MXS_RLAG_UNDEFINED) from the condition might help in this case, although it could do damage in cases where the slave really is up to date. Maybe MaxScale could compare the GTIDs of the master and the slave to determine whether they are really at the same transaction.
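The GTID idea could look roughly like the sketch below. It assumes a single MariaDB GTID in the domain-server-sequence form (for example 0-1-100) and compares only the sequence numbers within one replication domain. Gtid, parse_gtid and slave_caught_up are hypothetical names for illustration, not MaxScale functions, and multi-domain GTID lists are not handled.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper type: one MariaDB GTID, "domain-server-sequence".
struct Gtid
{
    uint32_t domain;
    uint32_t server_id;
    uint64_t sequence;
};

// Parses a single GTID such as "0-1-100". Returns false on malformed input.
// Comma-separated multi-domain lists are deliberately not supported here.
inline bool parse_gtid(const std::string& text, Gtid* out)
{
    unsigned int domain = 0;
    unsigned int server = 0;
    unsigned long long seq = 0;
    char trailing = 0;
    // Exactly three fields must convert; a fourth match means trailing junk.
    if (std::sscanf(text.c_str(), "%u-%u-%llu%c", &domain, &server, &seq, &trailing) != 3)
    {
        return false;
    }
    out->domain = domain;
    out->server_id = server;
    out->sequence = seq;
    return true;
}

// The slave counts as caught up only if it is in the same replication
// domain and has applied at least the master's latest sequence number.
inline bool slave_caught_up(const Gtid& master, const Gtid& slave)
{
    return master.domain == slave.domain && slave.sequence >= master.sequence;
}
```

A check like this does not depend on Seconds_Behind_Master, so a NULL value there would not make a lagging slave look current.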
Master and slaves are running MariaDB 10.1.40
Issue Links
- relates to MXS-1720 Priori causal read (Closed)