[MXS-2489] ReadWriteSplit service redirect some queries to laggy slave - Jira

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 2.3.7
Fix Version/s: 2.5.0
Component/s: mariadbmon, Monitor, readwritesplit
Labels:
- Maxscale
- mariadb

Description

I'm using MaxScale 2.3 with three MariaDB servers: one is the master and the other two are slaves. I set max_slave_replication_lag to 5 seconds:

[Read-Write-Service]
type=service
router=readwritesplit
servers=server1,server3,server2
max_slave_replication_lag=5
master_failure_mode=fail_on_write

After that I stopped the slaves for a period of time and kept the master running, so all queries were redirected to the master. When the slaves came back online, they were hours behind the master. When the Monitor checked the slave status, SHOW SLAVE STATUS sometimes returned:
Slave_IO_Running Preparing
Seconds_Behind_Master NULL

The Monitor then decides that the slave is up to date, and some queries are redirected to that slave even though it is hours behind the master!

I took a look at the source code, and IMHO this block of code is the reason:

static inline bool rpl_lag_is_ok(SRWBackend& backend, int max_rlag)
{
    return max_rlag == MXS_RLAG_UNDEFINED || backend->server()->rlag <= max_rlag;
}

So maybe removing (max_rlag == MXS_RLAG_UNDEFINED) from the condition would help in this case, although it could do harm in the case where the slave really is up to date. Alternatively, MaxScale could compare the GTID positions of the master and the slave to determine whether they are really at the same transaction.
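To illustrate the suspected failure mode, here is a minimal standalone sketch, not MaxScale code. It assumes that a NULL Seconds_Behind_Master is stored as a negative sentinel; the value -1 and the names RLAG_UNDEFINED, lag_from_column, lag_ok_reported and lag_ok_strict are hypothetical. Under that assumption an unknown lag passes a check of the same shape as the one quoted above, while a stricter variant rejects it.

```cpp
#include <cstdlib>

// Hypothetical sentinel meaning "lag unknown" or "no limit configured".
// The value -1 is an assumption made for this sketch.
constexpr int RLAG_UNDEFINED = -1;

// Hypothetical helper: turn the Seconds_Behind_Master column into a lag value.
// A null pointer stands for SQL NULL and maps to the sentinel.
int lag_from_column(const char* seconds_behind_master)
{
    if (seconds_behind_master == nullptr)
    {
        return RLAG_UNDEFINED;
    }
    return std::atoi(seconds_behind_master);
}

// Same shape as the check quoted in the report: an unknown lag (-1)
// compares as less than or equal to any non-negative limit.
bool lag_ok_reported(int rlag, int max_rlag)
{
    return max_rlag == RLAG_UNDEFINED || rlag <= max_rlag;
}

// Stricter variant: when a limit is configured, an unknown lag never passes.
bool lag_ok_strict(int rlag, int max_rlag)
{
    return max_rlag == RLAG_UNDEFINED
           || (rlag != RLAG_UNDEFINED && rlag <= max_rlag);
}
```

With a limit of 5, lag_ok_reported(lag_from_column(nullptr), 5) is true while lag_ok_strict(lag_from_column(nullptr), 5) is false. The stricter variant keeps the "no limit configured" shortcut, so it does not penalize setups that leave max_slave_replication_lag unset.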

Master and slaves are running MariaDB 10.1.40

Issue Links

relates to: MXS-1720 Priori causal read (Closed)

People

Assignee: Unassigned
Reporter: Abdul Rahman Babil
Votes: 0
Watchers: 2

Dates

Created: 2019-05-15 05:25
Updated: 2020-03-27 22:47
Resolved: 2020-03-02 07:49
