[MXS-1720] Priori causal read Created: 2018-03-16  Updated: 2024-01-04  Resolved: 2020-02-28

Status: Closed
Project: MariaDB MaxScale
Component/s: readwritesplit
Affects Version/s: None
Fix Version/s: 2.5.0

Type: New Feature Priority: Major
Reporter: dapeng huang Assignee: markus makela
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Relates
relates to MXS-2443 ORM Connection Pooling Not Working Wi... Closed
relates to MXS-2489 ReadWriteSplit service redirect some ... Closed
relates to MXS-199 Support Causal Read in Read Write Spl... Closed
Epic Link: Router Improvements
Sprint: MXS-SPRINT-93, MXS-SPRINT-100

 Description   

Using MASTER_GTID_WAIT (MXS-199) to achieve causal reads does not work well when replication lag is not negligible (for example, > 100 ms). We tested it with MySQL 5.7 with the logical clock turned on and innodb_flush_log_at_trx_commit = 1000 set on the replica, at a read/write ratio of 5:1; the performance was no better than routing everything to a single node. MySQL 8.0's writeset-based replication or PolarDB's physical replication might improve this.

Because replication lag depends on many factors, we need another way to achieve causal reads.

Our proposal:
1. Find a way to get the latest GTID of every replica:
a. Fetch it regularly from the monitor.
b. Update it when an OK packet is received: if the reported GTID is newer than the recorded one, update the record using CAS.
2. Compare the backends' GTIDs on the proxy side.
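
The proposal could be sketched roughly as follows. This is only an illustration and not MaxScale code: the names `gtid_sequence`, `ReplicaGtid`, `advance` and `can_serve` are hypothetical. It assumes MariaDB-style GTIDs of the form domain-server_id-sequence and a single replication domain, so that comparing sequence numbers is enough to decide which position is newer.

```cpp
#include <atomic>
#include <cstdint>
#include <string>

// Hypothetical helper: extracts the sequence number from a GTID such as
// "0-1-100" (domain-server_id-sequence). Assumes a single domain.
inline uint64_t gtid_sequence(const std::string& gtid)
{
    return std::stoull(gtid.substr(gtid.rfind('-') + 1));
}

// Hypothetical per-replica record of the newest GTID sequence seen so far.
class ReplicaGtid
{
public:
    // Called with a position from the monitor or from an OK packet.
    // Returns true only if the stored position moved forward.
    bool advance(uint64_t seen)
    {
        uint64_t current = m_seq.load(std::memory_order_relaxed);
        // On failure compare_exchange_weak reloads `current`, so the loop
        // ends when another thread stored something >= seen or we win.
        while (current < seen)
        {
            if (m_seq.compare_exchange_weak(current, seen,
                                            std::memory_order_release,
                                            std::memory_order_relaxed))
            {
                return true;
            }
        }
        return false;
    }

    // Proxy-side comparison: has this replica reached `required`?
    bool can_serve(uint64_t required) const
    {
        return m_seq.load(std::memory_order_acquire) >= required;
    }

private:
    std::atomic<uint64_t> m_seq{0};
};
```

With such a record per replica, the router could send a read to any replica for which `can_serve()` is true for the session's last write position, and fall back to the master otherwise. The CAS loop means a stale position arriving late from the monitor cannot overwrite a newer one taken from an OK packet.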



 Comments   
Comment by markus makela [ 2019-07-04 ]

Is this intended to solve cases when replication lag is too large for efficient use of causal_reads? Could max_slave_replication_lag help in these cases?

Comment by markus makela [ 2020-02-20 ]

Here are a few ways how the GTID extraction part could be implemented:

  • Using monitors for GTID retrieval could work but it would be highly reliant on the monitoring interval.
  • The replication stream could be used to extract GTIDs, but the overhead of sending everything through MaxScale would far outweigh the benefit of faster GTID event delivery.
  • Installing some sort of an agent application on the database node would allow an efficient and fast method of delivering GTIDs but this is the most cumbersome solution to apply as separate binaries would have to be installed and managed on each node.
  • Writing a server-side plugin that exports GTID events to MaxScale would be a neater solution but it still faces some of the same problems that an agent application would.

Another way to do this would be to extend the causal_reads behavior with dedicated GTID tracking threads for each server. These threads would execute a MASTER_GTID_WAIT for each GTID on the master, effectively synchronizing the slave to a certain point. The GTID waiting could be done in batches to synchronize multiple transactions at the same time. This logical timestamp could then be used in place of the replication lag in readwritesplit.
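
A minimal sketch of the batching idea, under stated assumptions: the `PendingGtid` struct and `batch_wait_query` function are hypothetical names, and all GTIDs in a batch are assumed to belong to one replication domain whose transactions are applied in order, so that waiting for the highest sequence also covers the earlier ones. The only server-side call used is MariaDB's `MASTER_GTID_WAIT(gtid-list[, timeout])`.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical representation of a GTID observed on the master.
struct PendingGtid
{
    uint32_t domain;
    uint32_t server_id;
    uint64_t sequence;
};

// Builds a single MASTER_GTID_WAIT query for a batch of master GTIDs.
// Assumes every entry is in the same domain; returns an empty string
// when there is nothing to wait for.
std::string batch_wait_query(const std::vector<PendingGtid>& pending,
                             double timeout_seconds)
{
    if (pending.empty())
    {
        return "";
    }

    // The newest GTID in the batch is the only one that needs waiting on.
    auto newest = std::max_element(
        pending.begin(), pending.end(),
        [](const PendingGtid& a, const PendingGtid& b) {
            return a.sequence < b.sequence;
        });

    return "SELECT MASTER_GTID_WAIT('"
           + std::to_string(newest->domain) + "-"
           + std::to_string(newest->server_id) + "-"
           + std::to_string(newest->sequence) + "', "
           + std::to_string(timeout_seconds) + ")";
}
```

A tracking thread could run this query on its replica and, when the wait completes without timing out, record the batch's newest sequence as that replica's logical timestamp.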

Comment by markus makela [ 2020-02-26 ]

Added a preliminary implementation that uses monitors to get the GTID positions.
