[MDEV-8112] PATCH: no slave left behind Created: 2015-05-07  Updated: 2023-12-15

Status: In Review
Project: MariaDB Server
Component/s: None
Fix Version/s: 11.5

Type: Task Priority: Major
Reporter: Jonas Oreland Assignee: Brandon Nesterenko
Resolution: Unresolved Votes: 7
Labels: None

Attachments: File nslb.patch    
Issue Links:
PartOf
is part of MXS-1155 Define a set of features for better i... Closed

 Description   

The patch has been ported to 10.1.
You can use it under the new (also called "3-clause") BSD license.
I have done internal benchmarks, but you are most welcome to run some too.

DESCRIPTION:
no slave left behind

This patch implements master throttling based on slave lag,
aka "no slave left behind". The core feature works as follows:
1) The semi-sync reply is amended to also report back the SQL thread
position (aka exec position).
2) Transactions are not removed from the "active-transaction-list"
in the semi-sync master plugin until at least one slave has reported
that it has executed the transaction. The slave lag can then
be estimated by calculating how long the oldest transaction has been
lingering in the active-transaction-list.
3) Client threads are forced to wait before commit until the slave lag
has decreased to an acceptable value.
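The three steps above can be modelled in a few lines. This is a minimal Python sketch, not the patch's C++ code: the class and method names (`SlaveLagThrottle`, `on_commit`, `on_semi_sync_reply`, `wait_before_commit`) are hypothetical, binlog positions are assumed to be `(file_name, offset)` tuples that sort in binlog order, and `max_slave_lag` and `timeout` only stand in for `rpl_semi_sync_master_max_slave_lag` and `rpl_semi_sync_master_slave_lag_wait_timeout`, whose units are assumed to be seconds here.

```python
import threading
import time
from collections import deque


class SlaveLagThrottle:
    """Hypothetical model of "no slave left behind"; not the patch's code."""

    def __init__(self, max_slave_lag, clock=time.monotonic):
        # Stands in for rpl_semi_sync_master_max_slave_lag (seconds assumed).
        self.max_slave_lag = max_slave_lag
        self._clock = clock
        # Active transaction list: (binlog_pos, commit_time), oldest first.
        # binlog_pos is assumed to be a (file_name, offset) tuple.
        self._active = deque()
        self._cond = threading.Condition()

    def on_commit(self, binlog_pos):
        with self._cond:
            self._active.append((binlog_pos, self._clock()))

    def on_semi_sync_reply(self, exec_pos):
        # Steps 1 and 2: the reply carries the slave SQL thread's exec
        # position; everything at or before it has been executed by at
        # least one slave, so it can leave the active list.
        with self._cond:
            while self._active and self._active[0][0] <= exec_pos:
                self._active.popleft()
            self._cond.notify_all()

    def _lag_locked(self):
        # Estimated lag = age of the oldest not-yet-executed transaction.
        if not self._active:
            return 0.0
        return self._clock() - self._active[0][1]

    def estimated_lag(self):
        with self._cond:
            return self._lag_locked()

    def wait_before_commit(self, timeout):
        # Step 3: block the client thread until the lag is acceptable.
        # Returns False if `timeout` seconds pass first (stands in for
        # rpl_semi_sync_master_slave_lag_wait_timeout).
        with self._cond:
            return self._cond.wait_for(
                lambda: self._lag_locked() <= self.max_slave_lag, timeout)
```

In this model the estimate keeps growing while the slave makes no progress, so only a semi-sync reply that advances the exec position can release waiting client threads.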

The following variables are introduced on the master:

  • rpl_semi_sync_master_max_slave_lag (global)
  • rpl_semi_sync_master_slave_lag_wait_timeout (session)

The following status variables are introduced on the master:

  • rpl_semi_sync_master_slave_lag_wait_sessions
  • rpl_semi_sync_master_estimated_slave_lag
  • rpl_semi_sync_master_trx_slave_lag_wait_time
  • rpl_semi_sync_master_trx_slave_lag_wait_num
  • rpl_semi_sync_master_avg_trx_slave_lag_wait_time

The following variables are introduced on the slave:

  • rpl_semi_sync_slave_lag_enabled (global)

In addition to this, two optimizations that decrease the overhead of
semi-sync are introduced:
1) When a transaction is about to be sent to a slave, the master checks
whether it should be semi-synced. Rather than semi-syncing each transaction
(which is what is done currently), the code skips semi-syncing a transaction
if newer transactions have already been committed. But since this could
delay semi-syncing indefinitely, a cap is set using 2 new master variables:

  • rpl_semi_sync_master_max_unacked_event_bytes (global)
  • rpl_semi_sync_master_max_unacked_event_count (global)

2) rpl_semi_sync_master_group_commit, which makes the semi-sync
plugin semi-sync only the last transaction in a group commit.
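The two optimizations above can be sketched as follows. This is a minimal Python model under stated assumptions, not the patch's code: the names `AckRequestPolicy`, `need_ack` and `group_commit_ack_target` are hypothetical, the caps are interpreted as "request an ack once this many unacked bytes or transactions have accumulated" (the patch may apply them differently), and an ack for a later position is assumed to also cover every earlier transaction, which is what makes skipping safe.

```python
from dataclasses import dataclass


@dataclass
class AckRequestPolicy:
    """Hypothetical model of optimization 1; not the patch's code."""
    max_unacked_event_bytes: int  # stands in for rpl_semi_sync_master_max_unacked_event_bytes
    max_unacked_event_count: int  # stands in for rpl_semi_sync_master_max_unacked_event_count
    unacked_bytes: int = 0
    unacked_count: int = 0

    def need_ack(self, trx_bytes, newer_trx_committed):
        """Decide whether the transaction being sent should request a semi-sync ack."""
        self.unacked_bytes += trx_bytes
        self.unacked_count += 1
        over_cap = (self.unacked_bytes >= self.max_unacked_event_bytes
                    or self.unacked_count >= self.max_unacked_event_count)
        if newer_trx_committed and not over_cap:
            # A later ack is assumed to cover this transaction, so skip it for now.
            return False
        # Request an ack here; it covers everything skipped so far.
        self.unacked_bytes = 0
        self.unacked_count = 0
        return True


def group_commit_ack_target(group_positions):
    """Optimization 2: semi-sync only the last transaction of a group commit."""
    return group_positions[-1] if group_positions else None
```

With `max_unacked_event_count=3` and a steady stream of newer commits, this policy requests an ack on every third transaction instead of on every one, while the byte cap bounds how much unacknowledged data can pile up between acks.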


 Comments   
Comment by Jonas Oreland [ 2015-05-08 ]

Note that I have not tested or considered the parallel slave applier.
But if the get_master_log_pos function that I wrote works with the parallel slave applier,
it should work. Feedback welcome.

Comment by Kristian Nielsen [ 2015-05-18 ]

> a comment is that I have not tested/considered parallel slave applier. but
> if the get_master_log_pos-function that I wrote works with parallel slave
> applier, it should work.

I think it should work. The rli->group_master_log_name and
rli->group_master_log_pos fields are also updated in parallel replication.

The update happens out-of-order though. Especially when using multiple
replication domains and GTID, one domain can be quite a bit ahead of
another. So the "no slave left behind" will use the position of the
most-ahead worker thread to tell how far the slave has progressed, not the
position of the most-behind worker. That seems fine, I think.
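To make the trade-off concrete, here is a small hypothetical Python illustration (not MariaDB code) of how far the lag estimate can differ depending on which worker's position is reported. Positions are assumed to be `(binlog_file, offset)` tuples, and `estimated_lag` and `lag_from_workers` are stand-ins invented for this sketch, not functions from the patch.

```python
def estimated_lag(active_trx, exec_pos, now):
    """Age of the oldest transaction not covered by exec_pos.

    active_trx: list of (binlog_pos, commit_time) in binlog order.
    """
    pending = [commit_time for pos, commit_time in active_trx if pos > exec_pos]
    return now - pending[0] if pending else 0.0


def lag_from_workers(active_trx, worker_positions, now):
    # As the comment describes: the most-ahead worker's position is used.
    optimistic = estimated_lag(active_trx, max(worker_positions), now)
    # A stricter alternative would use the most-behind worker instead.
    conservative = estimated_lag(active_trx, min(worker_positions), now)
    return optimistic, conservative
```

With transactions committed at times 0.0, 2.0 and 4.0 (offsets 100, 200, 300), one worker at offset 300 and another still at 100, the estimate at time 10.0 is 0.0 using the most-ahead worker but 8.0 using the most-behind one. The throttle therefore reacts less strictly with parallel replication, which, as the comment says, seems acceptable.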

Comment by Kristian Nielsen [ 2015-05-18 ]

Review sent on maria-developers@:

https://lists.launchpad.net/maria-developers/msg08575.html

Comment by Andrei Elkin [ 2019-12-05 ]

julien.fritsch The status is correct. The best time to review it would be when we are back working on semisync.

Generated at Thu Feb 08 07:24:43 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.