Details

    Description

      When using the semi-sync protocol, track each replica's progress. The SHOW REPLICA HOSTS output should be extended with two fields: one representing the GTID state last sent to the replica, and the other representing the GTID state last ACK'd from the replica.

      Additionally, treat rpl_semi_sync_master_timeout=0 as a special case where transactions do not await an ACK, but still report progress from the ack_thread on replica reply.
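The timeout=0 semantics above can be sketched as follows. This is a simplified illustration, not the server's actual implementation; the class and method names (`SemiSyncWaiter`, `commit_wait`, `on_ack`) are hypothetical, and only the variable name `rpl_semi_sync_master_timeout` comes from the ticket.

```python
import threading

class SemiSyncWaiter:
    """Hypothetical sketch of the commit-side semi-sync wait logic."""

    def __init__(self, timeout_ms):
        self.timeout_ms = timeout_ms      # models rpl_semi_sync_master_timeout
        self.cond = threading.Condition()
        self.acked = set()                # GTIDs ACK'd by at least one replica

    def commit_wait(self, gtid):
        """Called by a committing transaction after the binlog write."""
        if self.timeout_ms == 0:
            # Special case from the description: do not block on an ACK,
            # but the ack thread still records progress when a reply
            # eventually arrives (see on_ack below).
            return True
        with self.cond:
            return self.cond.wait_for(lambda: gtid in self.acked,
                                      timeout=self.timeout_ms / 1000.0)

    def on_ack(self, gtid):
        """Called by the ack thread when a replica reply arrives."""
        with self.cond:
            self.acked.add(gtid)
            self.cond.notify_all()
```

With timeout 0 the commit returns immediately; with a non-zero timeout it blocks until the ack thread reports the GTID or the timeout expires.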

When running SHOW REPLICA HOSTS, instead of the current 4 columns:

      SHOW REPLICA HOSTS;
      Server_id	Host	Port	Master_id

we want to have two additional columns:

      Server_id  Host       Port    Master_id   Gtid_State_Sent   Gtid_State_Ack
      3          127.0.0.1  16002   1           0-1-5,1-2-10      0-1-4,1-2-7
      4          127.0.0.1  16003   1           0-1-5,1-2-10      0-1-2,1-2-3
      
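A consumer of the new columns could compare the two GTID lists per replication domain to see how far each replica's ACKs trail what was sent. A minimal sketch, assuming the `domain-server_id-seq_no` GTID format shown in the sample output (the function names are made up for illustration, and the per-domain seq_no difference is only an approximate count of outstanding transactions):

```python
def parse_gtid_state(state):
    """Parse a GTID list like '0-1-5,1-2-10' into {domain: seq_no}."""
    result = {}
    for gtid in state.split(","):
        domain, _server_id, seq_no = gtid.split("-")
        result[int(domain)] = int(seq_no)
    return result

def ack_lag(sent, acked):
    """Per-domain gap between the GTID state sent and the state ACK'd."""
    sent_map = parse_gtid_state(sent)
    ack_map = parse_gtid_state(acked)
    return {d: sent_map[d] - ack_map.get(d, 0) for d in sent_map}
```

For the first sample row above, `ack_lag("0-1-5,1-2-10", "0-1-4,1-2-7")` returns `{0: 1, 1: 3}`.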

      Additional notes:
      1. Replicas with semi-sync disabled should have empty values for Gtid_State_Ack.
2. Updating a replica's Gtid_State_Ack should be done by the ack thread, after validating that the transaction was actually sent to that replica.
      3. The new columns should always be present, regardless of rpl_semi_sync_master_enabled, as the primary can disable semi-sync dynamically, and it could still be useful to display the information.
4. Per design, when rpl_semi_sync_master_timeout=0, rpl_semi_sync_master_status, if enabled earlier, should stay ON, instead of switching off to async and incrementing the corresponding counter.
5. Per design, in order to distinguish the replica type between async, stalled semi-sync, and active semi-sync replicas, the suggestion is to add a new column Replica_type whose value depends on the state of the replica.
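Note 2 above implies the ack thread must reject ACKs for transactions it never sent to that replica. A small sketch of such a per-replica tracker, under the assumption that sent/ACK'd state is kept as a per-domain seq_no map (the class and method names are hypothetical, not the server's):

```python
class ReplicaProgress:
    """Hypothetical per-replica tracker backing the two new columns."""

    def __init__(self):
        self.gtid_sent = {}   # domain -> highest seq_no sent to this replica
        self.gtid_ack = {}    # domain -> highest seq_no ACK'd by this replica

    def record_sent(self, domain, seq_no):
        # Dump-thread side: called when a transaction is sent to the replica.
        self.gtid_sent[domain] = max(self.gtid_sent.get(domain, 0), seq_no)

    def record_ack(self, domain, seq_no):
        # Ack-thread side: per note 2, only accept ACKs for transactions
        # that were actually sent to this replica.
        if seq_no <= self.gtid_sent.get(domain, 0):
            self.gtid_ack[domain] = max(self.gtid_ack.get(domain, 0), seq_no)
            return True
        return False
```

An ACK for a seq_no beyond what was sent is dropped, so Gtid_State_Ack can never run ahead of Gtid_State_Sent.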

          Activity

bnestere Brandon Nesterenko added a comment:

            Hi knielsen and Elkin!

            I've put some final touches on the previously started patch, and it is ready for one of your reviews: PR-3288 (note this is a new PR that I've opened).

bnestere Brandon Nesterenko added a comment:

            Hi Roel!

            This is ready to be tested in branch preview-11.6-MDEV-21322.
Roel Roel Van de Paar added a comment (edited):

            https://github.com/MariaDB/server/commit/11d7bd2ed3322279e14068bdf8a913626900f4ac
            https://github.com/MariaDB/server/pull/1427 > https://github.com/MariaDB/server/pull/2374 > https://github.com/MariaDB/server/pull/3288
            https://lists.mariadb.org/hyperkitty/list/developers@lists.mariadb.org/thread/62YVCBGSC23PDTPHSEBU4LH74LZVGJD7/

bnestere Brandon Nesterenko added a comment:

            Pulling from 11.6, as there is some unnecessary overhead in the implementation, and we can instead reconsider a better design (or decide if we even want this feature at all).
JackSlater Jack added a comment:

            Amazing. Five years later, the patchset has passed through a myriad of hands, and now we think that, maybe, the feature is not required.

            In the meantime, everybody with a master/slave setup must keep a large set of binlogs, most of which are useless, because there is no way to know when a binlog has been applied to the replica. So we must keep something like a day or two of binlogs and bet that the apply will be done by then (even if the slave went into maintenance somehow).
            If the bet is right, we only consume lots of storage for nothing.
            If the bet is wrong, replication fails, the slave is no longer usable, and we must recreate it from scratch (which is a pain).

            Postgres has its own issues, but damn, replication there is so much easier and cheaper.

People

              bnestere Brandon Nesterenko
              anel Anel Husakovic
              Votes: 1
              Watchers: 15

