[MDEV-33424] when both rpl_semi_sync_MASTER,SLAVE_enabled set the server should recover as master

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Won't Fix
Affects Version/s: 10.6, 10.11, 11.0(EOL), 11.1(EOL), 11.2(EOL), 11.3(EOL)
Fix Version/s: N/A
Component/s: Replication
Labels: None

Description

When both rpl_semi_sync_master_enabled and rpl_semi_sync_slave_enabled are set ON at recovery,
the server recovers as a semisync slave and conducts binlog truncation according to ~~MDEV-21117~~.

It is unclear which of the two roles the user intends with these settings. Considering that switchover must be in lesser demand than a "normal" master crash recovery, the presumed intent is better taken to be MASTER.

That is, when both variables are ON, a server restarting after a crash would execute the normal recovery.

Issue Links

relates to

MDEV-21117 refine the server binlog-based recovery for semisync

Closed

Activity

Andrei Elkin added a comment - 2024-02-08 15:07 - edited

As background, the semisync slave recovery mode of ~~MDEV-21117~~ was introduced to support failover (to a new master) such that the old master is repurposed as a slave of the new one. The history of this method and its practical side are discussed in Jean-François Gagné's blog. The old master is required to have rpl_semi_sync_slave_enabled = ON.
In essence, the old master rolls back transactions in doubt, which ensures it can't be ahead (e.g. in GTID terms) of the new master.
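
A toy model may make the difference between the two recovery modes concrete. It is only a sketch: the transaction tuples, state names and function names are hypothetical and do not correspond to server internals. It assumes that normal recovery keeps (commits) prepared transactions found in the binlog, while semisync slave recovery truncates the binlog at the first transaction in doubt.

```python
from typing import List, Tuple

# Hypothetical model: (GTID sequence number, engine state at crash time).
# "prepared" marks a transaction in doubt.
Txn = Tuple[int, str]


def normal_recovery(binlog: List[Txn]) -> List[int]:
    """Keep every binlogged transaction; prepared ones are committed."""
    return [seq for seq, _state in binlog]


def semisync_slave_recovery(binlog: List[Txn]) -> List[int]:
    """Truncate the binlog at the first transaction in doubt and roll it back."""
    kept = []
    for seq, state in binlog:
        if state == "prepared":
            break  # everything from here on is dropped from the binlog
        kept.append(seq)
    return kept


# The old master crashed with transaction 3 in doubt; assume the new
# master only ever received transactions 1 and 2.
old_master_binlog = [(1, "committed"), (2, "committed"), (3, "prepared")]
new_master_gtids = [1, 2]

after_normal = normal_recovery(old_master_binlog)
after_semisync = semisync_slave_recovery(old_master_binlog)
```

In this model `after_semisync` equals `new_master_gtids`, so the old master can rejoin as a slave, whereas `after_normal` contains transaction 3, which the new master never saw.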

Currently, a possible rpl_semi_sync_MASTER_enabled = ON (which conflicts with the slave setting in a "normal" two-server master-slave setup) is ignored
when the recovery mode is computed.
This ticket proposes that the server avoid taking any dubious decision: only when the server is configured unambiguously as a semisync slave
will it pass through the semisync recovery.
Otherwise recovery is normal.
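
The current and proposed rules can be compared in a minimal sketch. The function and enum names are hypothetical and not taken from the server source; only the two variable names and the described behaviour come from this ticket.

```python
from enum import Enum


class RecoveryMode(Enum):
    NORMAL = "normal"
    SEMISYNC_SLAVE = "semisync_slave"


def current_mode(master_enabled: bool, slave_enabled: bool) -> RecoveryMode:
    """Behaviour described in the ticket: rpl_semi_sync_master_enabled is ignored."""
    return RecoveryMode.SEMISYNC_SLAVE if slave_enabled else RecoveryMode.NORMAL


def proposed_mode(master_enabled: bool, slave_enabled: bool) -> RecoveryMode:
    """Proposal: semisync recovery only for an unambiguous semisync slave."""
    if slave_enabled and not master_enabled:
        return RecoveryMode.SEMISYNC_SLAVE
    return RecoveryMode.NORMAL


def differing_settings():
    """List the (master_enabled, slave_enabled) pairs where the two rules disagree."""
    return [
        (m, s)
        for m in (False, True)
        for s in (False, True)
        if current_mode(m, s) is not proposed_mode(m, s)
    ]
```

Under these assumptions the two rules disagree only when both variables are ON, which is exactly the case this ticket is about.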

Michael Widenius added a comment - 2024-02-11 12:52

This solution is a no-go for the following reasons:

This will be very hard to document and understand.
There is no relationship between the two variables and there should not be. One is used when the machine is a master, the other when the machine is a slave. In many master-slave environments a machine can be a master, a slave or both. It should be safe to have both ALWAYS on. This should even be the default for anyone wanting to have semi-sync always on for all machines that are in a replication setup.
How recovery is done should not depend on these variables, but on some other recovery related variable that should be easy to document and understand.
The real problem we are having in ~~MDEV-21117~~ is related to how a slave continues when a transaction GTID it has seen part of does not exist anymore. We have to fix this case anyway!

People

Assignee: Brandon Nesterenko

Reporter: Andrei Elkin

Votes: 2

Watchers: 9

Dates

Created: 2024-02-07 17:15

Updated: 2024-11-15 16:03

Resolved: 2024-02-11 12:52

MariaDB Server