[MDEV-28609] refine gtid-strict-mode to ignore same server-id gtid from the past on semisync slave Created: 2022-05-18  Updated: 2023-11-27  Resolved: 2022-07-26

Status: Closed
Project: MariaDB Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: 10.6.9, 10.7.5, 10.8.4, 10.9.2

Type: Bug Priority: Blocker
Reporter: Andrei Elkin Assignee: Andrei Elkin
Resolution: Fixed Votes: 2
Labels: gtid-strict-mode, semisync

Issue Links:
Blocks
Relates
relates to MDEV-28292 Allow both --replicate-same-server-id... Closed

 Description   

To provide semisync master crash recovery (MDEV-21117), transactions carrying the
slave's own server-id were made acceptable for execution on the semisync slave when
gtid strict mode is ON (see MDEV-27760).
That, however, caused an out-of-order error on a server of a circular setup when
its own master transaction came back to it.
The error was fair in terms of the gtid strict mode rule, since in a circular
setup the replicated transaction does already exist in the local binlog.

The fix makes a semisync slave in gtid strict mode ignore those gtids that
already exist in the slave's binlog, which effectively restores the default
same-server-id ignore policy on the semisync slave.
At the same time, the fix complies with MDEV-21117 semisync slave recovery by
still accepting same server-id transactions that do not exist in the local binlog.
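The accept/ignore rule above can be pictured with a small toy model. This is a minimal Python sketch, not the server's code: it assumes gtid_strict_mode=ON on a semisync slave, models the binlog as a set of GTIDs plus the highest binlogged seq_no per domain (a simplification of the real strict-mode check), and the names `SemisyncSlave`, `handle`, `write_local` and `OutOfOrderError` are all hypothetical. Setting `fixed=False` imitates the pre-fix behavior, where a round-tripped own GTID falls through to the strict-mode order check.

```python
from dataclasses import dataclass, field


class OutOfOrderError(Exception):
    """Stands in for the gtid strict mode out-of-order (OOO) error."""


@dataclass(frozen=True)
class Gtid:
    domain_id: int
    server_id: int
    seq_no: int


@dataclass
class SemisyncSlave:
    """Toy semisync slave with gtid_strict_mode=ON (hypothetical model)."""
    server_id: int
    fixed: bool = True  # False imitates the pre-fix behavior
    binlog: set = field(default_factory=set)            # GTIDs in the local binlog
    last_seq: dict = field(default_factory=dict)        # domain_id -> highest binlogged seq_no
    gtid_slave_pos: dict = field(default_factory=dict)  # domain_id -> last handled Gtid

    def write_local(self, gtid):
        """Binlog a transaction, e.g. one committed while this server was master."""
        self.binlog.add(gtid)
        self.last_seq[gtid.domain_id] = max(self.last_seq.get(gtid.domain_id, 0), gtid.seq_no)

    def handle(self, gtid):
        """Process a replicated GTID; returns 'ignored' or 'applied'."""
        if self.fixed and gtid.server_id == self.server_id and gtid in self.binlog:
            # Own GTID that came back around the circle: skip it, but still
            # advance gtid_slave_pos.
            self.gtid_slave_pos[gtid.domain_id] = gtid
            return "ignored"
        last = self.last_seq.get(gtid.domain_id, 0)
        if gtid.seq_no <= last:
            # Strict mode: a new GTID must be beyond what the binlog already has.
            raise OutOfOrderError(f"{gtid} is not after seq_no {last} in domain {gtid.domain_id}")
        # Reaches here also for own-server-id GTIDs missing from the local binlog
        # (the MDEV-21117 recovery case), which are applied.
        self.write_local(gtid)
        self.gtid_slave_pos[gtid.domain_id] = gtid
        return "applied"
```

For example, a server with server_id 1 that committed 0-1-1 as master and later receives 0-1-1 back through the circle ignores it while advancing gtid_slave_pos, whereas 0-1-3, absent from its binlog, is applied. With `fixed=False` the same round-tripped 0-1-1 raises the OOO error.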



 Comments   
Comment by Faisal Saeed (Inactive) [ 2022-05-20 ]

Elkin, I have updated this with the respective CS number.

Comment by Andrei Elkin [ 2022-06-30 ]

Howdy, Brandon!

Could you please assess the solution you can find in bb-10.6-andrei?
Initially I thought to extend gtid-ignore-duplicates, but Sergei's nicer idea
to extend the gtid strict mode has prevailed.

Cheers,

Andrei

Comment by Brandon Nesterenko [ 2022-06-30 ]

Hi Andrei!

The patch looks good, and I agree that extending the strict_mode
generally is more intuitive. I left a few notes on 0b2eeb3
for your consideration.

Comment by Andrei Elkin [ 2022-07-05 ]

https://github.com/MariaDB/server/pull/2181

Comment by Andrei Elkin [ 2022-07-06 ]

The test plan for MDEV-28609 refine gtid-strict-mode to ignore same
server-id gtid from the past on semisync slave
-------------------------------------------------------------------

PREAMBLE

MDEV-21117/MDEV-27760 relaxed the acceptance of own server-id gtids (by default
they are not accepted) when the gtid strict mode is ON and the slave is semisync.
This led to MDEV-28609, which reports that the circular semisync setup
ceased to work, as own gtids making the round trip hit the gtid
strict mode Out-Of-(binlog)-Order (OOO) error.

The MDEV-28609 fix further relaxes the gtid strict mode for
own gtids: those that are already in the semisync slave's binlog
no longer raise the OOO error. It is safe to simply ignore such
gtids.
Ignoring such a gtid still updates the slave gtid state (gtid_slave_pos).

SUGGESTED TESTING

Take rpl.rpl_circular_semi_sync as a template.
Set up two servers in a circular semisync, strict gtid mode configuration
and execute the following three types of load (L1, L2, L3):

L1. /* practical */ each server originates gtids in its own
domain-id(s). Consider a few (up to 10) domains per server.

Transactions of different domain-ids must not clash on their
data either.

L2. /* less practical */ fake the server_id, as the template does in
section B.

L3. /* "not so practical" */ one common domain, with the servers generating
transactions in coordination, as in the template test:

```
--connection server_1
INSERT INTO t1(a) VALUES (1);
--source include/save_master_gtid.inc

--connection server_2
#--sync_with_master
--source include/sync_with_master_gtid.inc
INSERT INTO t1(a) VALUES (2);
```
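Load L1 asks each server to write in its own set of domains without data clashes. One way to script it is the Python sketch below, under stated assumptions: domain ids are handed out in disjoint blocks per server, and every domain writes only to its own table `t_d<domain>` (a hypothetical naming scheme, not taken from the template test). Only the statement text is generated; sending it to the servers is left to whatever client the test harness uses. `SET SESSION gtid_domain_id` is the MariaDB session variable that selects the domain of subsequent transactions.

```python
import random


def assign_domains(server_ids, per_server):
    """Give each server its own disjoint block of gtid_domain_id values.

    Hypothetical scheme: server i gets ids i*per_server .. i*per_server + per_server - 1.
    """
    if not 1 <= per_server <= 10:
        raise ValueError("the plan suggests a few (up to 10) domains per server")
    return {sid: [i * per_server + d for d in range(per_server)]
            for i, sid in enumerate(server_ids)}


def schema_statements(domains):
    """One table per domain, so transactions of different domains never touch the same data."""
    return [f"CREATE TABLE t_d{dom} (a INT);" for dom in domains]


def load_statements(domains, n_txns, rng):
    """Yield SQL for one server: each transaction picks one of the server's own
    domains and writes only to that domain's table."""
    for _ in range(n_txns):
        dom = rng.choice(domains)
        yield f"SET SESSION gtid_domain_id = {dom};"
        yield f"INSERT INTO t_d{dom} (a) VALUES ({rng.randint(1, 10**6)});"


if __name__ == "__main__":
    plan = assign_domains([1, 2], per_server=3)
    rng = random.Random(42)  # fixed seed so a failing run can be replayed
    for sid, doms in plan.items():
        print(f"-- server_{sid}, domains {doms}")
        for stmt in schema_statements(doms) + list(load_statements(doms, 4, rng)):
            print(stmt)
```

Seeding the random generator keeps a load reproducible, which helps when a perturbed run has to be repeated.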

Having the servers under load, perform the following

PERTURBATIONS

P1. randomly stop the slave services and restart them after a random time interval, while keeping up
the servers' client load
P2. randomly shutdown-restart / crash-and-semisync-slave-recover a server and restart its slave service

to observe eventually consistent server states (gtid slave state, binlog and data).
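The final consistency check can be partly automated. The Python sketch below assumes that, after the load stops and both slaves catch up, each server's `@@gtid_binlog_pos` and `@@gtid_slave_pos` are read along with per-table checksums (for instance from `CHECKSUM TABLE`), and that these values are expected to agree between the two servers; that expectation is an assumption of the sketch rather than a statement from the test plan. Positions are compared as per-domain maps, so the order of entries in the comma-separated string does not matter. The function names are hypothetical.

```python
def parse_gtid_pos(pos):
    """Parse a MariaDB GTID position such as '0-1-100,1-2-7'
    (domain_id-server_id-seq_no entries) into {domain_id: (server_id, seq_no)}."""
    result = {}
    for item in (p.strip() for p in pos.split(",")):
        if not item:
            continue  # empty position or stray comma
        domain, server, seq = (int(x) for x in item.split("-"))
        if domain in result:
            raise ValueError(f"domain {domain} listed twice in {pos!r}")
        result[domain] = (server, seq)
    return result


def find_divergence(state_a, state_b):
    """Compare two servers' snapshots taken after the load has stopped and the
    slaves have caught up. Each snapshot is a dict with 'gtid_binlog_pos',
    'gtid_slave_pos' (strings) and 'checksums' ({table: checksum}).
    Returns a list of human-readable mismatches; empty means consistent."""
    problems = []
    for var in ("gtid_binlog_pos", "gtid_slave_pos"):
        if parse_gtid_pos(state_a[var]) != parse_gtid_pos(state_b[var]):
            problems.append(f"{var}: {state_a[var]!r} != {state_b[var]!r}")
    if state_a["checksums"] != state_b["checksums"]:
        problems.append("table checksums differ")
    return problems
```

Running it on snapshots taken after each P1/P2 round turns the eventual-consistency criterion into a pass/fail check.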

Comment by Andrei Elkin [ 2022-07-18 ]

Faisal, slave salve! Let me paste the fixes' summary from the commit:

[the issue] is fixed by the commit to ignore, on the gtid strict mode semisync
    slave, those gtids that exist in the slave's binlog, effectively restoring
    the default same-server-id ignore policy. At the same time the fixes
    continue to provide MDEV-21117/MDEV-27760 semisync slave recovery (that is,
    to accept the same server-id transactions that do not exist in the local binlog).

Informally, the user only needs `gtid_strict_mode = ON`, at least on the slave, and the semisync slave now knows which of its own server-id transactions it needs to ignore and which to accept (the latter for the MDEV-21117 recovery), without any subsequent OOO error.

Comment by Angelique Sklavounos (Inactive) [ 2022-07-22 ]

Testing of L1, L2, L3 with P1, P2 looked good. Okay to push.

Generated at Thu Feb 08 10:02:05 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.