[MDEV-32828] Partial GTID Replication Restoration Created: 2023-11-17  Updated: 2023-11-17

Status: Open
Project: MariaDB Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major
Reporter: ASHWANI SINGH Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: None
Environment:

Linux



 Description   

Hi Team,
We have setup the GTID based Replication between 4 DB Servers and then after some time we observed that my replication breaked connections breaked are as follow:

  1. DB 1 --> DB 3
  2. DB 2 --> DB 3
  3. DB 1 --> DB 4
  4. DB 2 --> DB 4
    and vice versa.

We can see the SQL error as mentioned below:

  1. Last_SQL_Errno: 1032
    Last_SQL_Error: Could not execute Update_rows_v1 event on table aaadb.activesessions; Can't find record in 'activesessions', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log db2-binary.005151, end_log_pos 4978
  1. Last_SQL_Errno: 1032
    Last_SQL_Error: Could not execute Update_rows_v1 event on table aaadb.activesessions; Can't find record in 'activesessions', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log db1-binary.004968, end_log_pos 141985000

Please provide the resolution or steps to restore these nodes only instead of setting up GTID Replication from scratch.



 Comments   
Comment by Kristian Nielsen [ 2023-11-17 ]

This is a multi-master setup. For this to work, it is required to:
1. Configure each distinct master with a different domain_id. See https://mariadb.com/kb/en/gtid/#use-with-multi-source-replication-and-other-multi-primary-setups
2. Your use of the term "vice versa" suggests that maybe you have multiple replication paths to a slave. In this case you need to enable --gtid-ignore-diplicates to avoid duplicate apply of events.

The errors mean that a row was updated on the master but the row did not exist on the slave when replication tried to apply the update. This can happen for a number of different reasons, including incorrect configuration of multi-source replication; conflicting updates done on DB1 and DB2; etc. Asynchronous replication with multi-master is a complex setup to manage and requires a good understanding of how replication and GTID works. It's unfortunately not really possible to provide a simple step-by-step how to solve such problems in the general case.

Hope this helps,

- Kristian.

Generated at Thu Feb 08 10:34:20 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.