[MDEV-30821] MariaDB replication works via MASTER_LOG_FILE, almost instantly fails using MASTER_USE_GTID Created: 2023-03-09  Updated: 2023-04-10  Resolved: 2023-04-10

Status: Closed
Project: MariaDB Server
Component/s: Replication
Affects Version/s: 10.4.28
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Artem S. Tashkinov Assignee: Angelique Sklavounos (Inactive)
Resolution: Cannot Reproduce Votes: 0
Labels: None
Environment:

Ubuntu 20.04 LTS, x86_64 host, 128GB of RAM, 1TB NVMe RAID1 SSD


Attachments: File mariadb.cnf     File my.cnf    

 Description   

We have a weird issue with MariaDB version 10.4.28+maria~ubu2004 (the official binary release for Ubuntu 20.04 LTS).

We dumped the master data using mariabackup (we have a single database), and restored it on a replica server exactly how it's described here.

Prior to restoring, the replica did not contain anything other than built-in MySQL/MariaDB databases and tables (information_schema, mysql and performance_schema) - all of this was populated automatically by the APT installation script.

The issue is we are unable to start replication using GTID but it works perfectly when using binlog, so:

{{
CHANGE MASTER TO
MASTER_HOST="dbserver",
MASTER_USER="user",
MASTER_PASSWORD="password",
MASTER_USE_GTID=slave_pos;
START SLAVE;
}}
Results in a failure almost right away, e.g.

Last_Errno: 1452 Last_Error: Error 'Cannot add or update a child row: a foreign key constraint fails

Or something more sinister like

Last_SQL_Errno: 1032 Last_SQL_Error: Could not execute Update_rows_v1 event on table database.tableX; Can't find record in 'tableX', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log mariadb-bin.000009, end_log_pos 748324751

Whenever we setup replication using binlog, e.g.

{{
CHANGE MASTER TO
MASTER_HOST="dbserver",
MASTER_USER="user",
MASTER_PASSWORD="password",
MASTER_LOG_FILE='mariadb-bin.000001',
MASTER_LOG_POS=12345;
START SLAVE;
}}
It's working perfectly. Any help trying to understand or fix GTID replication would be highly appreciated.



 Comments   
Comment by Angelique Sklavounos (Inactive) [ 2023-03-10 ]

Hi birdie,

Just to verify - before you sent CHANGE MASTER...MASTER_USE_GTID=slave_pos, did you send SET GLOBAL gtid_slave_pos=<slave_pos>, where <slave_pos> would correspond to the MASTER_LOG_POS=12345? This can be found in the xtrabackup_binlog_info file.

Comment by Artem S. Tashkinov [ 2023-03-11 ]

We are normally enabling replication using backups from the master server which contain this:

{{
SET GLOBAL gtid_slave_pos='0-105-11108664494';
CHANGE MASTER '' TO MASTER_USE_GTID=slave_pos;
}}

As far as I understand, when we do `mysql < backup.sql` the GTID cursor is set automatically to the right position.

(What happened to formatting on this Jira? I cannot enable any formatting options).

Comment by Angelique Sklavounos (Inactive) [ 2023-03-13 ]

Thanks, birdie. I'll try to reproduce this. Could you please attached the my.cnf files?

As for the Jira formatting, I think there was a recent upgrade, so that might be the cause. You can do:
{code}
CODE
CODE
{code}

to get

CODE
CODE

Comment by Artem S. Tashkinov [ 2023-03-13 ]

mariadb.cnf my.cnf

Comment by Angelique Sklavounos (Inactive) [ 2023-03-17 ]

Hi birdie,

Could you please:

  1. Upload the xtrabackup_info file?
  2. Grep the mysqlbinlog output of the backup binlog file for the GTID slave_pos to show the corresponding log position?

Thank you.

Comment by Artem S. Tashkinov [ 2023-04-07 ]

Let's close this bug report for a while because the TGID replication started to work a few days later.

We have no idea how and why it happened but now we cannot reproduce it.

Comment by Angelique Sklavounos (Inactive) [ 2023-04-10 ]

Thanks for the update, birdie. Please reopen if you see the issue again.

Generated at Thu Feb 08 10:19:09 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.