[MDEV-15393] gtid_slave_pos duplicate key errors after restore Created: 2018-02-23  Updated: 2019-04-09

Status: Open
Project: MariaDB Server
Component/s: Replication
Affects Version/s: 10.2.12
Fix Version/s: 10.2

Type: Bug Priority: Major
Reporter: Hartmut Holzgraefe Assignee: Andrei Elkin
Resolution: Unresolved Votes: 3
Labels: backup, gtid, slave

Issue Links:
Relates
relates to MDEV-18828 Error during XID COMMIT: failed to up... Closed

 Description   

When restoring a master backup where, for whatever reason, gtid_slave_pos contains entries (master<->master setup, intermediate slave, former slave promoted to master), and using classic master_log_file/pos to set up replication, a newly set up slave will run for a while, counting up its on "sub_id" from one, until reaching the masters "sub_id" values still present in the restored gtid_slave_pos. At that point replication will stop with a duplicate key error.

How to reproduce:

  • set up a master, insert a "fake" entry into gtid_slave_pos:

  INSERT INTO mysql.gtid_slave_pos VALUES(0,5,1,23);

  • take a master backup

  mysqldump --all-databases --single-transaction --master-data > dump.sql

  • set up slave, set up replication basics:

  CHANGE MASTER TO MASTER_HOST='...'
                , MASTER_USER='...'
                , MASTER_PASSWORD='...';

  • restore dump on slave
  • start slave, check slave status to see that things are running, check mysql.gtid_slave_pos contents. Note the first line inserted by the slave, and the 2nd one originating from the master still being there:

+-----------+--------+-----------+--------+
| domain_id | sub_id | server_id | seq_no |
+-----------+--------+-----------+--------+
|         0 |      1 |        2 |      5 |
|         0 |      5 |         1 |     23 |
+-----------+--------+-----------+--------+

  • perform some DDL or DML statements on the master
  • check slave status:

                   Last_Error: Error during XID COMMIT: failed to update GTID state in mysql.gtid_slave_pos: 1062: Duplicate entry '0-5' for key 'PRIMARY'

The gtid_slave_pos table is truncated when setting the gtid_slave_pos variable with SET GLOBAL, but is left as is on

CHANGE MASTER TO MASTER_LOG_FILE='...', MASTER_LOG_POS=...;

Proposed fixes:

TRUNCATE the table on explicit CHANGE MASTER TO MASTER_LOG_FILE, and move the CHANGE MASTER TO command from beginning to end of mysqldump output



 Comments   
Comment by Sandy [ 2019-02-22 ]

same error, but in a different situation.
we use version 10.1.22, when this error happened, we run stop slave and start slave after a while. the error will be healed. But we don't know how to reproduce this.

This is the error log .
2019-02-21 12:50:18 139769347844864 [ERROR] Slave SQL: Error during XID COMMIT: failed to update GTID state in mysql.gtid_slave_pos: 1062: Duplicate entry '0-194031977'
for key 'PRIMARY', Gtid 0-16253440-388064147, Internal MariaDB error code: 1942
2019-02-21 12:50:18 139769347844864 [ERROR] Slave (additional info): Duplicate entry '0-194031977' for key 'PRIMARY' Error_code: 1062
2019-02-21 12:50:18 139769347844864 [Warning] Slave: Duplicate entry '0-194031977' for key 'PRIMARY' Error_code: 1062
2019-02-21 12:50:18 139769347844864 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stop
ped at log 'mysql-bin.000085' position 202964878
2019-02-21 12:50:18 139769349663488 [Note] Error reading relay log event: slave SQL thread was killed
2019-02-21 12:50:18 139769349663488 [Note] Slave SQL thread exiting, replication stopped in log 'mysql-bin.000085' at position 202964878
2019-02-22 18:02:37 139769349360384 [Note] Slave I/O thread exiting, read up to log 'mysql-bin.000085', position 461759256
2019-02-22 18:04:13 139769349663488 [Note] Slave SQL thread initialized, starting replication in log 'mysql-bin.000085' at position 202964878, relay log '/mnt/storage00
/mysql/3440/bin_log/mysql-relay-log.000106' position: 202965166
2019-02-22 18:04:13 139769347844864 [ERROR] Slave SQL: Error during XID COMMIT: failed to update GTID state in mysql.gtid_slave_pos: 1062: Duplicate entry '0-194031978'
for key 'PRIMARY', Gtid 0-16253440-388064147, Internal MariaDB error code: 1942
2019-02-22 18:04:13 139769347844864 [ERROR] Slave (additional info): Duplicate entry '0-194031978' for key 'PRIMARY' Error_code: 1062
2019-02-22 18:04:13 139769347844864 [Warning] Slave: Duplicate entry '0-194031978' for key 'PRIMARY' Error_code: 1062
2019-02-22 18:04:13 139769347844864 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stop
ped at log 'mysql-bin.000085' position 202964878
2019-02-22 18:04:13 139769348754176 [ERROR] Slave (additional info): Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
2019-02-22 18:04:13 139769348754176 [Warning] Slave: Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
2019-02-22 18:04:13 139769348754176 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stop
ped at log 'mysql-bin.000085' position 202964878
2019-02-22 18:04:13 139769347238656 [ERROR] Slave (additional info): Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
2019-02-22 18:04:13 139769347238656 [Warning] Slave: Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
2019-02-22 18:04:13 139769347238656 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stop
ped at log 'mysql-bin.000085' position 202964878
2019-02-22 18:04:13 139769349663488 [Note] Slave SQL thread exiting, replication stopped in log 'mysql-bin.000085' at position 202964878
2019-02-22 18:04:13 139769349360384 [Note] Slave I/O thread: connected to master 'repl@172.22.16.39:3440',replication started in log 'mysql-bin.000085' at position 4617
59256
2019-02-22 18:04:19 139769349360384 [Note] Slave I/O thread exiting, read up to log 'mysql-bin.000085', position 462118255
2019-02-22 18:04:26 139769347844864 [Note] Slave SQL thread initialized, starting replication in log 'mysql-bin.000085' at position 202964878, relay log '/3440/bin_log/mysql-relay-log.000106' position: 202965166
2019-02-22 18:04:26 139769349360384 [Note] Slave I/O thread: connected to master 'repl@IP:3440',replication started in log 'mysql-bin.000085' at position 4621
18255

Comment by Andrei Elkin [ 2019-04-08 ]

Sandy It's difficult to say what happened to your slave. Make sure to upload its configuration, global var:s and status reports as well. Also to your report we've analyzed MDEV-19157 which has some similarity with your error log snippet.

Comment by Sandy [ 2019-04-09 ]

Andrei ,Thank you.
MDEV-19157 was committed by my colleague ^ _ ^

Generated at Thu Feb 08 08:20:57 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.