[MDEV-28134] Error_code: handler error HA_ERR_FOUND_DUPP_KEY with Multi DC Galera Cluster | GTID Replication Created: 2022-03-21  Updated: 2022-05-16  Resolved: 2022-05-16

Status: Closed
Project: MariaDB Server
Component/s: Galera, Replication
Affects Version/s: 10.4.23, 10.2
Fix Version/s: N/A

Type: Bug Priority: Critical
Reporter: Pramod Mahto Assignee: Pramod Mahto
Resolution: Incomplete Votes: 0
Labels: None

Attachments: Text File Under_load_GTID_Replication_Fail_With_Duplicate_Entry .txt     File node_a.cnf     File node_b.cnf     File node_c.cnf     File node_d.cnf     File node_e.cnf     File node_f.cnf     File node_g.cnf    
Issue Links:
Blocks
blocks MDEV-28015 Mariabackup | GTID value is missing, ... Closed

 Description   

 
DC1				DC2		DC3
				
A-B-C----GTID Replication---->D-E-F----GTID Replication---->G
 
[DC1] A,B,C   3 Node Galera cluster
[DC2] D,E,F   3 Node Galera Cluster
[DC3] G        1 Node Galera Cluster
 
DC1--DC2  C-->D GTID Replication 
DC2--DC3  F-->G GTID Replication 

Followed the basic rule for Enabling WSREP GTID Mode https://mariadb.com/kb/en/using-mariadb-gtids-with-mariadb-galera-cluster/#enabling-wsrep-gtid-mode.

Issue starts from here taking mysqldump backup on Node F while Node A under write load and restoring the same on Node G , lead to duplicate entry.

Node F

[root@vmc_f tmp]#  mysqldump -u root -proot --master-data=2 --all-databases>/tmp/fullbackup.sql
[root@vmc_f tmp]# head -30 fullbackup.sql                                                                                  
-- MySQL dump 10.16  Distrib 10.2.32-MariaDB, for Linux (x86_64)
--
-- Host: localhost    Database:
-- ------------------------------------------------------
-- Server version       10.2.32-MariaDB-log
 
/*!40101 SET @OLD_CHARACTER_SET_CLIENT=@@CHARACTER_SET_CLIENT */;
/*!40101 SET @OLD_CHARACTER_SET_RESULTS=@@CHARACTER_SET_RESULTS */;
/*!40101 SET @OLD_COLLATION_CONNECTION=@@COLLATION_CONNECTION */;
/*!40101 SET NAMES utf8 */;
/*!40103 SET @OLD_TIME_ZONE=@@TIME_ZONE */;
/*!40103 SET TIME_ZONE='+00:00' */;
/*!40014 SET @OLD_UNIQUE_CHECKS=@@UNIQUE_CHECKS, UNIQUE_CHECKS=0 */;
/*!40014 SET @OLD_FOREIGN_KEY_CHECKS=@@FOREIGN_KEY_CHECKS, FOREIGN_KEY_CHECKS=0 */;
/*!40101 SET @OLD_SQL_MODE=@@SQL_MODE, SQL_MODE='NO_AUTO_VALUE_ON_ZERO' */;
/*!40111 SET @OLD_SQL_NOTES=@@SQL_NOTES, SQL_NOTES=0 */;
 
--
-- Position to start replication or point-in-time recovery from
--
 
-- CHANGE MASTER TO MASTER_LOG_FILE='mysql_bin_f.000001', MASTER_LOG_POS=330876;
 
--
-- GTID to start replication from
--
 
-- SET GLOBAL gtid_slave_pos='100-1-286,200-2-117';
 
--
 
 
[root@vmc_f tmp]# scp fullbackup.sql root@192.168.47.107:/tmp
root@192.168.47.107's password:
fullbackup.sql                                       100%   65KB   7.3MB/s   00:00

Node G

 
MariaDB [(none)]> stop slave;
Query OK, 0 rows affected (0.00 sec)
 
MariaDB [(none)]> reset slave;
Query OK, 0 rows affected (0.00 sec)
 
[root@vmc_g ~]# ls -ltrh /tmp/fullbackup.sql
-rw-r--r-- 1 root root 65K Mar 21 00:37 /tmp/fullbackup.sql
 
[root@vmc_g ~]# mysql -u root -proot </tmp/fullbackup.sql
 
MariaDB [(none)]> select count(*) from sbtest.sbtest1;
+----------+
| count(*) |
+----------+
|      140 |
+----------+
1 row in set (0.00 sec)
 
 
MariaDB [(none)]> select @@gtid_binlog_pos, @@gtid_binlog_state,@@gtid_current_pos,@@gtid_slave_pos;
+--------------------------------+--------------------------------+--------------------------------+----------------------+
| @@gtid_binlog_pos              | @@gtid_binlog_state            | @@gtid_current_pos             | @@gtid_slave_pos     |
+--------------------------------+--------------------------------+--------------------------------+----------------------+
| 100-1-1223,200-2-117,300-3-242 | 100-1-1223,200-2-117,300-3-242 | 100-1-1223,200-2-117,300-3-242 | 100-1-1223,200-2-117 |
+--------------------------------+--------------------------------+--------------------------------+----------------------+
1 row in set (0.00 sec)
 
MariaDB [(none)]> SET GLOBAL gtid_slave_pos='100-1-286,200-2-117';
Query OK, 0 rows affected, 1 warning (0.00 sec)
 
MariaDB [(none)]> show warnings;
+---------+------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Level   | Code | Message                                                                                                                                                                                                                                                          |
+---------+------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Warning | 1948 | Specified value for @@gtid_slave_pos contains no value for replication domain 300. This conflicts with the binary log which contains GTID 300-3-242. If MASTER_GTID_POS=CURRENT_POS is used, the binlog position will override the new value of @@gtid_slave_pos |
+---------+------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
 
 
MariaDB [(none)]> select @@gtid_binlog_pos, @@gtid_binlog_state,@@gtid_current_pos,@@gtid_slave_pos;
+--------------------------------+--------------------------------+-------------------------------+---------------------+
| @@gtid_binlog_pos              | @@gtid_binlog_state            | @@gtid_current_pos            | @@gtid_slave_pos    |
+--------------------------------+--------------------------------+-------------------------------+---------------------+
| 100-1-1223,200-2-117,300-3-242 | 100-1-1223,200-2-117,300-3-242 | 100-1-286,200-2-117,300-3-242 | 100-1-286,200-2-117 |
+--------------------------------+--------------------------------+-------------------------------+---------------------+
1 row in set (0.00 sec)
 
 
MariaDB [(none)]> change master to master_host='192.168.47.106', master_user='replica', master_password='replica', master_use_gtid=slave_pos;
Query OK, 0 rows affected (0.02 sec)
 
MariaDB [(none)]> start slave;
Query OK, 0 rows affected (0.01 sec)
 
MariaDB [(none)]> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.47.106
                  Master_User: replica
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql_bin_f.000001
          Read_Master_Log_Pos: 1587649
               Relay_Log_File: vmc_g-relay-bin.000002
                Relay_Log_Pos: 817
        Relay_Master_Log_File: mysql_bin_f.000001
             Slave_IO_Running: Yes
            Slave_SQL_Running: No
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 1062
                   Last_Error: Could not execute Write_rows_v1 event on table sbtest.sbtest1; Duplicate entry '5085' for key 'PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log mysql_bin_f.000001, end_log_pos 43693
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 42853
              Relay_Log_Space: 1406302
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 1062
               Last_SQL_Error: Could not execute Write_rows_v1 event on table sbtest.sbtest1; Duplicate entry '5085' for key 'PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log mysql_bin_f.000001, end_log_pos 43693
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 2
               Master_SSL_Crl:
           Master_SSL_Crlpath:
                   Using_Gtid: Slave_Pos
                  Gtid_IO_Pos: 100-1-1223,200-2-117
      Replicate_Do_Domain_Ids:
  Replicate_Ignore_Domain_Ids:
                Parallel_Mode: conservative
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State:
1 row in set (0.00 sec)
 
 
MariaDB [(none)]> select * from sbtest.sbtest1 where id=5085;
+------+------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+
| id   | k    | c                                                                                                                       | pad                                                         |
+------+------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+
| 5085 | 5037 | 39476658110-32559687743-47203182949-51304521021-75485891409-50717849795-47190185045-16047806543-87533311936-70471326632 | 52490813876-70767371353-94467376165-15821586959-80410098309 |
+------+------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+
1 row in set (0.00 sec)
 
MariaDB [(none)]> select count(*) from sbtest.sbtest1;
+----------+
| count(*) |
+----------+
|      140 |
+----------+
1 row in set (0.00 sec)



 Comments   
Comment by Jan Lindström (Inactive) [ 2022-04-08 ]

ramesh Can you test 10.4

Comment by Ramesh Sivaraman [ 2022-04-08 ]

jplindst Reproduced the issue on 10.4.23 using given test case.

node1:root@localhost> show slave status\G
*************************** 1. row ***************************
                Slave_IO_State: Waiting for master to send event
                   Master_Host: 172.20.2.34
                   Master_User: repl
                   Master_Port: 22320
                 Connect_Retry: 60
               Master_Log_File: mysql_bin.000001
           Read_Master_Log_Pos: 959953
                Relay_Log_File: ax3-galera3-relay-bin.000002
                 Relay_Log_Pos: 810
         Relay_Master_Log_File: mysql_bin.000001
              Slave_IO_Running: Yes
             Slave_SQL_Running: No
               Replicate_Do_DB: 
           Replicate_Ignore_DB: 
            Replicate_Do_Table: 
        Replicate_Ignore_Table: 
       Replicate_Wild_Do_Table: 
   Replicate_Wild_Ignore_Table: 
                    Last_Errno: 1047
                    Last_Error: Node has dropped from cluster
                  Skip_Counter: 0
           Exec_Master_Log_Pos: 46918
               Relay_Log_Space: 788265
               Until_Condition: None
                Until_Log_File: 
                 Until_Log_Pos: 0
            Master_SSL_Allowed: No
            Master_SSL_CA_File: 
            Master_SSL_CA_Path: 
               Master_SSL_Cert: 
             Master_SSL_Cipher: 
                Master_SSL_Key: 
         Seconds_Behind_Master: NULL
 Master_SSL_Verify_Server_Cert: No
                 Last_IO_Errno: 0
                 Last_IO_Error: 
                Last_SQL_Errno: 1047
                Last_SQL_Error: Node has dropped from cluster
   Replicate_Ignore_Server_Ids: 
              Master_Server_Id: 2
                Master_SSL_Crl: 
            Master_SSL_Crlpath: 
                    Using_Gtid: Slave_Pos
                   Gtid_IO_Pos: 100-1-777,200-2-132
       Replicate_Do_Domain_Ids: 
   Replicate_Ignore_Domain_Ids: 
                 Parallel_Mode: conservative
                     SQL_Delay: 0
           SQL_Remaining_Delay: NULL
       Slave_SQL_Running_State: 
              Slave_DDL_Groups: 3
Slave_Non_Transactional_Groups: 0
    Slave_Transactional_Groups: 568
1 row in set (0.000 sec)
 
node1:root@localhost> \q
 
 
node1:root@localhost> select * from sbtest1 where id=5009;
+------+------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+
| id   | k    | c                                                                                                                       | pad                                                         |
+------+------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+
| 5009 | 5011 | 21530626160-56347531229-46271551052-58215791514-32850707675-84078713984-55253540950-44570699456-37146602098-67863270568 | 11560770028-32012765973-27326258161-13215537521-02612531434 |
+------+------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+
1 row in set (0.000 sec)
 
node1:root@localhost> select count(1) from sbtest1;
+----------+
| count(1) |
+----------+
|      126 |
+----------+
1 row in set (0.000 sec)
 
node1:root@localhost> select @@version;
+---------------------+
| @@version           |
+---------------------+
| 10.4.23-MariaDB-log |
+---------------------+
1 row in set (0.000 sec)
 
node1:root@localhost> 
 
Error Info
 
2022-04-08 11:42:13 16 [Note] 'CHANGE MASTER TO executed'. Previous state master_host='172.20.2.34', master_port='22320', master_log_file='', master_log_pos='4'. New state master_host='172.20.2.34', master_port='22320', master_log_file='', master_log_pos='4'.
2022-04-08 11:42:13 16 [Note] Previous Using_Gtid=No. New Using_Gtid=Slave_Pos
2022-04-08 11:42:17 23 [Note] Slave I/O thread: Start asynchronous replication to master 'repl@172.20.2.34:22320' in log '' at position 4
2022-04-08 11:42:17 24 [Note] Slave SQL thread initialized, starting replication in log 'FIRST' at position 4, relay log './ax3-galera3-relay-bin.000001' position: 4; GTID position '100-1-270,200-2-132'
2022-04-08 11:42:17 23 [Note] Slave I/O thread: connected to master 'repl@172.20.2.34:22320',replication starts at GTID position '100-1-270,200-2-132'
2022-04-08 11:42:17 24 [ERROR] Slave SQL: Could not execute Write_rows_v1 event on table test.sbtest1; Duplicate entry '5009' for key 'PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log mysql_bin.000001, end_log_pos 47564, Internal MariaDB error code: 1062
2022-04-08 11:42:17 24 [ERROR] Slave SQL: Node has dropped from cluster, Internal MariaDB error code: 1047
2022-04-08 11:42:17 24 [Note] Slave SQL thread exiting, replication stopped in log 'mysql_bin.000001' at position 46918; GTID position '100-1-270,200-2-132'
2022-04-08 11:42:17 24 [Note] master was 172.20.2.34:22320
 
 
 
Node A
 
node1:root@localhost> select * from sbtest1 where id=5009;
+------+------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+
| id   | k    | c                                                                                                                       | pad                                                         |
+------+------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+
| 5009 | 5026 | 51321880912-92604745426-70132595668-50247508417-03306661890-51432289092-75619253222-29283573576-38537233912-64752942631 | 18679717763-26375392697-59963602089-17044746735-03418792121 |
+------+------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+
1 row in set (0.000 sec)
 
node1:root@localhost> 

Comment by Jan Lindström (Inactive) [ 2022-04-11 ]

ramesh Thanks, can you test with 10.5 also because it should have MDEV-20720 fixed.

Comment by Ramesh Sivaraman [ 2022-04-11 ]

jplindst Could not reproduce the issue on 10.5.16.

Comment by Jan Lindström (Inactive) [ 2022-04-11 ]

pramod.mahto@mariadb.com Based on https://mariadb.com/kb/en/using-mariadb-gtids-with-mariadb-galera-cluster/ Until MariaDB 10.5.1, there were known cases where GTIDs could become inconsistent across the cluster nodes. So if customer requires GTIDs to be consistent, use 10.5. To have consistent GTIDs you need MDEV-20720 to be fixed. Note that this is fixed on 10.4ES also.

Comment by Max Mether [ 2022-04-11 ]

ccalender

They way I read Jan's comment is that the ability to have consistent GTID in Galera is a feature that was not added until 10.5 (or 10.4 ES). Hence the fact that they are not consistent prior to 10.5 is not a bug but a feature (or lack of a feature). A bit like it would be to ask for Atomic DDL prior to 10.6, that feature just didn't exist until 10.6

Generated at Thu Feb 08 09:58:20 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.