Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-28134

Error_code: handler error HA_ERR_FOUND_DUPP_KEY with Multi DC Galera Cluster | GTID Replication

Details

    • Bug
    • Status: Closed (View Workflow)
    • Critical
    • Resolution: Incomplete
    • 10.4.23, 10.2(EOL)
    • N/A
    • Galera, Replication
    • None

    Description

       
      DC1				DC2		DC3
      				
      A-B-C----GTID Replication---->D-E-F----GTID Replication---->G
       
      [DC1] A,B,C   3 Node Galera cluster
      [DC2] D,E,F   3 Node Galera Cluster
      [DC3] G        1 Node Galera Cluster
       
      DC1--DC2  C-->D GTID Replication 
      DC2--DC3  F-->G GTID Replication 
      
      

      Followed the basic rule for Enabling WSREP GTID Mode https://mariadb.com/kb/en/using-mariadb-gtids-with-mariadb-galera-cluster/#enabling-wsrep-gtid-mode.

      Issue starts from here taking mysqldump backup on Node F while Node A under write load and restoring the same on Node G , lead to duplicate entry.

      Node F

      [root@vmc_f tmp]#  mysqldump -u root -proot --master-data=2 --all-databases>/tmp/fullbackup.sql
      [root@vmc_f tmp]# head -30 fullbackup.sql                                                                                  
      -- MySQL dump 10.16  Distrib 10.2.32-MariaDB, for Linux (x86_64)
      --
      -- Host: localhost    Database:
      -- ------------------------------------------------------
      -- Server version       10.2.32-MariaDB-log
       
      /*!40101 SET @OLD_CHARACTER_SET_CLIENT=@@CHARACTER_SET_CLIENT */;
      /*!40101 SET @OLD_CHARACTER_SET_RESULTS=@@CHARACTER_SET_RESULTS */;
      /*!40101 SET @OLD_COLLATION_CONNECTION=@@COLLATION_CONNECTION */;
      /*!40101 SET NAMES utf8 */;
      /*!40103 SET @OLD_TIME_ZONE=@@TIME_ZONE */;
      /*!40103 SET TIME_ZONE='+00:00' */;
      /*!40014 SET @OLD_UNIQUE_CHECKS=@@UNIQUE_CHECKS, UNIQUE_CHECKS=0 */;
      /*!40014 SET @OLD_FOREIGN_KEY_CHECKS=@@FOREIGN_KEY_CHECKS, FOREIGN_KEY_CHECKS=0 */;
      /*!40101 SET @OLD_SQL_MODE=@@SQL_MODE, SQL_MODE='NO_AUTO_VALUE_ON_ZERO' */;
      /*!40111 SET @OLD_SQL_NOTES=@@SQL_NOTES, SQL_NOTES=0 */;
       
      --
      -- Position to start replication or point-in-time recovery from
      --
       
      -- CHANGE MASTER TO MASTER_LOG_FILE='mysql_bin_f.000001', MASTER_LOG_POS=330876;
       
      --
      -- GTID to start replication from
      --
       
      -- SET GLOBAL gtid_slave_pos='100-1-286,200-2-117';
       
      --
       
       
      [root@vmc_f tmp]# scp fullbackup.sql root@192.168.47.107:/tmp
      root@192.168.47.107's password:
      fullbackup.sql                                       100%   65KB   7.3MB/s   00:00
      
      

      Node G

       
      MariaDB [(none)]> stop slave;
      Query OK, 0 rows affected (0.00 sec)
       
      MariaDB [(none)]> reset slave;
      Query OK, 0 rows affected (0.00 sec)
       
      [root@vmc_g ~]# ls -ltrh /tmp/fullbackup.sql
      -rw-r--r-- 1 root root 65K Mar 21 00:37 /tmp/fullbackup.sql
       
      [root@vmc_g ~]# mysql -u root -proot </tmp/fullbackup.sql
       
      MariaDB [(none)]> select count(*) from sbtest.sbtest1;
      +----------+
      | count(*) |
      +----------+
      |      140 |
      +----------+
      1 row in set (0.00 sec)
       
       
      MariaDB [(none)]> select @@gtid_binlog_pos, @@gtid_binlog_state,@@gtid_current_pos,@@gtid_slave_pos;
      +--------------------------------+--------------------------------+--------------------------------+----------------------+
      | @@gtid_binlog_pos              | @@gtid_binlog_state            | @@gtid_current_pos             | @@gtid_slave_pos     |
      +--------------------------------+--------------------------------+--------------------------------+----------------------+
      | 100-1-1223,200-2-117,300-3-242 | 100-1-1223,200-2-117,300-3-242 | 100-1-1223,200-2-117,300-3-242 | 100-1-1223,200-2-117 |
      +--------------------------------+--------------------------------+--------------------------------+----------------------+
      1 row in set (0.00 sec)
       
      MariaDB [(none)]> SET GLOBAL gtid_slave_pos='100-1-286,200-2-117';
      Query OK, 0 rows affected, 1 warning (0.00 sec)
       
      MariaDB [(none)]> show warnings;
      +---------+------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
      | Level   | Code | Message                                                                                                                                                                                                                                                          |
      +---------+------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
      | Warning | 1948 | Specified value for @@gtid_slave_pos contains no value for replication domain 300. This conflicts with the binary log which contains GTID 300-3-242. If MASTER_GTID_POS=CURRENT_POS is used, the binlog position will override the new value of @@gtid_slave_pos |
      +---------+------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
      1 row in set (0.00 sec)
       
       
      MariaDB [(none)]> select @@gtid_binlog_pos, @@gtid_binlog_state,@@gtid_current_pos,@@gtid_slave_pos;
      +--------------------------------+--------------------------------+-------------------------------+---------------------+
      | @@gtid_binlog_pos              | @@gtid_binlog_state            | @@gtid_current_pos            | @@gtid_slave_pos    |
      +--------------------------------+--------------------------------+-------------------------------+---------------------+
      | 100-1-1223,200-2-117,300-3-242 | 100-1-1223,200-2-117,300-3-242 | 100-1-286,200-2-117,300-3-242 | 100-1-286,200-2-117 |
      +--------------------------------+--------------------------------+-------------------------------+---------------------+
      1 row in set (0.00 sec)
       
       
      MariaDB [(none)]> change master to master_host='192.168.47.106', master_user='replica', master_password='replica', master_use_gtid=slave_pos;
      Query OK, 0 rows affected (0.02 sec)
       
      MariaDB [(none)]> start slave;
      Query OK, 0 rows affected (0.01 sec)
       
      MariaDB [(none)]> show slave status\G
      *************************** 1. row ***************************
                     Slave_IO_State: Waiting for master to send event
                        Master_Host: 192.168.47.106
                        Master_User: replica
                        Master_Port: 3306
                      Connect_Retry: 60
                    Master_Log_File: mysql_bin_f.000001
                Read_Master_Log_Pos: 1587649
                     Relay_Log_File: vmc_g-relay-bin.000002
                      Relay_Log_Pos: 817
              Relay_Master_Log_File: mysql_bin_f.000001
                   Slave_IO_Running: Yes
                  Slave_SQL_Running: No
                    Replicate_Do_DB:
                Replicate_Ignore_DB:
                 Replicate_Do_Table:
             Replicate_Ignore_Table:
            Replicate_Wild_Do_Table:
        Replicate_Wild_Ignore_Table:
                         Last_Errno: 1062
                         Last_Error: Could not execute Write_rows_v1 event on table sbtest.sbtest1; Duplicate entry '5085' for key 'PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log mysql_bin_f.000001, end_log_pos 43693
                       Skip_Counter: 0
                Exec_Master_Log_Pos: 42853
                    Relay_Log_Space: 1406302
                    Until_Condition: None
                     Until_Log_File:
                      Until_Log_Pos: 0
                 Master_SSL_Allowed: No
                 Master_SSL_CA_File:
                 Master_SSL_CA_Path:
                    Master_SSL_Cert:
                  Master_SSL_Cipher:
                     Master_SSL_Key:
              Seconds_Behind_Master: NULL
      Master_SSL_Verify_Server_Cert: No
                      Last_IO_Errno: 0
                      Last_IO_Error:
                     Last_SQL_Errno: 1062
                     Last_SQL_Error: Could not execute Write_rows_v1 event on table sbtest.sbtest1; Duplicate entry '5085' for key 'PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log mysql_bin_f.000001, end_log_pos 43693
        Replicate_Ignore_Server_Ids:
                   Master_Server_Id: 2
                     Master_SSL_Crl:
                 Master_SSL_Crlpath:
                         Using_Gtid: Slave_Pos
                        Gtid_IO_Pos: 100-1-1223,200-2-117
            Replicate_Do_Domain_Ids:
        Replicate_Ignore_Domain_Ids:
                      Parallel_Mode: conservative
                          SQL_Delay: 0
                SQL_Remaining_Delay: NULL
            Slave_SQL_Running_State:
      1 row in set (0.00 sec)
       
       
      MariaDB [(none)]> select * from sbtest.sbtest1 where id=5085;
      +------+------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+
      | id   | k    | c                                                                                                                       | pad                                                         |
      +------+------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+
      | 5085 | 5037 | 39476658110-32559687743-47203182949-51304521021-75485891409-50717849795-47190185045-16047806543-87533311936-70471326632 | 52490813876-70767371353-94467376165-15821586959-80410098309 |
      +------+------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+
      1 row in set (0.00 sec)
       
      MariaDB [(none)]> select count(*) from sbtest.sbtest1;
      +----------+
      | count(*) |
      +----------+
      |      140 |
      +----------+
      1 row in set (0.00 sec)
      
      

      Attachments

        1. node_a.cnf
          2 kB
        2. node_b.cnf
          2 kB
        3. node_c.cnf
          2 kB
        4. node_d.cnf
          3 kB
        5. node_e.cnf
          3 kB
        6. node_f.cnf
          2 kB
        7. node_g.cnf
          2 kB
        8. Under_load_GTID_Replication_Fail_With_Duplicate_Entry .txt
          42 kB

        Issue Links

          Activity

            ramesh Can you test 10.4

            jplindst Jan Lindström (Inactive) added a comment - ramesh Can you test 10.4

            jplindst Reproduced the issue on 10.4.23 using given test case.

            node1:root@localhost> show slave status\G
            *************************** 1. row ***************************
                            Slave_IO_State: Waiting for master to send event
                               Master_Host: 172.20.2.34
                               Master_User: repl
                               Master_Port: 22320
                             Connect_Retry: 60
                           Master_Log_File: mysql_bin.000001
                       Read_Master_Log_Pos: 959953
                            Relay_Log_File: ax3-galera3-relay-bin.000002
                             Relay_Log_Pos: 810
                     Relay_Master_Log_File: mysql_bin.000001
                          Slave_IO_Running: Yes
                         Slave_SQL_Running: No
                           Replicate_Do_DB: 
                       Replicate_Ignore_DB: 
                        Replicate_Do_Table: 
                    Replicate_Ignore_Table: 
                   Replicate_Wild_Do_Table: 
               Replicate_Wild_Ignore_Table: 
                                Last_Errno: 1047
                                Last_Error: Node has dropped from cluster
                              Skip_Counter: 0
                       Exec_Master_Log_Pos: 46918
                           Relay_Log_Space: 788265
                           Until_Condition: None
                            Until_Log_File: 
                             Until_Log_Pos: 0
                        Master_SSL_Allowed: No
                        Master_SSL_CA_File: 
                        Master_SSL_CA_Path: 
                           Master_SSL_Cert: 
                         Master_SSL_Cipher: 
                            Master_SSL_Key: 
                     Seconds_Behind_Master: NULL
             Master_SSL_Verify_Server_Cert: No
                             Last_IO_Errno: 0
                             Last_IO_Error: 
                            Last_SQL_Errno: 1047
                            Last_SQL_Error: Node has dropped from cluster
               Replicate_Ignore_Server_Ids: 
                          Master_Server_Id: 2
                            Master_SSL_Crl: 
                        Master_SSL_Crlpath: 
                                Using_Gtid: Slave_Pos
                               Gtid_IO_Pos: 100-1-777,200-2-132
                   Replicate_Do_Domain_Ids: 
               Replicate_Ignore_Domain_Ids: 
                             Parallel_Mode: conservative
                                 SQL_Delay: 0
                       SQL_Remaining_Delay: NULL
                   Slave_SQL_Running_State: 
                          Slave_DDL_Groups: 3
            Slave_Non_Transactional_Groups: 0
                Slave_Transactional_Groups: 568
            1 row in set (0.000 sec)
             
            node1:root@localhost> \q
             
             
            node1:root@localhost> select * from sbtest1 where id=5009;
            +------+------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+
            | id   | k    | c                                                                                                                       | pad                                                         |
            +------+------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+
            | 5009 | 5011 | 21530626160-56347531229-46271551052-58215791514-32850707675-84078713984-55253540950-44570699456-37146602098-67863270568 | 11560770028-32012765973-27326258161-13215537521-02612531434 |
            +------+------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+
            1 row in set (0.000 sec)
             
            node1:root@localhost> select count(1) from sbtest1;
            +----------+
            | count(1) |
            +----------+
            |      126 |
            +----------+
            1 row in set (0.000 sec)
             
            node1:root@localhost> select @@version;
            +---------------------+
            | @@version           |
            +---------------------+
            | 10.4.23-MariaDB-log |
            +---------------------+
            1 row in set (0.000 sec)
             
            node1:root@localhost> 
             
            Error Info
             
            2022-04-08 11:42:13 16 [Note] 'CHANGE MASTER TO executed'. Previous state master_host='172.20.2.34', master_port='22320', master_log_file='', master_log_pos='4'. New state master_host='172.20.2.34', master_port='22320', master_log_file='', master_log_pos='4'.
            2022-04-08 11:42:13 16 [Note] Previous Using_Gtid=No. New Using_Gtid=Slave_Pos
            2022-04-08 11:42:17 23 [Note] Slave I/O thread: Start asynchronous replication to master 'repl@172.20.2.34:22320' in log '' at position 4
            2022-04-08 11:42:17 24 [Note] Slave SQL thread initialized, starting replication in log 'FIRST' at position 4, relay log './ax3-galera3-relay-bin.000001' position: 4; GTID position '100-1-270,200-2-132'
            2022-04-08 11:42:17 23 [Note] Slave I/O thread: connected to master 'repl@172.20.2.34:22320',replication starts at GTID position '100-1-270,200-2-132'
            2022-04-08 11:42:17 24 [ERROR] Slave SQL: Could not execute Write_rows_v1 event on table test.sbtest1; Duplicate entry '5009' for key 'PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log mysql_bin.000001, end_log_pos 47564, Internal MariaDB error code: 1062
            2022-04-08 11:42:17 24 [ERROR] Slave SQL: Node has dropped from cluster, Internal MariaDB error code: 1047
            2022-04-08 11:42:17 24 [Note] Slave SQL thread exiting, replication stopped in log 'mysql_bin.000001' at position 46918; GTID position '100-1-270,200-2-132'
            2022-04-08 11:42:17 24 [Note] master was 172.20.2.34:22320
             
             
             
            Node A
             
            node1:root@localhost> select * from sbtest1 where id=5009;
            +------+------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+
            | id   | k    | c                                                                                                                       | pad                                                         |
            +------+------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+
            | 5009 | 5026 | 51321880912-92604745426-70132595668-50247508417-03306661890-51432289092-75619253222-29283573576-38537233912-64752942631 | 18679717763-26375392697-59963602089-17044746735-03418792121 |
            +------+------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+
            1 row in set (0.000 sec)
             
            node1:root@localhost> 
            
            

            ramesh Ramesh Sivaraman added a comment - jplindst Reproduced the issue on 10.4.23 using given test case. node1:root@localhost> show slave status\G *************************** 1. row *************************** Slave_IO_State: Waiting for master to send event Master_Host: 172.20.2.34 Master_User: repl Master_Port: 22320 Connect_Retry: 60 Master_Log_File: mysql_bin.000001 Read_Master_Log_Pos: 959953 Relay_Log_File: ax3-galera3-relay-bin.000002 Relay_Log_Pos: 810 Relay_Master_Log_File: mysql_bin.000001 Slave_IO_Running: Yes Slave_SQL_Running: No Replicate_Do_DB: Replicate_Ignore_DB: Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table: Last_Errno: 1047 Last_Error: Node has dropped from cluster Skip_Counter: 0 Exec_Master_Log_Pos: 46918 Relay_Log_Space: 788265 Until_Condition: None Until_Log_File: Until_Log_Pos: 0 Master_SSL_Allowed: No Master_SSL_CA_File: Master_SSL_CA_Path: Master_SSL_Cert: Master_SSL_Cipher: Master_SSL_Key: Seconds_Behind_Master: NULL Master_SSL_Verify_Server_Cert: No Last_IO_Errno: 0 Last_IO_Error: Last_SQL_Errno: 1047 Last_SQL_Error: Node has dropped from cluster Replicate_Ignore_Server_Ids: Master_Server_Id: 2 Master_SSL_Crl: Master_SSL_Crlpath: Using_Gtid: Slave_Pos Gtid_IO_Pos: 100-1-777,200-2-132 Replicate_Do_Domain_Ids: Replicate_Ignore_Domain_Ids: Parallel_Mode: conservative SQL_Delay: 0 SQL_Remaining_Delay: NULL Slave_SQL_Running_State: Slave_DDL_Groups: 3 Slave_Non_Transactional_Groups: 0 Slave_Transactional_Groups: 568 1 row in set (0.000 sec)   node1:root@localhost> \q     node1:root@localhost> select * from sbtest1 where id=5009; + ------+------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+ | id | k | c | pad | + ------+------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+ | 5009 | 5011 | 21530626160-56347531229-46271551052-58215791514-32850707675-84078713984-55253540950-44570699456-37146602098-67863270568 | 11560770028-32012765973-27326258161-13215537521-02612531434 | + ------+------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+ 1 row in set (0.000 sec)   node1:root@localhost> select count (1) from sbtest1; + ----------+ | count (1) | + ----------+ | 126 | + ----------+ 1 row in set (0.000 sec)   node1:root@localhost> select @@version; + ---------------------+ | @@version | + ---------------------+ | 10.4.23-MariaDB-log | + ---------------------+ 1 row in set (0.000 sec)   node1:root@localhost>   Error Info   2022-04-08 11:42:13 16 [Note] 'CHANGE MASTER TO executed' . Previous state master_host= '172.20.2.34' , master_port= '22320' , master_log_file= '' , master_log_pos= '4' . New state master_host= '172.20.2.34' , master_port= '22320' , master_log_file= '' , master_log_pos= '4' . 2022-04-08 11:42:13 16 [Note] Previous Using_Gtid= No . New Using_Gtid=Slave_Pos 2022-04-08 11:42:17 23 [Note] Slave I/O thread: Start asynchronous replication to master 'repl@172.20.2.34:22320' in log '' at position 4 2022-04-08 11:42:17 24 [Note] Slave SQL thread initialized, starting replication in log 'FIRST' at position 4, relay log './ax3-galera3-relay-bin.000001' position: 4; GTID position '100-1-270,200-2-132' 2022-04-08 11:42:17 23 [Note] Slave I/O thread: connected to master 'repl@172.20.2.34:22320' ,replication starts at GTID position '100-1-270,200-2-132' 2022-04-08 11:42:17 24 [ERROR] Slave SQL: Could not execute Write_rows_v1 event on table test.sbtest1; Duplicate entry '5009' for key 'PRIMARY' , Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event 's master log mysql_bin.000001, end_log_pos 47564, Internal MariaDB error code: 1062 2022-04-08 11:42:17 24 [ERROR] Slave SQL: Node has dropped from cluster, Internal MariaDB error code: 1047 2022-04-08 11:42:17 24 [Note] Slave SQL thread exiting, replication stopped in log ' mysql_bin.000001 ' at position 46918; GTID position ' 100-1-270,200-2-132' 2022-04-08 11:42:17 24 [Note] master was 172.20.2.34:22320       Node A   node1:root@localhost> select * from sbtest1 where id=5009; + ------+------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+ | id | k | c | pad | + ------+------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+ | 5009 | 5026 | 51321880912-92604745426-70132595668-50247508417-03306661890-51432289092-75619253222-29283573576-38537233912-64752942631 | 18679717763-26375392697-59963602089-17044746735-03418792121 | + ------+------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+ 1 row in set (0.000 sec)   node1:root@localhost>

            ramesh Thanks, can you test with 10.5 also because it should have MDEV-20720 fixed.

            jplindst Jan Lindström (Inactive) added a comment - ramesh Thanks, can you test with 10.5 also because it should have MDEV-20720 fixed.

            jplindst Could not reproduce the issue on 10.5.16.

            ramesh Ramesh Sivaraman added a comment - jplindst Could not reproduce the issue on 10.5.16.

            pramod.mahto@mariadb.com Based on https://mariadb.com/kb/en/using-mariadb-gtids-with-mariadb-galera-cluster/ Until MariaDB 10.5.1, there were known cases where GTIDs could become inconsistent across the cluster nodes. So if customer requires GTIDs to be consistent, use 10.5. To have consistent GTIDs you need MDEV-20720 to be fixed. Note that this is fixed on 10.4ES also.

            jplindst Jan Lindström (Inactive) added a comment - - edited pramod.mahto@mariadb.com Based on https://mariadb.com/kb/en/using-mariadb-gtids-with-mariadb-galera-cluster/ Until MariaDB 10.5.1, there were known cases where GTIDs could become inconsistent across the cluster nodes. So if customer requires GTIDs to be consistent, use 10.5. To have consistent GTIDs you need MDEV-20720 to be fixed. Note that this is fixed on 10.4ES also.
            maxmether Max Mether added a comment -

            ccalender

            They way I read Jan's comment is that the ability to have consistent GTID in Galera is a feature that was not added until 10.5 (or 10.4 ES). Hence the fact that they are not consistent prior to 10.5 is not a bug but a feature (or lack of a feature). A bit like it would be to ask for Atomic DDL prior to 10.6, that feature just didn't exist until 10.6

            maxmether Max Mether added a comment - ccalender They way I read Jan's comment is that the ability to have consistent GTID in Galera is a feature that was not added until 10.5 (or 10.4 ES). Hence the fact that they are not consistent prior to 10.5 is not a bug but a feature (or lack of a feature). A bit like it would be to ask for Atomic DDL prior to 10.6, that feature just didn't exist until 10.6

            People

              pramod.mahto@mariadb.com Pramod Mahto
              pramod.mahto@mariadb.com Pramod Mahto
              Votes:
              0 Vote for this issue
              Watchers:
              8 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.