Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-20605

Awaken transaction can miss inserted by other transaction records due to wrong persistent cursor restoration

    XMLWordPrintable

Details

    Description

      Some users are seeing duplicate key errors like the following when using the optimistic and aggressive modes of parallel replication:

      MariaDB [(none)]> SHOW SLAVE STATUS\G
      *************************** 1. row ***************************
                     Slave_IO_State: Waiting for master to send event
                        Master_Host: 172.30.0.154
                        Master_User: repl
                        Master_Port: 3306
                      Connect_Retry: 60
                    Master_Log_File: mariadb-bin.000055
                Read_Master_Log_Pos: 593262418
                     Relay_Log_File: ip-172-30-0-228-relay-bin.000002
                      Relay_Log_Pos: 33725828
              Relay_Master_Log_File: mariadb-bin.000039
                   Slave_IO_Running: Yes
                  Slave_SQL_Running: No
                    Replicate_Do_DB:
                Replicate_Ignore_DB:
                 Replicate_Do_Table:
             Replicate_Ignore_Table:
            Replicate_Wild_Do_Table:
        Replicate_Wild_Ignore_Table:
                         Last_Errno: 1062
                         Last_Error: Error 'Duplicate entry '...' for key 'name'' on query. Default database: 'sample3'. Query: 'INSERT INTO ...'
                       Skip_Counter: 0
                Exec_Master_Log_Pos: 705697132
                    Relay_Log_Space: 17101184558
                    Until_Condition: None
                     Until_Log_File:
                      Until_Log_Pos: 0
                 Master_SSL_Allowed: No
                 Master_SSL_CA_File:
                 Master_SSL_CA_Path:
                    Master_SSL_Cert:
                  Master_SSL_Cipher:
                     Master_SSL_Key:
              Seconds_Behind_Master: NULL
      Master_SSL_Verify_Server_Cert: No
                      Last_IO_Errno: 0
                      Last_IO_Error:
                     Last_SQL_Errno: 1062
                     Last_SQL_Error: Error 'Duplicate entry '...' for key 'name'' on query. Default database: 'sample3'. Query: 'INSERT INTO ...'
        Replicate_Ignore_Server_Ids:
                   Master_Server_Id: 1
                     Master_SSL_Crl:
                 Master_SSL_Crlpath:
                         Using_Gtid: Slave_Pos
                        Gtid_IO_Pos: 0-1-42163518
            Replicate_Do_Domain_Ids:
        Replicate_Ignore_Domain_Ids:
                      Parallel_Mode: aggressive
                          SQL_Delay: 0
                SQL_Remaining_Delay: NULL
            Slave_SQL_Running_State:
      1 row in set (0.00 sec)
      

      However, when the problem is analyzed, it is clear that the duplicate key should not occur.

      The configuration:

      [mariadb]
      server_id=2
      log_bin=mariadb-bin
      log_slave_updates
      binlog_format=MIXED
      slave_parallel_threads=16
      slave_parallel_max_queued=1048576
      slave_parallel_mode=aggressive
      

      The configuration with slave_parallel_mode=optimistic was identical.

      Some observations:

      • When the table is queried on the slave, the row does indeed exists, so it is clear that the duplicate key error is real.
      • The SQL script contains an UPDATE and a DELETE before the INSERT, so the row should have been deleted, which should have prevented the duplicate key error.
      • The row still exists, so it seems that the DELETE statement had no effect.
      • The values in the existing row in the table are the old non-updated values, so it seems that the UPDATE statement also had no effect.
      • The slave's binary log contains both the UPDATE and DELETE statements, which should indicate that the statements were successfully executed on the slave.
      • I haven't yet been able to reproduce the problem when the master has binlog_format=ROW set, so row-based replication may be immune to the problem.

      Attachments

        1. MDEV-20605.patch
          8 kB
        2. MDEV-20605-deadlock.test
          2 kB
        3. MDEV-20605-locking.test
          3 kB
        4. MDEV-20605-restore-cursor.test
          3 kB
        5. MDEV-20605-restore-cursor-v2.test
          3 kB
        6. thiru_20605.yy
          0.9 kB

        Issue Links

          Activity

            People

              vlad.lesin Vladislav Lesin
              GeoffMontee Geoff Montee (Inactive)
              Votes:
              5 Vote for this issue
              Watchers:
              21 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.