Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-39059

DELETE and INSERT of same row in concurrent transactions breaks row based replication with duplicate row error

    XMLWordPrintable

Details

    • Bug
    • Status: Needs Feedback (View Workflow)
    • Major
    • Resolution: Unresolved
    • 10.4.7, 11.8.5
    • None
    • Replication
    • None
    • x86 Debian 12, running inside docker, 64GB RAM

    Description

      Hi,

      I have two servers in a primary/replica setup using row based replication. The replica is read only and no traffic is routed to it (used as a hot spare). Note, that this is *not* using parallel replication.

      I am observing that replication breaks when the primary concurrently executes a DELETE following INSERT of the same row:

      Given the following table on both master and replica:

      CREATE TABLE `team_matchtogroup` (
        `matchid` int(11) NOT NULL DEFAULT 0,
        `groupid` int(11) NOT NULL DEFAULT 0,
        `listid` int(11) DEFAULT NULL,
        UNIQUE KEY `listid_uniq` (`listid`),
        KEY `matchid` (`matchid`),
        KEY `groupid` (`groupid`)
      ) ENGINE=InnoDB DEFAULT CHARSET=latin1;
      

      and the dataset in the zip is loaded into both servers, when the following statements are run concurrently on the primary:

      Transaction A:

      START TRANSACTION;
      DELETE FROM team_matchtogroup WHERE listid = 2391011;
      SELECT SLEEP(10);
      COMMIT;
      

      Transaction B:

      START TRANSACTION;
      SELECT SLEEP(1);
      INSERT INTO team_matchtogroup (groupid, listid, matchid) VALUES (5573, 2391011, 0);
      COMMIT;
      

      then both transactions commits successfully on primary, but replication breaks with the following error on the replica:

      Last_SQL_Error: Could not execute Write_rows_v1 event on table sutdb.team_matchtogroup; Duplicate entry '2391011' for key 'listid_uniq', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log master-bin.000002, end_log_pos 0

      I am submitting this as a bug, because I believe that if the transactions succeed on the primary, then they should also succeed on the replica.

      I have attached a zip archive with a docker compose file, config,dataset as well as some helper scripts that reproduce the issue.

      Steps to reproduce:
      These steps can fairly consistently reproduce the issue:

      1. Have docker and docker compose installed.
      2. run 'docker compose up' . This will setup the servers with the config, and load the table with the dataset (around 60k rows).
      3. run ./setup-replication.sh to setup replication.
      4. run ./test.sh to reproduce the problem.

      Notes

      • Originally, I observed the problem for a setup with mariadb server version 11.8.5 replicating mariadb server with version 10.4.7, but the problem occurs also when replicating 11.8.5 -> 11.8.5.
      • If I add a primary key to the table, I can no longer reproduce the issue.
      • I was not able to reproduce the problem with a small dataset. I have attached the dataset that can reliably reproduce the issue.
      • The sleeps i added to the transactions are there to more reliably reproduce the problem.
      • mariadb data is stored in volumes in docker. Use ./down.sh to remove them if you need to start over.

      Please let me know if you need more information. Or, if this is expected behavior, what I can do to my config to avoid this issue.

      Attachments

        Activity

          People

            Unassigned Unassigned
            mkote Morten Terndrup
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.