Details
Type: Bug
Status: Needs Feedback
Priority: Major
Resolution: Unresolved
Affects Version/s: 10.4.7, 11.8.5
Fix Version/s: None
Component/s: None
Environment: x86 Debian 12, running inside docker, 64GB RAM
Description
Hi,
I have two servers in a primary/replica setup using row-based replication. The replica is read-only and no traffic is routed to it (it is used as a hot spare). Note that this is *not* using parallel replication.
I am observing that replication breaks when the primary concurrently executes a DELETE and an INSERT of the same row:
Given the following table on both primary and replica:
CREATE TABLE `team_matchtogroup` (
  `matchid` int(11) NOT NULL DEFAULT 0,
  `groupid` int(11) NOT NULL DEFAULT 0,
  `listid` int(11) DEFAULT NULL,
  UNIQUE KEY `listid_uniq` (`listid`),
  KEY `matchid` (`matchid`),
  KEY `groupid` (`groupid`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
and the dataset in the zip is loaded into both servers, when the following statements are run concurrently on the primary:
Transaction A:
START TRANSACTION;
DELETE FROM team_matchtogroup WHERE listid = 2391011;
SELECT SLEEP(10);
COMMIT;
Transaction B:
START TRANSACTION;
SELECT SLEEP(1);
INSERT INTO team_matchtogroup (groupid, listid, matchid) VALUES (5573, 2391011, 0);
COMMIT;
then both transactions commit successfully on the primary, but replication breaks with the following error on the replica:
Last_SQL_Error: Could not execute Write_rows_v1 event on table sutdb.team_matchtogroup; Duplicate entry '2391011' for key 'listid_uniq', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log master-bin.000002, end_log_pos 0
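For completeness: to get the replica running again after this failure, I skip the failed event and resume replication. This is the standard skip procedure, not a fix:

STOP SLAVE;
SET GLOBAL sql_slave_skip_counter = 1;
START SLAVE;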
I am submitting this as a bug because I believe that if the transactions succeed on the primary, they should also succeed on the replica.
I have attached a zip archive with a docker compose file, config, dataset, and some helper scripts that reproduce the issue.
Steps to reproduce:
These steps can fairly consistently reproduce the issue:
1. Have docker and docker compose installed.
2. Run 'docker compose up'. This sets up the servers with the config and loads the table with the dataset (around 60k rows).
3. Run ./setup-replication.sh to set up replication.
4. Run ./test.sh to reproduce the problem.
Notes
- Originally, I observed the problem with a MariaDB 11.8.5 replica replicating from a 10.4.7 primary, but the problem also occurs when replicating 11.8.5 -> 11.8.5.
- If I add a primary key to the table, I can no longer reproduce the issue.
- I was not able to reproduce the problem with a small dataset. I have attached the dataset that can reliably reproduce the issue.
- The sleeps I added to the transactions are there to reproduce the problem more reliably.
- MariaDB data is stored in docker volumes. Use ./down.sh to remove them if you need to start over.
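For reference, the workaround mentioned in the notes (adding a primary key) was tested with a synthetic auto-increment column along these lines (column name `id` chosen arbitrarily):

ALTER TABLE team_matchtogroup
  ADD COLUMN id INT NOT NULL AUTO_INCREMENT PRIMARY KEY FIRST;

With this change applied on both servers, I can no longer reproduce the replication break.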
Please let me know if you need more information. Or, if this is expected behavior, what I can change in my config to avoid this issue.