[MDEV-31326] insert statement executed multiple times on slave Created: 2023-05-22  Updated: 2023-06-20

Status: Open
Project: MariaDB Server
Component/s: Replication
Affects Version/s: 10.6.13
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Patrick Moiroux Assignee: Angelique Sklavounos (Inactive)
Resolution: Unresolved Votes: 0
Labels: None
Environment:

Centos7
slave_parallel_threads = 128
slave_parallel_mode = conservative


Issue Links:
Relates
relates to MDEV-31120 Duplicate entry allowed into a UNIQUE... Closed

 Description   

After upgrade to version 10.6.13, replication stops with Error 'Duplicate entry'

show slave status \G
*************************** 1. row ***************************
                Slave_IO_State: Waiting for master to send event
                   Master_Host: ********
                   Master_User: repl
                   Master_Port: 3306
                 Connect_Retry: 60
               Master_Log_File: mysql-bin.000545
           Read_Master_Log_Pos: 402372410
                Relay_Log_File: slave-relay.001565
                 Relay_Log_Pos: 316419462
         Relay_Master_Log_File: mysql-bin.000511
              Slave_IO_Running: Yes
             Slave_SQL_Running: No
               Replicate_Do_DB:
           Replicate_Ignore_DB:
            Replicate_Do_Table:
        Replicate_Ignore_Table:
       Replicate_Wild_Do_Table:
   Replicate_Wild_Ignore_Table:
                    Last_Errno: 1062
                    Last_Error: Error 'Duplicate entry '88473796' for key 'PRIMARY'' on query.
                 Skip_Counter: 0
           Exec_Master_Log_Pos: 316419163
 
 
#Skip:
 
MariaDB []> stop slave;
Query OK, 0 rows affected (0.077 sec)
 
MariaDB [> SET GLOBAL sql_slave_skip_counter = 1;
Query OK, 0 rows affected (0.000 sec)
 
MariaDB []> start slave;
Query OK, 0 rows affected (0.019 sec)
 
MariaDB []> show slave status \G
*************************** 1. row ***************************
                Slave_IO_State: Waiting for master to send event
                   Master_Host: ********
                   Master_User: repl
                   Master_Port: 3306
                 Connect_Retry: 60
               Master_Log_File: mysql-bin.000545
           Read_Master_Log_Pos: 405814390
                Relay_Log_File: slave-relay.001565
                 Relay_Log_Pos: 316423101
         Relay_Master_Log_File: mysql-bin.000511
              Slave_IO_Running: Yes
             Slave_SQL_Running: No
               Replicate_Do_DB:
           Replicate_Ignore_DB:
            Replicate_Do_Table:
        Replicate_Ignore_Table:
       Replicate_Wild_Do_Table:
   Replicate_Wild_Ignore_Table:
                    Last_Errno: 1062
                    Last_Error: Error 'Duplicate entry '88473796' for key 'PRIMARY''
Skip_Counter: 0
           Exec_Master_Log_Pos: 316422802

We can see the Exec_Master_Log_Pos update from 316419163 to 316422802 , but it fails on the same insert.

In binary logs the insert is done only once, at position 316425036

Here is what we have in the binary logs:

#230520 11:38:28 server id 1  end_log_pos 316419163 CRC32 0xeae636ef    Xid = 990882515
COMMIT/*!*/;
# at 316419163
 
#230520 11:38:28 server id 1  end_log_pos 316422802 CRC32 0x099dbd6a    Xid = 990882526
COMMIT/*!*/;
# at 316422802
 
# at 316425036
#230520 11:38:28 server id 1  end_log_pos 316426018 CRC32 0xa4bad5cc    Query   thread_id=2393709       exec_time=0     error_code=0    xid=0
SET TIMESTAMP=1684575508/*!*/;
insert into ... values ... 88473796

I will put the binary logs on the FTP server for confidentiality reasons.

Thanks



 Comments   
Comment by Patrick Moiroux [ 2023-05-22 ]

MDEV-31326.tgz uploaded to ftp.mariadb.org/private/

Thanks

Comment by Patrick Moiroux [ 2023-05-23 ]

Hello,

I don't think it's related to MDEV-31120 ..

I checked on the master and the row is unique..

Each time I was skipping the insert on the slave, it was going further and having the same issue on another insert on that table ( 14028 inserts on that table in the file mysql-bin.000511)

I skipped about 20 inserts, and then I updated the value of slave_parallel_threads from 128 to 1, and suddenly it worked without issue...

So to summarize:

With slave_parallel_threads =128 , it seems like multiple threads were trying to commit the same inserts.

With slave_parallel_threads =1, slave was able to process the entire binary logs and to catch up the master...

Comment by Angelique Sklavounos (Inactive) [ 2023-05-24 ]

Hi pmoiroux,

Thanks for the report. I have some questions:

1) What was the process for the upgrade?
2) Were both master and slave upgraded?
3) Would it be possible to get the schema? For instance, with mariadb-dump --all-databases --no-data? This can be uploaded to the same ftp server.

Thank you.

Comment by Patrick Moiroux [ 2023-05-25 ]

Hi Angelique,

Yes both slave and master were upgraded (Slave first) and upgrade was standard:

Stop the slave
yum update MariaDB*
Start the slave
Do the same on the Master

I uploaded the structure on the private folder: MDEV-31326.dump.tgz

Thanks

Comment by Angelique Sklavounos (Inactive) [ 2023-05-26 ]

Hi pmoiroux,

Thanks for that. A few more questions/requests...

1) What was the old version of MariaDB before the upgrade?
2) Output of show global variables
3) Do you happen to have the full output from SHOW SLAVE STATUS that was shown in the Description?
4) What was the CHANGE MASTER statement used on the slave?

Thank you.

Comment by Patrick Moiroux [ 2023-05-26 ]

Hello Angelique,

1) I was running 10.6.12 but was impacted by “InnoDB hang on B-tree split or merge (MDEV-29835)” , so I ran for a while with 10.6.13 from https://ci.mariadb.org/33837/amd64-centos-7-rpm-autobake/rpms. Then I installed the official 10.6.13 release on 2023-05-11. The replication was broken on 2023-05-20 and after multiple attempts to skip duplicate with "SET GLOBAL sql_slave_skip_counter = 1;" , I decided to update the value of slave_parallel_threads from 128 to 1 and it allowed to continue without errors and to catch the master. After that I switched from "Parallel_Mode: conservative" to "Parallel_Mode: optimistic" and back to "slave_parallel_threads = 128". Issue did not occur since then

2) Output uploaded to MDEV-31326.output.tgz

3) Yes I still have the output of "SHOW SLAVE STATUS". uploaded as well

4) I did not execute CHANGE MASTER since I did the initial setup but the command I used was "CHANGE MASTER TO MASTER_HOST='XXXXX',MASTER_USER='repl',MASTER_PASSWORD='XXXXXX',MASTER_LOG_FILE='mysql-bin.000001',MASTER_LOG_POS=608;"

Thanks,
Patrick

Generated at Thu Feb 08 10:23:00 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.