[MDEV-15278] rpl.rpl_parallel_optimistic failed in buildbot, failed to sync with master Created: 2018-02-11  Updated: 2023-08-25  Resolved: 2023-08-15

Status: Closed
Project: MariaDB Server
Component/s: Replication, Tests
Affects Version/s: 10.2, 10.3, 10.4, 10.5, 10.6, 10.7, 10.8
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Elena Stepanova Assignee: Kristian Nielsen
Resolution: Duplicate Votes: 0
Labels: None

Issue Links:
Duplicate
duplicates MDEV-31655 Parallel replication deadlock victim ... Closed
Relates
relates to MDEV-10550 Parallel replication can't sync with ... Closed

 Description   

http://buildbot.askmonty.org/buildbot/builders/kvm-rpm-centos74-amd64-debug/builds/51/steps/mtr/logs/stdio

rpl.rpl_parallel_optimistic 'innodb,stmt' w2 [ fail ]
        Test ended at 2018-02-06 20:43:04
 
CURRENT_TEST: rpl.rpl_parallel_optimistic
mysqltest: In included file "./include/sync_with_master_gtid.inc": 
included from /usr/share/mysql-test/suite/rpl/t/rpl_parallel_optimistic.test at line 306:
At line 48: Failed to sync with master
 
The result from queries just before the failure was:
< snip >
7	5
8	7
9	8
10	8
SELECT * FROM t2 ORDER BY a;
a	b
1	0
2	0
4	4
5	5
6	5
7	7
8	7
9	8
10	10
include/save_master_gtid.inc
connection server_2;
include/start_slave.inc
include/sync_with_master_gtid.inc
Timeout in master_gtid_wait('0-1-88', 120), current slave GTID position is: 0-1-87.



 Comments   
Comment by Andrei Elkin [ 2018-02-12 ]

This case looks related to MDEV-12746 (rpl.rpl_parallel_optimistic_nobinlog fails committing out of order at retry). It could well be a duplicate.

The failure happens in a block similar to the one in the supposed parent bug: the 10 worker threads may not complete
execution of the INSERTs on lines 276..297 because of a glitch in the retrying of temporary failures, which
MDEV-12746 is fixing.

Comment by Alice Sherepa [ 2020-09-18 ]

The failure still happens on 10.5:
http://buildbot.askmonty.org/buildbot/builders/kvm-rpm-centos74-amd64-debug/builds/4776/steps/mtr/logs/stdio

rpl.rpl_parallel_optimistic 'innodb,mix' w3 [ fail ]
        Test ended at 2020-08-26 06:00:15
 
CURRENT_TEST: rpl.rpl_parallel_optimistic
mysqltest: In included file "./include/sync_with_master_gtid.inc": 
included from /usr/share/mysql-test/suite/rpl/t/rpl_parallel_optimistic.test at line 315:
At line 48: Failed to sync with master
 
The result from queries just before the failure was:
< snip >
7	5
8	7
9	8
10	8
SELECT * FROM t2 ORDER BY a;
a	b
1	0
2	0
4	4
5	5
6	5
7	7
8	7
9	8
10	10
include/save_master_gtid.inc
connection server_2;
include/start_slave.inc
include/sync_with_master_gtid.inc
Timeout in master_gtid_wait('0-1-88', 120), current slave GTID position is: 0-1-77.
 
More results from queries before failure can be found in /dev/shm/var/3/log/rpl_parallel_optimistic.log
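
For comparing such failures across builds, the timeout message can be reduced to a single number: how many sequence numbers the slave was behind when master_gtid_wait() gave up. The following is a minimal Python sketch, not part of the test suite. The function name gtid_lag is hypothetical, and the pattern assumes the message format quoted in this report with a single-domain slave position (a position listing several domains is not handled).

```python
import re

# Matches the mtr timeout message quoted in this report, e.g.
# Timeout in master_gtid_wait('0-1-88', 120), current slave GTID position is: 0-1-77.
# A MariaDB GTID has the form domain-server_id-sequence.
TIMEOUT = re.compile(
    r"Timeout in master_gtid_wait\('(\d+)-(\d+)-(\d+)', (\d+)\), "
    r"current slave GTID position is: (\d+)-(\d+)-(\d+)\."
)


def gtid_lag(message):
    """Return how many sequence numbers the slave is behind, or None.

    None is returned when the message does not match, or when the awaited
    GTID and the slave position belong to different domains (hypothetical
    helper; multi-domain positions are out of scope for this sketch).
    """
    m = TIMEOUT.search(message)
    if m is None:
        return None
    want_dom, _want_srv, want_seq, _timeout, have_dom, _have_srv, have_seq = (
        int(g) for g in m.groups()
    )
    if want_dom != have_dom:
        return None
    return want_seq - have_seq


if __name__ == "__main__":
    first = "Timeout in master_gtid_wait('0-1-88', 120), current slave GTID position is: 0-1-87."
    second = "Timeout in master_gtid_wait('0-1-88', 120), current slave GTID position is: 0-1-77."
    print(gtid_lag(first), gtid_lag(second))  # prints: 1 11
```

Applied to the two timeouts quoted so far, this gives a lag of 1 for the 2018 build and 11 for the 2020 build.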

Comment by Alice Sherepa [ 2020-12-22 ]

http://buildbot.askmonty.org/buildbot/builders/kvm-rpm-centos74-amd64-debug/builds/5480/steps/mtr/logs/stdio

Comment by Alice Sherepa [ 2021-07-01 ]

up http://buildbot.askmonty.org/buildbot/builders/kvm-rpm-centos74-amd64-debug/builds/6719/steps/mtr/logs/stdio

Comment by Kristian Nielsen [ 2023-08-09 ]

This is almost certainly a duplicate of MDEV-28776 / MDEV-31655.
The test is plain parallel replication of a batch of conflicting INSERT ... SELECT statements, and the server error log shows that these fail with a deadlock error because more than 10 retries are needed.

2023-07-14 21:23:16 86 [ERROR] Slave worker thread retried transaction 10 time(s) in vain, giving up. Consider raising the value of the slave_transaction_retries variable.
2023-07-14 21:23:16 86 [Warning] Slave: Deadlock found when trying to get lock; try restarting transaction Error_code: 1213
2023-07-14 21:23:16 85 [Warning] Slave: Connection was killed Error_code: 1927
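
To check other failing runs for the same signature, the slave error log can be scanned for the "retried transaction N time(s) in vain" message. The following is a minimal Python sketch under stated assumptions: the function name find_retry_exhaustion is hypothetical, and the pattern assumes the timestamp, thread id and [ERROR] layout of the three log lines above.

```python
import re

# Pattern for the give-up message shown in the error log excerpt above.
# Assumed line layout: "<date> <time> <thread id> [ERROR] Slave worker thread ..."
GIVE_UP = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (?P<thread>\d+) \[ERROR\] "
    r"Slave worker thread retried transaction (?P<retries>\d+) time\(s\) in vain"
)


def find_retry_exhaustion(lines):
    """Return (timestamp, thread_id, retries) for each give-up line found."""
    hits = []
    for line in lines:
        m = GIVE_UP.match(line)
        if m:
            hits.append((m["ts"], int(m["thread"]), int(m["retries"])))
    return hits


# The three lines quoted in this comment, used as sample input.
LOG = [
    "2023-07-14 21:23:16 86 [ERROR] Slave worker thread retried transaction 10 time(s) in vain, "
    "giving up. Consider raising the value of the slave_transaction_retries variable.",
    "2023-07-14 21:23:16 86 [Warning] Slave: Deadlock found when trying to get lock; "
    "try restarting transaction Error_code: 1213",
    "2023-07-14 21:23:16 85 [Warning] Slave: Connection was killed Error_code: 1927",
]

if __name__ == "__main__":
    print(find_retry_exhaustion(LOG))  # prints: [('2023-07-14 21:23:16', 86, 10)]
```

Only the [ERROR] line is matched. The two [Warning] lines that follow it (error codes 1213 and 1927) are left for manual inspection.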

Comment by Kristian Nielsen [ 2023-08-15 ]

Fixed with the push of MDEV-31655 to 10.4.
