[MDEV-23021] rpl.rpl_parallel_optimistic_until fails on BB with various pattern Created: 2020-06-26  Updated: 2023-10-05  Resolved: 2023-08-10

Status: Closed
Project: MariaDB Server
Component/s: Tests
Affects Version/s: 10.1, 10.2, 10.3, 10.4, 10.5, 10.6, 10.7, 10.8, 10.9, 10.10, 10.11, 11.0
Fix Version/s: 10.4.32, 10.5.23, 10.6.16, 10.10.7, 10.11.6, 11.0.4

Type: Bug Priority: Critical
Reporter: Andrei Elkin Assignee: Kristian Nielsen
Resolution: Fixed Votes: 0
Labels: affects-tests, sporadic

Issue Links:
Relates
relates to MDEV-30619 Parallel Slave SQL Thread Can Update ... Closed

 Description   

Most probably this is this relatively new test, introduced by dbe447a7890 internal issue.
To be fixed soon.



 Comments   
Comment by Elena Stepanova [ 2020-08-02 ]

Fails on 10.2 too
http://buildbot.askmonty.org/buildbot/builders/kvm-rpm-centos74-amd64-debug/builds/4589

10.2 91ebf1844f4

rpl.rpl_parallel_optimistic_until 'innodb,row' w3 [ fail ]
        Test ended at 2020-07-31 12:16:02
 
CURRENT_TEST: rpl.rpl_parallel_optimistic_until
mysqltest: At line 274: Query 'SHOW RELAYLOG EVENTS IN 'slave-relay-bin.000003' FROM 508 LIMIT 1' didn't return a result set
 
The result from queries just before the failure was:
< snip >
1
SELECT count(*) = 0 as 'trx2 is not committed' FROM t1 WHERE b = 'trx2';
trx2 is not committed
1
Proof 2: Resume works out
include/start_slave.inc
connection master;
connection slave;
*** case 4 Relay-log UNTIL inside trx1
connection slave;
DELETE FROM t1 WHERE a <> 25;
UPDATE t1 SET b='trx0' WHERE a = 25;
connection slave1;
BEGIN;
INSERT INTO t1 SET a= 1; # block trx1;
connection slave;
include/stop_slave.inc
CHANGE MASTER TO MASTER_LOG_FILE = 'file_1', MASTER_LOG_POS = <pos_trx0>, MASTER_USE_GTID=no;
START SLAVE IO_THREAD;
include/wait_for_slave_io_to_start.inc
 
More results from queries before failure can be found in /dev/shm/var/3/log/rpl_parallel_optimistic_until.log

Comment by Marko Mäkelä [ 2020-09-22 ]

Recent failure on 10.5: http://buildbot.askmonty.org/buildbot/builders/kvm-rpm-centos74-amd64-debug/builds/4885/steps/mtr/logs/stdio

10.5 6ab6b1510ed0851bda335a8919047c02860fe2e8

rpl.rpl_parallel_optimistic_until 'innodb,row' w2 [ fail ]
        Test ended at 2020-09-20 17:13:55
 
CURRENT_TEST: rpl.rpl_parallel_optimistic_until
mysqltest: At line 274: Query 'SHOW RELAYLOG EVENTS IN 'slave-relay-bin.000003' FROM 508 LIMIT 1' didn't return a result set
 
The result from queries just before the failure was:
< snip >
1
SELECT count(*) = 0 as 'trx2 is not committed' FROM t1 WHERE b = 'trx2';
trx2 is not committed
1
Proof 2: Resume works out
include/start_slave.inc
connection master;
connection slave;
*** case 4 Relay-log UNTIL inside trx1
connection slave;
DELETE FROM t1 WHERE a <> 25;
UPDATE t1 SET b='trx0' WHERE a = 25;
connection slave1;
BEGIN;
INSERT INTO t1 SET a= 1; # block trx1;
connection slave;
include/stop_slave.inc
CHANGE MASTER TO MASTER_LOG_FILE = 'file_1', MASTER_LOG_POS = <pos_trx0>, MASTER_USE_GTID=no;
START SLAVE IO_THREAD;
include/wait_for_slave_io_to_start.inc
 
More results from queries before failure can be found in /dev/shm/var/2/log/rpl_parallel_optimistic_until.log
 
 - saving '/dev/shm/var/2/log/rpl.rpl_parallel_optimistic_until-innodb,row/' to '/dev/shm/var/log/rpl.rpl_parallel_optimistic_until-innodb,row/'
 
Retrying test rpl.rpl_parallel_optimistic_until, attempt(2/3)...
 
worker[2] > Restart  - not started
worker[2] > Restart  - not started
rpl.rpl_parallel_optimistic_until 'innodb,row' w2 [ retry-pass ]   3562

Comment by Alice Sherepa [ 2020-10-29 ]

on 10.1 test fails with result mismatch: http://buildbot.askmonty.org/buildbot/builders/kvm-bintar-centos6-amd64-debug/builds/3209/steps/mtr/logs/stdio

 
rpl.rpl_parallel_optimistic_until 'innodb_plugin,row' w4 [ fail ]
        Test ended at 2020-10-29 11:03:01
 
CURRENT_TEST: rpl.rpl_parallel_optimistic_until
--- /usr/local/mariadb-10.1.48-linux-x86_64/mysql-test/suite/rpl/r/rpl_parallel_optimistic_until.result	2020-10-29 09:33:01.000000000 +0200
+++ /usr/local/mariadb-10.1.48-linux-x86_64/mysql-test/suite/rpl/r/rpl_parallel_optimistic_until.reject	2020-10-29 11:03:01.628350106 +0200
@@ -52,7 +52,7 @@
 INSERT INTO t1 SET a= 1;
 SELECT <pos_0> <= <pos_until> AND <pos_until> < <pos_trx2> as "pos_until < trx0 and is within trx2";
 pos_until < trx0 and is within trx2
-1
+0
 CHANGE MASTER TO MASTER_USE_GTID=no;
 START SLAVE UNTIL MASTER_LOG_FILE = 'file_2', MASTER_LOG_POS = <pos_until>;
 ROLLBACK;
@@ -63,7 +63,7 @@
 1
 SELECT count(*) = 0 as 'trx3 is not committed' FROM t1 WHERE b = 'trx3';
 trx3 is not committed
-1
+0
 Proof 2: Resume works out
 include/start_slave.inc
 *** case 2 UNTIL inside trx2
@@ -74,7 +74,7 @@
 include/stop_slave.inc
 SELECT <pos_0> <= <pos_until> AND <pos_until> < <pos_trx2> as "pos_until >= trx0 and is within trx2";
 pos_until >= trx0 and is within trx2
-1
+0
 CHANGE MASTER TO MASTER_LOG_FILE = 'file_1', MASTER_LOG_POS = <pos_trx0>, MASTER_USE_GTID=no;
 START SLAVE UNTIL MASTER_LOG_FILE = 'file_2', MASTER_LOG_POS = <pos_until>;
 ROLLBACK;
 
mysqltest: Result content mismatch

latest fail

Comment by Alice Sherepa [ 2020-12-04 ]

10.5 http://buildbot.askmonty.org/buildbot/builders/kvm-fulltest2/builds/25940/steps/mtr_nm/logs/stdio

Comment by Alice Sherepa [ 2021-03-11 ]

http://buildbot.askmonty.org/buildbot/builders/kvm-fulltest2/builds/27402/steps/mtr_nm/logs/stdio

10.2 374ec82f0828933a511f

rpl.rpl_parallel_optimistic_until 'innodb,row' w2 [ fail ]
        Test ended at 2021-03-11 02:53:01
 
CURRENT_TEST: rpl.rpl_parallel_optimistic_until
mysqltest: At line 279: 
 
The result from queries just before the failure was:
< snip >
SELECT count(*) = 0 as 'trx2 is not committed' FROM t1 WHERE b = 'trx2';
trx2 is not committed
1
Proof 2: Resume works out
include/start_slave.inc
connection master;
connection slave;
*** case 4 Relay-log UNTIL inside trx1
connection slave;
DELETE FROM t1 WHERE a <> 25;
UPDATE t1 SET b='trx0' WHERE a = 25;
connection slave1;
BEGIN;
INSERT INTO t1 SET a= 1; # block trx1;
connection slave;
include/stop_slave.inc
CHANGE MASTER TO MASTER_LOG_FILE = 'file_1', MASTER_LOG_POS = <pos_trx0>, MASTER_USE_GTID=no;
START SLAVE IO_THREAD;
include/wait_for_slave_io_to_start.inc
*** Unexpected offset. Refine it to point to the correct XID event!

Comment by Alice Sherepa [ 2021-04-30 ]

http://buildbot.askmonty.org/buildbot/builders/kvm-rpm-centos74-amd64-debug/builds/6302/steps/mtr_centos_mix/logs/stdio
10.4 0cc811c633d1fe5290b10f

Comment by Marko Mäkelä [ 2021-09-22 ]

10.3 was not mentioned yet:

10.3 b46cf33ab8ce869af0f51c35026965d237d722c7

rpl.rpl_parallel_optimistic_until 'innodb,row' w21 [ fail ]
        Test ended at 2021-09-22 18:05:05
 
CURRENT_TEST: rpl.rpl_parallel_optimistic_until
mysqltest: At line 274: Query 'SHOW RELAYLOG EVENTS IN 'slave-relay-bin.000003' FROM 508 LIMIT 1' didn't return a result set
 
The result from queries just before the failure was:
< snip >
1
SELECT count(*) = 0 as 'trx2 is not committed' FROM t1 WHERE b = 'trx2';
trx2 is not committed
1
Proof 2: Resume works out
include/start_slave.inc
connection master;
connection slave;
*** case 4 Relay-log UNTIL inside trx1
connection slave;
DELETE FROM t1 WHERE a <> 25;
UPDATE t1 SET b='trx0' WHERE a = 25;
connection slave1;
BEGIN;
INSERT INTO t1 SET a= 1; # block trx1;
connection slave;
include/stop_slave.inc
CHANGE MASTER TO MASTER_LOG_FILE = 'file_1', MASTER_LOG_POS = <pos_trx0>, MASTER_USE_GTID=no;
START SLAVE IO_THREAD;
include/wait_for_slave_io_to_start.inc

Comment by Marko Mäkelä [ 2021-10-28 ]

10.7 d7af7bfc2b6ab89801059406e1bd74846a2ca8e8

rpl.rpl_parallel_optimistic_until 'innodb,row' w8 [ fail ]
        Test ended at 2021-10-28 09:27:10
 
CURRENT_TEST: rpl.rpl_parallel_optimistic_until
mysqltest: At line 274: Query 'SHOW RELAYLOG EVENTS IN 'slave-relay-bin.000003' FROM 508 LIMIT 1' didn't return a result set

Comment by Angelique Sklavounos (Inactive) [ 2022-02-18 ]

https://buildbot.mariadb.org/#/builders/192/builds/7772/steps/6/logs/stdio
10.8 1c5b099a9619c953e7510bbafca89353ad0a020c

https://buildbot.mariadb.org/#/builders/192/builds/7783/steps/6/logs/stdio
10.9 bbe99cd4e2d7c83a06dd93ea88af97f2d5796810

Comment by Roel Van de Paar [ 2023-07-23 ]

I ran into this also. Sporadic, but common, failure. Issue occurrence may be more pronounced after bb-10.4-MDEV-30619.

bb-10.4-MDEV-30619 10.4.31 006aca3111e0cbd5f172060774c74e328bea4b0a (Debug)

...
rpl.rpl_parallel_optimistic_until 'innodb,row' w21 [ 157 fail ]
        Test ended at 2023-07-22 17:25:26
 
CURRENT_TEST: rpl.rpl_parallel_optimistic_until
mysqltest: At line 274: Query 'SHOW RELAYLOG EVENTS IN 'slave-relay-bin.000003' FROM 508 LIMIT 1' didn't return a result set
 
The result from queries just before the failure was:
< snip >
1
SELECT count(*) = 0 as 'trx2 is not committed' FROM t1 WHERE b = 'trx2';
trx2 is not committed
1
Proof 2: Resume works out
include/start_slave.inc
connection master;
connection slave;
*** case 4 Relay-log UNTIL inside trx1
connection slave;
DELETE FROM t1 WHERE a <> 25;
UPDATE t1 SET b='trx0' WHERE a = 25;
connection slave1;
BEGIN;
INSERT INTO t1 SET a= 1; # block trx1;
connection slave;
include/stop_slave.inc
CHANGE MASTER TO MASTER_LOG_FILE = 'file_1', MASTER_LOG_POS = <pos_trx0>, MASTER_USE_GTID=no;
START SLAVE IO_THREAD;
include/wait_for_slave_io_to_start.inc

Comment by Roel Van de Paar [ 2023-07-23 ]

Also observed on same branch, also sporadic, though (much) less frequent:

10.4.31 006aca3111e0cbd5f172060774c74e328bea4b0a (Debug)

rpl.rpl_parallel_optimistic_until 'innodb,row' w43 [ 207 fail ]
        Test ended at 2023-07-23 16:56:06
 
CURRENT_TEST: rpl.rpl_parallel_optimistic_until
mysqltest: At line 279:
 
The result from queries just before the failure was:
< snip >
SELECT count(*) = 0 as 'trx2 is not committed' FROM t1 WHERE b = 'trx2';
trx2 is not committed
1
Proof 2: Resume works out
include/start_slave.inc
connection master;
connection slave;
*** case 4 Relay-log UNTIL inside trx1
connection slave;
DELETE FROM t1 WHERE a <> 25;
UPDATE t1 SET b='trx0' WHERE a = 25;
connection slave1;
BEGIN;
INSERT INTO t1 SET a= 1; # block trx1;
connection slave;
include/stop_slave.inc
CHANGE MASTER TO MASTER_LOG_FILE = 'file_1', MASTER_LOG_POS = <pos_trx0>, MASTER_USE_GTID=no;
START SLAVE IO_THREAD;
include/wait_for_slave_io_to_start.inc
*** Unexpected offset. Refine it to point to the correct XID event!

Comment by Kristian Nielsen [ 2023-08-10 ]

Fix pushed to 10.4

Generated at Thu Feb 08 09:19:14 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.