Details
-
Bug
-
Status: Closed (View Workflow)
-
Blocker
-
Resolution: Fixed
-
10.4(EOL)
Description
mysql-test$ ./mtr --repeat 1000 rpl.rpl_parallel_sbm
|
Leads to:
bb-10.4-MDEV-30619 10.4.31 07745e9e6a00ff6089c68f48d453dfa368f064f4 (Debug) |
...
|
rpl.rpl_parallel_sbm 'mix' [ 18 pass ] 18674
|
rpl.rpl_parallel_sbm 'mix' [ 19 pass ] 18698
|
rpl.rpl_parallel_sbm 'mix' [ 20 fail ]
|
Test ended at 2023-07-20 10:13:11
|
|
CURRENT_TEST: rpl.rpl_parallel_sbm
|
mysqltest: At line 100: Seconds_Behind_Master should not have used second transaction timestamp
|
|
The result from queries just before the failure was:
|
< snip >
|
# Waiting for transaction to arrive on slave and begin SQL Delay..
|
# Validating SBM is updated on event arrival..
|
# ..done
|
connection slave;
|
UNLOCK TABLES;
|
include/sync_with_master_gtid.inc
|
#
|
# Pt 2) If the SQL thread has not entered an idle state, ensure
|
# following events do not update SBM
|
connection master;
|
# Sleep 2 to allow a buffer between events for SBM check
|
insert into t1 values (1);
|
# Sleep 3 to create gap between events
|
insert into t1 values (2);
|
include/save_master_pos.inc
|
connection slave;
|
LOCK TABLES t1 WRITE;
|
# Wait for first transaction to complete SQL delay and begin execution..
|
# Validate SBM calculation doesn't use the second transaction because SQL thread shouldn't have gone idle..
|
# SBM 3 was more recent than time since last transaction (4 seconds)
|
Or, in another instance:
bb-10.4-MDEV-30619 10.4.31 07745e9e6a00ff6089c68f48d453dfa368f064f4 (Optimized) |
...
|
rpl.rpl_parallel_sbm 'mix' [ 19 pass ] 18622
|
rpl.rpl_parallel_sbm 'mix' [ 20 pass ] 19592
|
rpl.rpl_parallel_sbm 'mix' [ 21 fail ]
|
Test ended at 2023-07-20 12:47:36
|
|
CURRENT_TEST: rpl.rpl_parallel_sbm
|
mysqltest: At line 177: Seconds_Behind_Master is too recent
|
|
The result from queries just before the failure was:
|
< snip >
|
connection master;
|
include/sync_slave_sql_with_master.inc
|
# Lock t1 on slave so the first received transaction does not complete/commit
|
connection slave;
|
LOCK TABLES t1 WRITE;
|
connection master;
|
INSERT INTO t1 VALUES (1);
|
include/save_master_gtid.inc
|
connection slave;
|
# Waiting for first transaction to begin..
|
# Wait for SQL thread to catch up and idle
|
# Sleep 2s to create a gap between events
|
LOCK TABLES mysql.gtid_slave_pos WRITE;
|
connection master;
|
INSERT INTO t2 VALUES (1);
|
include/save_master_gtid.inc
|
connection slave;
|
# Waiting for second transaction to begin..
|
# Validate that the second transaction doesn't update Seconds_Behind_Master on reception after the SQL thread has gone idle..
|
# Seconds_Behind_Master was 2 yet should not have been less than 3 (1689821254 - 1689821251)
|
And:
bb-10.4-MDEV-30619 10.4.31 07745e9e6a00ff6089c68f48d453dfa368f064f4 (Optimized) |
...
|
rpl.rpl_parallel_sbm 'mix' [ 85 pass ] 18592
|
rpl.rpl_parallel_sbm 'mix' [ 86 pass ] 19673
|
rpl.rpl_parallel_sbm 'mix' [ 87 fail ]
|
Test ended at 2023-07-20 13:33:27
|
|
CURRENT_TEST: rpl.rpl_parallel_sbm
|
mysqltest: At line 184: Seconds_Behind_Master is too old
|
|
The result from queries just before the failure was:
|
< snip >
|
include/sync_slave_sql_with_master.inc
|
# Lock t1 on slave so the first received transaction does not complete/commit
|
connection slave;
|
LOCK TABLES t1 WRITE;
|
connection master;
|
INSERT INTO t1 VALUES (1);
|
include/save_master_gtid.inc
|
connection slave;
|
# Waiting for first transaction to begin..
|
# Wait for SQL thread to catch up and idle
|
# Sleep 2s to create a gap between events
|
LOCK TABLES mysql.gtid_slave_pos WRITE;
|
connection master;
|
INSERT INTO t2 VALUES (1);
|
include/save_master_gtid.inc
|
connection slave;
|
# Waiting for second transaction to begin..
|
# Validate that the second transaction doesn't update Seconds_Behind_Master on reception after the SQL thread has gone idle..
|
# ..and that its value correctly corresponds to the first transaction's timestamp on the primary
|
# Seconds_Behind_Master was 3 yet should not have been more than 2 (1689824005 - 1689824003)
|
It usually takes between 20 to 50 attempts to reproduce, though it can take 100+. System load may affect reproducibility, more load is presumably better.
Observed on both debug and optimized builds as shown above.
This particular test is not available prior to the MDEV-30619 fixes, and the previous testcase (executed against a pre-MDEV-30619 debug (this test requires debug) build of 10.4.31 at ea4b8d4ce99d0f427dd7ed40b5eeb15e41eec01e) :
./mtr --repeat 100 rpl.rpl_delayed_parallel_slave_sbm |
Does not produce any issue (400+ attempts).
May affect later MariaDB versions also.
Attachments
Issue Links
- is caused by
-
MDEV-30619 Parallel Slave SQL Thread Can Update Seconds_Behind_Master with Active Workers
- Closed
- is part of
-
MDEV-30619 Parallel Slave SQL Thread Can Update Seconds_Behind_Master with Active Workers
- Closed