[MDEV-27950] rpl.rpl_delayed_slave failed for test assertion 'Status should be 'Waiting until MASTER_DELAY...'' Created: 2022-02-25  Updated: 2023-04-27

Status: Open
Project: MariaDB Server
Component/s: Replication, Tests
Affects Version/s: 10.6.7, 10.2, 10.3, 10.4, 10.5, 10.6, 10.7, 10.8
Fix Version/s: 10.4, 10.5, 10.6

Type: Bug Priority: Major
Reporter: Angelique Sklavounos (Inactive) Assignee: Angelique Sklavounos (Inactive)
Resolution: Unresolved Votes: 0
Labels: None

Issue Links:
Relates
relates to MDEV-26633 rpl.rpl_delayed_slave failed in bb, T... Open

 Description   

https://buildbot.mariadb.org/#/builders/208/builds/4366/steps/6/logs/stdio
10.7 01bb003a3
amd64-ubuntu-1804

[Revision hash: 01bb003a3a0864c3e7f140816cc681890d0fc988]
 
rpl.rpl_delayed_slave 'nonparallel,stmt' w4 [ fail ]
        Test ended at 2022-02-23 19:53:59
 
CURRENT_TEST: rpl.rpl_delayed_slave
mysqltest: In included file "./include/rpl_assert.inc": 
included from /buildbot/amd64-ubuntu-1804-bigtest/build/mysql-test/suite/rpl/include/delayed_slave_wait_on_query.inc at line 39:
included from /buildbot/amd64-ubuntu-1804-bigtest/build/mysql-test/suite/rpl/t/rpl_delayed_slave.test at line 262:
At line 109: Test assertion failed in rpl_assertion.inc
 
The result from queries just before the failure was:
< snip >
master-bin.000001	2242	Query	1	2315	COMMIT
master-bin.000001	2315	Gtid	1	2357	BEGIN GTID 0-1-9
master-bin.000001	2357	Query	1	2466	use `test`; INSERT INTO t1 VALUES ('Syncing slave', 5)
master-bin.000001	2466	Query	1	2539	COMMIT
master-bin.000001	2539	Gtid	1	2581	BEGIN GTID 0-1-10
master-bin.000001	2581	Query	1	2692	use `test`; INSERT INTO t1 VALUES (delay_on_slave(1), 6)
master-bin.000001	2692	Query	1	2765	COMMIT
master-bin.000001	2765	Gtid	1	2807	BEGIN GTID 0-1-11
master-bin.000001	2807	Query	1	2934	use `test`; INSERT INTO t1 VALUES ('stop slave and start slave: DML', 7)
master-bin.000001	2934	Query	1	3007	COMMIT
 
**** SHOW RELAYLOG EVENTS on server_1 ****
relaylog_name = 'No such row'
SHOW RELAYLOG EVENTS IN 'No such row';
Log_name	Pos	Event_type	Server_id	End_log_pos	Info
connection slave;
Assertion text: 'Status should be 'Waiting until MASTER_DELAY...''
Assertion condition: '"[SHOW SLAVE STATUS, Slave_SQL_Running_State, 1]" LIKE "Waiting until MASTER_DELAY%"'
Assertion condition, interpolated: '"Slave has read all relay log; waiting for more updates" LIKE "Waiting until MASTER_DELAY%"'
Assertion result: '0'

The fail appears to usually end up as retry-pass, and is intermittent.



 Comments   
Comment by Angelique Sklavounos (Inactive) [ 2022-02-25 ]

With the same traceback, sometimes the interpolated text is different.

10.7 507084517 https://buildbot.mariadb.org/#/builders/208/builds/4356

rpl.rpl_delayed_slave 'nonparallel,stmt' w2 [ fail ]
        Test ended at 2022-02-22 19:16:44
CURRENT_TEST: rpl.rpl_delayed_slave
mysqltest: In included file "./include/rpl_assert.inc": 
included from /buildbot/amd64-ubuntu-1804-bigtest/build/mysql-test/suite/rpl/include/delayed_slave_wait_on_query.inc at line 39:
included from /buildbot/amd64-ubuntu-1804-bigtest/build/mysql-test/suite/rpl/t/rpl_delayed_slave.test at line 262:
At line 109: Test assertion failed in rpl_assertion.inc
Assertion text: 'Status should be 'Waiting until MASTER_DELAY...''
Assertion condition: '"[SHOW SLAVE STATUS, Slave_SQL_Running_State, 1]" LIKE "Waiting until MASTER_DELAY%"'
Assertion condition, interpolated: '"checking permissions" LIKE "Waiting until MASTER_DELAY%"'
Assertion result: '0'

10.2 647e95231 https://buildbot.mariadb.org/#/builders/208/builds/4350

rpl.rpl_delayed_slave 'nonparallel,stmt' w2 [ fail ]
        Test ended at 2022-02-22 11:59:06
CURRENT_TEST: rpl.rpl_delayed_slave
mysqltest: In included file "./include/rpl_assert.inc": 
included from ./extra/rpl_tests/delayed_slave_wait_on_query.inc at line 39:
included from /buildbot/amd64-ubuntu-1804-bigtest/build/mysql-test/suite/rpl/t/rpl_delayed_slave.test at line 262:
At line 109: Test assertion failed in rpl_assertion.inc
Assertion text: 'Status should be 'Waiting until MASTER_DELAY...''
Assertion condition: '"[SHOW SLAVE STATUS, Slave_SQL_Running_State, 1]" LIKE "Waiting until MASTER_DELAY%"'
Assertion condition, interpolated: '"closing tables" LIKE "Waiting until MASTER_DELAY%"'
Assertion result: '0'

Comment by Andrei Elkin [ 2022-03-01 ]

angelique.sklavounos, let me leave this one for your detailed analysis. It looks very much as the test's own issue.
The failures' arena is here:

256 --let $assert_text= START SLAVE should finish quickly
257 --let $assert_cond= UNIX_TIMESTAMP() - $timestamp_before_stop < $time1
258 --source include/rpl_assert.inc
259
260 --let $query_number= 7
261 --source include/delayed_slave_wait_on_query.inc

and why the test is so sure about the hardcoded 7 is the first thing to clear out.
(Maybe the actual query id is different than the expected one).

Comment by Angelique Sklavounos (Inactive) [ 2022-03-04 ]

and why the test is so sure about the hardcoded 7 is the first thing to clear out.

The test adds rows to t1 that include integer values 0, 1, 2, ..., based on their order of being added. Each entry is separated by a certain sleep amount in order to test that there is a delay in the sync to the replica. The value of 7 is hardcoded here:

228 INSERT INTO t1 VALUES ('stop slave and start slave: DML', 7);

The macro called in line 261 will check that the replica has not synced that row yet, then sleep for a suitable amount of time, after which the row with "'stop slave and start slave: DML', 7" will now appear in the replica.

In the macro, the test never seems to fail on this assertion that queries the number of rows in t1 on the replica, but rather on the replica status.

It should be noted that the test does have a disclaimer of:

# The test is inherently timing-sensitive (i.e., contains races) and is likely to fail sporadically on a loaded host.

and the test usually seems to be retry-pass, but the initial fail does seem to be more than sporadic.

Generated at Thu Feb 08 09:56:52 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.