[MDEV-14528] Track master timestamp in case rolling back to serial replication Created: 2017-11-29  Updated: 2018-11-13  Resolved: 2018-11-07

Status: Closed
Project: MariaDB Server
Component/s: Replication
Affects Version/s: 10.2, 10.3
Fix Version/s: 10.3.11, 10.2.19

Type: Bug Priority: Blocker
Reporter: Sergey Vojtovich Assignee: Andrei Elkin
Resolution: Fixed Votes: 0
Labels: contribution, foundation, seconds-behind-master


 Description   

I tried to switch on parallel replication from Amazon Aurora to MariaDB.

[my.cnf]
slave-parallel-threads = 4
slave-parallel-mode = optimistic

'set global slave_parallel_mode=optimistic;'
As I understand it now, this setting makes no sense here because the slave rolls back to serial replication. However, it introduces a silent problem with Seconds_Behind_Master: the value is always 0.

So it would probably be good to keep this metric accurate even in this unnatural but working mode.
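
To illustrate why a missing master timestamp shows up as a constant 0, here is a simplified Python model of how Seconds_Behind_Master is commonly derived. It is a sketch, not the server code: the function and parameter names are hypothetical, and it assumes the value is the slave's current time minus the timestamp of the last applied master event, adjusted for clock difference, with an untracked (zero) timestamp reported as 0.

```python
import time


def seconds_behind_master(last_master_timestamp, clock_diff_with_master=0, now=None):
    """Simplified model of the Seconds_Behind_Master calculation.

    last_master_timestamp: master-side timestamp of the last applied event,
        or 0 if it was never recorded (hypothetical parameter name).
    clock_diff_with_master: slave clock minus master clock, in seconds.
    now: current slave time; defaults to the wall clock.
    """
    if now is None:
        now = int(time.time())
    # If the timestamp is never tracked, the lag cannot be computed
    # and the metric silently stays at 0.
    if last_master_timestamp == 0:
        return 0
    # Negative differences are clamped to 0.
    return max(0, now - last_master_timestamp - clock_diff_with_master)
```

Under this model, a slave that applies events without updating the timestamp reports 0 regardless of the real lag, which matches the symptom described above.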



 Comments   
Comment by Sergey Vojtovich [ 2017-11-29 ]

Target version is up to Elkin to decide.

Comment by Sergey Vojtovich [ 2018-04-06 ]

Overdue PR.

Comment by Andrei Elkin [ 2018-11-06 ]

Analyzed the issue;
reviewed the patch, slightly improved it, and added a test.

Comment by Marko Mäkelä [ 2018-11-07 ]

This change is causing one test failure:

10.2 54b8856b87629e9fec075e3a71179eefc7fa02ac

rpl.rpl_delayed_slave 'parallel,stmt'    [ fail ]
        Test ended at 2018-11-07 09:34:46
 
CURRENT_TEST: rpl.rpl_delayed_slave
mysqltest: In included file "./include/rpl_assert.inc": 
included from /mariadb/10.2/mysql-test/suite/rpl/t/rpl_delayed_slave.test at line 203:
At line 109: Test assertion failed in rpl_assertion.inc
 
The result from queries just before the failure was:
< snip >
master-bin.000001	2009	Query	1	2082	COMMIT
master-bin.000001	2082	Gtid	1	2124	BEGIN GTID 0-1-8
master-bin.000001	2124	Query	1	2233	use `test`; INSERT INTO t1 SELECT delay_on_slave(2), 4
master-bin.000001	2233	Query	1	2306	COMMIT
master-bin.000001	2306	Gtid	1	2348	BEGIN GTID 0-1-9
master-bin.000001	2348	Query	1	2457	use `test`; INSERT INTO t1 VALUES ('Syncing slave', 5)
master-bin.000001	2457	Query	1	2530	COMMIT
master-bin.000001	2530	Gtid	1	2572	BEGIN GTID 0-1-10
master-bin.000001	2572	Query	1	2683	use `test`; INSERT INTO t1 VALUES (delay_on_slave(1), 6)
master-bin.000001	2683	Query	1	2756	COMMIT
 
**** SHOW RELAYLOG EVENTS on server_1 ****
relaylog_name = 'No such row'
SHOW RELAYLOG EVENTS IN 'No such row';
Log_name	Pos	Event_type	Server_id	End_log_pos	Info
connection slave;
Assertion text: 'Seconds_Behind_Master should be between 0 and the 2*T'
Assertion condition: '[SHOW SLAVE STATUS, Seconds_Behind_Master, 1] >= 0 AND <1> < 20'
Assertion condition, interpolated: '111 >= 0 AND 111 < 20'
Assertion result: '0'
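
The interpolated condition above can be checked by hand. A minimal Python sketch, assuming the upper bound of 20 shown in the assertion condition; the function name is hypothetical and is not part of the test suite:

```python
def sbm_assertion_holds(seconds_behind_master, upper_bound=20):
    """Mirror the condition 'value >= 0 AND value < upper_bound'."""
    return 0 <= seconds_behind_master < upper_bound


# The observed value from the log, 111, falls outside the allowed range.
observed = 111
result = sbm_assertion_holds(observed)
```

Here `result` is false, which corresponds to the assertion result of '0' in the log.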

Comment by Marko Mäkelä [ 2018-11-07 ]

I disabled the failing test.
I think that this failure should be analyzed before the releases. Is it a problem with the test only, or should the logic be revised?

Comment by Andrei Elkin [ 2018-11-07 ]

Analyzed the rpl_delayed_slave failure and concluded that the main patch needs a refinement;
implemented and tested it.
