[MDEV-4933] Misleading error message upon a wrong sequence of events in the master log Created: 2013-08-21  Updated: 2014-09-03  Resolved: 2014-09-03

Status: Closed
Project: MariaDB Server
Component/s: None
Affects Version/s: 10.0.10
Fix Version/s: N/A

Type: Bug Priority: Minor
Reporter: Elena Stepanova Assignee: Kristian Nielsen
Resolution: Fixed Votes: 0
Labels: gtid

Issue Links:
Relates
relates to MDEV-26 Global transaction ID Closed

 Description   

There was a complaint about this message on the maria-developers list, but I don't keep track of what has been fixed and what not from that discussion, so I'll file it just in case. Please feel free to discard if it's not needed.

When a slave operating in a strict GTID mode encounters a wrong sequence of events, it produces the error message:
"The binlog on the master is missing the GTID 0-1-8 requested by the slave (even though both a prior and a subsequent sequence number does exist), and GTID strict mode is enabled".
Sometimes the message is wrong. In the example below, the GTID 0-1-8 exists in the binary log.

# Error message:
Got fatal error 1236 from master when reading data from binary log: 'The binlog on the master is missing the GTID 0-1-8 requested by the slave (even though both a prior and a subsequent sequence number does exist), and GTID strict mode is enabled'
connection server_1;
# Actual binlog events on the master:
show binlog events;
Log_name	Pos	Event_type	Server_id	End_log_pos	Info
master-bin.000001	4	Format_desc	1	248	Server ver: 10.0.3-MariaDB-debug-log, Binlog ver: 4
master-bin.000001	248	Gtid_list	1	273	[]
master-bin.000001	273	Binlog_checkpoint	1	313	master-bin.000001
master-bin.000001	313	Gtid	1	351	GTID 0-1-1
master-bin.000001	351	Query	1	451	use `test`; create table t1 (i int) engine=InnoDB
master-bin.000001	451	Gtid	1	489	BEGIN GTID 0-1-2
master-bin.000001	489	Query	1	581	use `test`; insert into t1 values (1),(2)
master-bin.000001	581	Xid	1	608	COMMIT /* xid=104 */
master-bin.000001	608	Gtid	1	646	BEGIN GTID 0-1-10
master-bin.000001	646	Query	1	738	use `test`; insert into t1 values (3),(4)
master-bin.000001	738	Xid	1	765	COMMIT /* xid=106 */
master-bin.000001	765	Gtid	1	803	BEGIN GTID 0-1-8
master-bin.000001	803	Query	1	895	use `test`; insert into t1 values (5),(6)
master-bin.000001	895	Xid	1	922	COMMIT /* xid=108 */

The error itself is expected, because the sequence numbers are not monotonically increasing, it's just the diagnostics that is questionable here.

gtid1.test

--source include/have_innodb.inc
--let $rpl_topology=1->2
--source include/rpl_init.inc
 
--connection server_2
--source include/stop_slave.inc
change master to master_use_gtid=current_pos;
--source include/start_slave.inc
 
--enable_connect_log
 
--connection server_1
create table t1 (i int) engine=InnoDB;
insert into t1 values (1),(2);
set gtid_seq_no=10;
insert into t1 values (3),(4);
set gtid_seq_no=8;
insert into t1 values (5),(6);
--save_master_pos
 
--connection server_2
--sync_with_master
--source include/stop_slave.inc
set global gtid_strict_mode=1;
#--source include/start_slave.inc
start slave;
--let $slave_io_errno= 1236
--source include/wait_for_slave_io_error.inc
 
--let $err_text = query_get_value(SHOW SLAVE STATUS, Last_IO_Error, 1)
 
--echo # Error message:
--echo $err_text
 
--connection server_1
--echo # Actual binlog events on the master:
show binlog events;

gtid1.cnf

!include suite/rpl/rpl_1slave_base.cnf
!include include/default_client.cnf
 
[mysqld.1]
log-slave-updates
 
[mysqld.2]
log-slave-updates

bzr version-info

revision-id: sanja@montyprogram.com-20130820135351-6wjeq3u31waunp7c
revno: 3680
branch-nick: 10.0-base



 Comments   
Comment by Kristian Nielsen [ 2014-02-11 ]

Yeah, I wonder how to word that error message in a concise yet clear and
correct way.

What actually happens is that the master is first scanning the list of
binlogs, looking for the most recent that has a smaller seq_no in the last
GTID in that file with same domain and server_id.

Then it scans the binlog file, looking for the requested GTID, and aborts with
this error if it instead finds one with higher seq_no and same
domain/server_id, as this indicates a "hole" in the binlog (or severe
out-of-order as in your example).

Would this be a better error message?

"ERROR: While scanning binlogs for slave start position, found GTID 0-1-10
instead of GTID 0-1-8 that slave requested, and GTID strict mode is
enabled."

Generated at Thu Feb 08 07:00:22 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.