Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
10.4(EOL), 10.5, 10.6, 10.11, 11.0(EOL), 11.1(EOL), 11.2(EOL), 11.3(EOL), 11.4
None
Description
This is an odd replication issue. The main error produced is this:
11.3.2 63fb478f88e0061d149f5cdd3c4d21d4a35c7bd9 (Debug)
[ERROR] Error reading packet from server: The binlog on the master is missing the GTID 0-1-3 requested by the slave (even though both a prior and a subsequent sequence number does exist), and GTID strict mode is enabled (server_errno=1236)
This happens after a RESET MASTER, which cannot be correct.
What is additionally odd is how the testcase behaves:
The issue (originally seen twice in, but apparently not related to, MDEV-4991) readily reduced to a small SQL file (attached as MDEV-33445.sql), which then accepted some manual cleanup. However, any further editing of the input SQL makes the issue non-reproducible, indicating that the length of the input (or a syntax failure somewhere in it) is significant.
This is especially true for the 3rd line (CREATE TABLE t1...), where removing the EOL comment makes the issue non-reproducible.
Furthermore, the issue can only be reproduced using the pquery client: all attempts via the CLI and MTR fail to reproduce it. The issue is not sporadic.
Nothing special is required on the master (--no-defaults --log_bin=binlog --server_id=1); however, gtid_strict_mode is required on the slave (i.e. --no-defaults --gtid_strict_mode=1 --server_id=2).
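For anyone setting up the same pair of servers, the options above could be assembled roughly as follows. This is only a dry-run sketch: the data directories, ports and socket paths are hypothetical values that are not part of this report, and the data directories are assumed to have been initialized beforehand (for example with mariadb-install-db). The script prints the command lines rather than starting anything.

```shell
#!/bin/sh
# Options quoted from the report.
MASTER_OPTS="--no-defaults --log_bin=binlog --server_id=1"
SLAVE_OPTS="--no-defaults --gtid_strict_mode=1 --server_id=2"

# Hypothetical per-instance settings (NOT from the report): data directories,
# ports and sockets so that both servers can run on one host.
MASTER_EXTRA="--datadir=/tmp/master/data --port=3306 --socket=/tmp/master.sock"
SLAVE_EXTRA="--datadir=/tmp/slave/data --port=3307 --socket=/tmp/slave.sock"

# --no-defaults has to stay the first option on the command line.
MASTER_CMD="mariadbd $MASTER_OPTS $MASTER_EXTRA"
SLAVE_CMD="mariadbd $SLAVE_OPTS $SLAVE_EXTRA"

# Dry run: print the command lines instead of starting the servers.
printf '%s\n' "$MASTER_CMD" "$SLAVE_CMD"
```

The testcase itself still has to be replayed with pquery, since CLI and MTR replays do not reproduce the issue.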
The issue reproduces on a 11.3 debug build from 27 Deb 23, indicating it is not related to MDEV-4991. However, a recent (6 Feb 24) 11.3 optimized build does not reproduce the issue.
Other versions may also be affected.
The - possibly concerning - bug here is this part of the error:
even though both a prior and a subsequent sequence number does exist
This cannot be correct given the RESET MASTER. Even if the file were still in use, the former part (missing the GTID 0-1-3 requested) cannot be correct in combination with "a subsequent sequence number does exist".
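One way to compare the message against the actual binlog contents is to dump the GTID state on both servers right after the error appears. The sketch below is an assumption about how that could be done, not part of the original testcase: the socket paths and the root user are hypothetical, and by default it only prints the client commands (set RUN=1 to execute them).

```shell
#!/bin/sh
# Hypothetical socket paths (NOT from the report); adjust to the actual setup.
MASTER_SOCK="${MASTER_SOCK:-/tmp/master.sock}"
SLAVE_SOCK="${SLAVE_SOCK:-/tmp/slave.sock}"

# Master: list binlog files, show the events in the first one, and show the
# GTID position/state as recorded for the binlog.
MASTER_SQL="SHOW BINARY LOGS; SHOW BINLOG EVENTS; SELECT @@global.gtid_binlog_pos, @@global.gtid_binlog_state;"

# Slave: confirm strict mode and show which GTID the slave will request.
SLAVE_SQL="SELECT @@global.gtid_strict_mode, @@global.gtid_slave_pos; SHOW SLAVE STATUS\G"

# run_sql SOCKET SQL: print the client command, or execute it when RUN=1.
# Connecting as root without a password is an assumption about the test setup.
run_sql() {
  if [ "${RUN:-0}" = 1 ]; then
    mariadb --user=root --socket="$1" -e "$2"
  else
    printf '%s\n' "mariadb --user=root --socket=$1 -e \"$2\""
  fi
}

run_sql "$MASTER_SOCK" "$MASTER_SQL"
run_sql "$SLAVE_SOCK" "$SLAVE_SQL"
```

If the master's binlog after RESET MASTER really contains sequence numbers on both sides of 0-1-3 but not 0-1-3 itself, the message is at least self-consistent; if not, the message is wrong.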
It could be that the error message is simply incorrect, but the code needs checking as the bug could be more serious. The testcase length or syntax oddity also needs clarification.
I can reproduce the issue readily on my end, so when a patch is available I can retest.