Consider the following scenario: I create a new database and end up with it at GTID 0-1-100. Then I take a cold backup of this database, save the GTID value 0-1-100 along with it, and purge all binlogs. Then I restore several servers from this backup and execute "SET @@global.gtid_slave_pos = '0-1-100'" on all of them. I choose one of these servers to be the master and start writing to it, so its GTID starts advancing. I execute "CHANGE MASTER TO" on all the other servers to connect them to the master. All slaves are then unable to replicate, showing the error "The binlog on the master is missing the GTID 0-1-100 requested by the slave (even though both a prior and a subsequent sequence number does exist), and GTID strict mode is enabled".
Note also that before the GTID advances on the master in this situation, slaves cannot connect to it either, because of an "out of memory error on the master" (I'd think the real problem is that the master doesn't have any events in the binlog).
Kristian Nielsen
added a comment - If I understand the description correctly, there are two separate issues here.
One is a feature request like this:
https://lists.launchpad.net/maria-developers/msg05551.html
In current code, this is not implemented. So taking a cold backup without any
binlog files means the new server has no prior knowledge of used GTIDs,
effectively starting over as if RESET MASTER was done.
So with current code it is necessary to include at least one binlog file in
the backup (if executing FLUSH LOGS just before the cold backup, that
file can be made very small though).
The other part is that, judging from the description, the error messages in this
case are inaccurate; this should be fixed in any case.
Pavel Ivanov
added a comment - Well, sure, the server has no prior knowledge of used GTIDs; that's why I execute SET @@global.gtid_slave_pos = '0-1-100'. Isn't that supposed to work? If not, why?
Actually, the use case can be simplified as follows: I bootstrap a new database in bootstrap mode, without any binlogging, then I copy this new database to several servers. Then I start MariaDB on these copied databases and execute SET @@global.gtid_slave_pos = '0-1-100' because I want binlogging to start from this GTID. At this point, if strict mode is turned on on all servers, they cannot connect to replicate from one of them. Is this kind of bootstrapping not supposed to work? If it is not, what should be changed in this process for it to work properly?
Kristian Nielsen
added a comment - Ok, thanks for the clarification, I had missed the point about setting
gtid_slave_pos to get the new master starting from a specific GTID point.
So yes, it seems plausible that the error message in gtid strict mode
is incorrect. I will need to set up some test cases and investigate to
understand the details.
Not much has been done so far to handle removal of binlogs. But
as you say, having a slave without binlog, and promoting that to a
new master, is supposed to work; and that is quite similar to
what you describe.
Kristian Nielsen
added a comment - Perhaps the problem is the code that detects when a slave requests
to start in a "hole" in the master's binlog with gtid strict mode enabled
(0-1-99 and 0-1-101 exist but 0-1-100 does not). It needs a special
case for when 0-1-100 is not in the binlogs but is in the master's
@@gtid_slave_pos.
Pavel Ivanov
added a comment - Do you think the case where 0-1-100 is not in the binlogs but was in gtid_slave_pos, and the binlogs contain 0-2-101 and higher, should also be handled?
Kristian Nielsen
added a comment - >Do you think a special case when 0-1-100 is not in binlogs but it was in
> gtid_slave_pos and binlogs have 0-2-101 and higher should also be processed?
Yes.
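
To make the special case discussed above concrete, here is a minimal sketch of the check in Python. It is not the server's actual code: the names `Gtid`, `parse_gtid` and `strict_start_allowed` are hypothetical, the binlog is modelled as a plain list of GTIDs, and the master's @@gtid_slave_pos is modelled as a mapping from domain id to GTID. It only illustrates the rule that a requested start position missing from the binlogs should still be accepted when it matches the master's own gtid_slave_pos, whether the following binlog events carry server id 1 or 2.

```python
from typing import Dict, Iterable, NamedTuple


class Gtid(NamedTuple):
    """A GTID of the form domain-server_id-seq_no, e.g. 0-1-100."""
    domain: int
    server_id: int
    seq_no: int


def parse_gtid(text: str) -> Gtid:
    """Parse a 'domain-server_id-seq_no' string into a Gtid."""
    domain, server_id, seq_no = (int(part) for part in text.split("-"))
    return Gtid(domain, server_id, seq_no)


def strict_start_allowed(
    requested: Gtid,
    binlog: Iterable[Gtid],
    master_slave_pos: Dict[int, Gtid],
) -> bool:
    """Decide whether a slave may start replicating from `requested`.

    Assumption for this sketch: in strict mode the start position must
    either be present in the master's binlog, or be exactly the position
    the master itself was seeded with via gtid_slave_pos.
    """
    # Ordinary case: the requested GTID is found in the binlog.
    if requested in set(binlog):
        return True
    # Special case from the discussion: the GTID is absent from the binlog
    # but matches the master's gtid_slave_pos for that replication domain.
    return master_slave_pos.get(requested.domain) == requested


if __name__ == "__main__":
    seeded = {0: parse_gtid("0-1-100")}
    start = parse_gtid("0-1-100")
    # Master restored from the cold backup, then written to with server id 2.
    print(strict_start_allowed(start, [parse_gtid("0-2-101")], seeded))
    # A genuine hole: 0-1-99 and 0-1-101 exist, 0-1-100 was never seeded.
    print(strict_start_allowed(
        start, [parse_gtid("0-1-99"), parse_gtid("0-1-101")], {}))
```

In this model the genuine hole (0-1-99 and 0-1-101 present, 0-1-100 unknown) is still rejected, while a master that was seeded with 0-1-100 accepts slaves starting from that point.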
Pavel Ivanov
added a comment - Kristian, could you say what's your priority on this? Is there an ETA?
We really need this and I'm starting to think that maybe I need to hack something about this myself...
Kristian Nielsen
added a comment - Fix pushed to 10.0-base.
(There were two separate bugs: one was the error in --gtid-strict-mode,
and the other was a different bug in the case of empty binlogs on the newly
provisioned master, as noted by the reporter in the original
description.)