[MDEV-4708] GTID strict mode doesn't work on a database with purged binlogs - Jira

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 10.0.3
Fix Version/s: None
Component/s: None
Labels: None

Description

Consider the following scenario. I create a new database and end up with it at GTID 0-1-100. I then take a cold backup of this database, save the GTID value 0-1-100 along with it, and purge all binlogs. I restore several servers from this backup and execute "SET @@global.gtid_slave_pos = '0-1-100'" on all of them. I choose one of these servers to be the master and start writing to it, so its GTID starts advancing. I execute "CHANGE MASTER TO" on all the other servers to connect them to the master. All slaves then fail to replicate, showing the error "The binlog on the master is missing the GTID 0-1-100 requested by the slave (even though both a prior and a subsequent sequence number does exist), and GTID strict mode is enabled".

Note also that before the GTID advances on the master in this situation, slaves cannot connect to it either, failing with an "out of memory error on the master" (I'd think the real problem is that the master doesn't have any events in its binlog).
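
The scenario above can be sketched as the statements below. This is a hedged reconstruction, not the reporter's actual script: the host name, replication account, and demo table are hypothetical, enabling strict mode at runtime instead of through --gtid-strict-mode is an assumption, and MASTER_USE_GTID = slave_pos is assumed to be how the slaves were pointed at the master.

```sql
-- On every server restored from the cold backup (its binlogs were purged):
SET GLOBAL gtid_strict_mode = 1;        -- assumption: set at runtime, not via --gtid-strict-mode
SET GLOBAL gtid_slave_pos = '0-1-100';  -- the GTID saved along with the backup

-- On the one server chosen as master: any write makes the GTID advance.
CREATE DATABASE IF NOT EXISTS demo;         -- hypothetical schema
CREATE TABLE demo.t1 (a INT PRIMARY KEY);   -- hypothetical table
INSERT INTO demo.t1 VALUES (1);

-- On each of the remaining servers:
CHANGE MASTER TO
  MASTER_HOST = 'master.example',   -- hypothetical host
  MASTER_USER = 'repl',             -- hypothetical account
  MASTER_PASSWORD = 'secret',       -- hypothetical password
  MASTER_USE_GTID = slave_pos;      -- assumption: connect using gtid_slave_pos
START SLAVE;

-- The reported error would be expected in the Last_IO_Error column.
SHOW SLAVE STATUS;
```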

Attachments

Activity

Pavel Ivanov created issue - 2013-06-25 09:46

Elena Stepanova made changes - 2013-06-25 21:36

Field	Original Value	New Value
Assignee		Kristian Nielsen [ knielsen ]

Kristian Nielsen added a comment - 2013-06-26 09:17

If I understand the description correctly, there are two separate issues here.

One is a feature request like this:

https://lists.launchpad.net/maria-developers/msg05551.html

In current code, this is not implemented. So taking a cold backup without any
binlog files means the new server has no prior knowledge of used GTIDs,
effectively starting over as if RESET MASTER was done.

So with current code it is necessary to include at least one binlog file in
the backup (if executing FLUSH LOGS just before the cold backup, that
file can be made very small though).
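
A minimal sketch of that workaround, run on the source server just before it is shut down for the cold backup. Which files to copy and whether the binlog index needs adjusting are not covered by the comment, so those steps are left out here; recording gtid_binlog_pos is an added suggestion, not something the comment asks for.

```sql
-- Rotate to a fresh binlog file so the file taken with the backup stays small.
FLUSH LOGS;

-- List the binlog files; the last one listed is the newly created file.
SHOW BINARY LOGS;

-- Suggestion (not from the comment): note the position to store with the backup.
SELECT @@global.gtid_binlog_pos;
```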

The other part is that, judging from the description, the error messages in
this case are inaccurate; this should be fixed in any case.

Pavel Ivanov added a comment - 2013-06-26 09:36

Well, sure, the server has no prior knowledge of used GTIDs; that's why I execute SET @@global.gtid_slave_pos = '0-1-100'. Isn't that supposed to work? If not, why?

Actually the use case can be simplified as follows: I bootstrap a new database, without any binlogging, in bootstrap mode, then I copy this new database to several servers. Then I start MariaDB on these copied databases and execute SET @@global.gtid_slave_pos = '0-1-100' because I want binlogging to start from this GTID. At this point, if strict mode is turned on on all servers, none of them can connect to replicate from another. Is this kind of bootstrapping not supposed to work? If it is not, what should be changed in this process for it to work properly?

Kristian Nielsen added a comment - 2013-06-26 09:52

Ok, thanks for the clarification, I had missed the point about setting
gtid_slave_pos to get a new master starting from a specific GTID point.

So yes, it seems plausible that the error message in gtid strict mode
is incorrect. I will need to set up some test cases and investigate to
understand the details.

Not much has been done so far to handle removal of binlogs. But
as you say, having a slave without binlog, and promoting that to a
new master, is supposed to work; and that is quite similar to
what you describe.

Kristian Nielsen added a comment - 2013-06-26 09:55

Perhaps the problem is the code that detects when a slave requests
to start in a "hole" in the master's binlog with gtid strict mode enabled
(0-1-99 and 0-1-101 exist but 0-1-100 does not). It needs a special
case for when 0-1-100 is not in the binlogs but is in
@@gtid_slave_pos.
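
To see which of these situations a given master is in, its GTID state can be compared with the contents of its binlogs. A hedged sketch, assuming a release that provides all three gtid_*_pos variables; the binlog file name is hypothetical and should be taken from the SHOW BINARY LOGS output.

```sql
-- GTID positions known to the server: from replication, from its own binlog,
-- and the combined current position.
SELECT @@global.gtid_slave_pos,
       @@global.gtid_binlog_pos,
       @@global.gtid_current_pos;

-- Binlog files that still exist on the master.
SHOW BINARY LOGS;

-- First events of one file; Gtid events show which sequence numbers it contains.
SHOW BINLOG EVENTS IN 'mysql-bin.000001' LIMIT 20;  -- hypothetical file name
```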

Pavel Ivanov added a comment - 2013-06-26 10:00

Do you think a special case when 0-1-100 is not in binlogs but it was in gtid_slave_pos and binlogs have 0-2-101 and higher should also be processed?

Kristian Nielsen added a comment - 2013-06-26 10:46

>Do you think a special case when 0-1-100 is not in binlogs but it was in
> gtid_slave_pos and binlogs have 0-2-101 and higher should also be processed?

Yes.

Pavel Ivanov added a comment - 2013-07-10 08:57

Kristian, could you say what's your priority on this? Is there an ETA?

We really need this and I'm starting to think that maybe I need to hack something about this myself...

Kristian Nielsen made changes - 2013-07-10 13:03

Status	Open [ 1 ]	In Progress [ 3 ]

Kristian Nielsen added a comment - 2013-07-10 13:04

Fix pushed to 10.0-base.

(There were two separate bugs. One was the error in --gtid-strict-mode;
the other was a different bug in the case of empty binlogs on the newly
provisioned master, as noted by the reporter in the original
description).

Kristian Nielsen made changes - 2013-07-10 13:04

Resolution		Fixed [ 1 ]
Status	In Progress [ 3 ]	Closed [ 6 ]

Sergei Golubchik made changes - 2014-06-13 15:07

Workflow	defaullt [ 27751 ]	MariaDB v2 [ 46649 ]

Rasmus Johansson (Inactive) made changes - 2015-05-18 17:51

Workflow	MariaDB v2 [ 46649 ]	MariaDB v3 [ 67365 ]

Sergei Golubchik made changes - 2021-12-06 21:38

Workflow	MariaDB v3 [ 67365 ]	MariaDB v4 [ 146806 ]

People

Assignee: Kristian Nielsen

Reporter: Pavel Ivanov

Votes: 0

Watchers: 1

Dates

Created: 2013-06-25 09:46

Updated: 2013-07-10 13:04

Resolved: 2013-07-10 13:04
