MariaDB Server: MDEV-4708

GTID strict mode doesn't work on a database with purged binlogs

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • 10.0.3
    • None
    • None
    • None

    Description

      Consider the following scenario. I create a new database and end up with GTID 0-1-100. I then take a cold backup of this database, save the GTID value 0-1-100 along with it, and purge all binlogs. I restore several servers from this backup and execute "SET @@global.gtid_slave_pos = '0-1-100'" on each of them. I choose one of these servers to be the master and start writing to it, so its GTID starts advancing. I execute "CHANGE MASTER TO" on all the other servers to connect them to the master. All slaves then fail to replicate with the error "The binlog on the master is missing the GTID 0-1-100 requested by the slave (even though both a prior and a subsequent sequence number does exist), and GTID strict mode is enabled".

      Note also that, before the GTID advances on the master in this situation, the slaves cannot connect to it either, failing with an "out of memory error on the master" (I suspect the real problem is that the master has no events in its binlog).
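For readers unfamiliar with the notation, a MariaDB GTID such as 0-1-100 is written as domain ID, server ID, and sequence number, and a GTID position is a comma-separated list with one GTID per domain. The following is a minimal Python sketch of that format only; the names `Gtid`, `parse_gtid`, and `parse_gtid_pos` are hypothetical helpers for illustration and are not part of the server or any client library.

```python
from typing import Dict, NamedTuple


class Gtid(NamedTuple):
    domain_id: int  # replication domain
    server_id: int  # server that originally logged the event
    seq_no: int     # sequence number within the domain


def parse_gtid(text: str) -> Gtid:
    """Parse a single GTID written as 'domain-server-seq', e.g. '0-1-100'."""
    parts = text.strip().split("-")
    if len(parts) != 3:
        raise ValueError(f"expected domain-server-seq, got {text!r}")
    domain_id, server_id, seq_no = (int(p) for p in parts)
    return Gtid(domain_id, server_id, seq_no)


def parse_gtid_pos(text: str) -> Dict[int, Gtid]:
    """Parse a comma-separated GTID position into a dict keyed by domain ID."""
    pos: Dict[int, Gtid] = {}
    for item in text.split(","):
        if item.strip():
            gtid = parse_gtid(item)
            pos[gtid.domain_id] = gtid
    return pos
```

With these helpers, `parse_gtid("0-1-100")` yields domain 0, server 1, sequence number 100, which is the position every restored server is given in the scenario above.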

        Activity

          pivanof Pavel Ivanov created issue -
          elenst Elena Stepanova made changes -
          Assignee: (none) -> Kristian Nielsen [ knielsen ]

          knielsen Kristian Nielsen added a comment -

          If I understand the description correctly, there are two separate issues here.

          One is a feature request like this:

          https://lists.launchpad.net/maria-developers/msg05551.html

          This is not implemented in the current code. Taking a cold backup without any
          binlog files therefore means the new server has no prior knowledge of used GTIDs,
          effectively starting over as if RESET MASTER had been run.

          So with the current code it is necessary to include at least one binlog file in
          the backup (by executing FLUSH LOGS just before the cold backup, that
          file can be made very small, though).

          The other issue is that, judging from the description, the error messages in
          this case are inaccurate. That should be fixed in any case.
          pivanof Pavel Ivanov added a comment -

          Well, sure, the server has no prior knowledge of used GTIDs; that's why I execute SET @@global.gtid_slave_pos = '0-1-100'. Isn't that supposed to work? If not, why?

          Actually, the use case can be simplified as follows: I bootstrap a new database in bootstrap mode, without any binlogging, and then copy this new database to several servers. I then start MariaDB on these copied databases and execute SET @@global.gtid_slave_pos = '0-1-100', because I want binlogging to start from this GTID. At this point, if strict mode is turned on on all servers, none of them can connect to replicate from another. Is this kind of bootstrapping not supposed to work? If not, what should be changed in this process for it to work properly?

          knielsen Kristian Nielsen added a comment -

          Ok, thanks for the clarification. I had missed the point about setting
          gtid_slave_pos so that the new master starts from a specific GTID.

          So yes, it seems plausible that the error message in GTID strict mode
          is incorrect. I will need to set up some test cases and investigate to
          understand the details.

          Not much has been done so far to handle removal of binlogs. But
          as you say, having a slave without a binlog and promoting it to a
          new master is supposed to work, and that is quite similar to
          what you describe.

          knielsen Kristian Nielsen added a comment -

          Perhaps the problem is the code that detects when a slave requests
          to start in a "hole" in the master's binlog with GTID strict mode enabled
          (0-1-99 and 0-1-101 exist but 0-1-100 does not). It needs a special
          case for when 0-1-100 is not in the binlogs but is in
          @@gtid_slave_pos.
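The hypothesis in the comment above can be illustrated with a toy model. This is only a sketch of the idea as described in this thread, not the server's actual code: the function `events_to_send`, the exception `GtidHoleError`, and the representation of the binlog as a Python list are all assumptions made for illustration, and the model looks at a single replication domain and ignores every other error the real server may report.

```python
from typing import Dict, List, NamedTuple


class Gtid(NamedTuple):
    domain_id: int
    server_id: int
    seq_no: int


class GtidHoleError(Exception):
    """Toy stand-in for the strict-mode 'missing GTID' error."""


def events_to_send(
    binlog: List[Gtid],
    requested: Gtid,
    master_slave_pos: Dict[int, Gtid],
    strict: bool = True,
) -> List[Gtid]:
    """Return the binlog events in the requested domain that follow `requested`.

    `master_slave_pos` models the master's own gtid_slave_pos, keyed by domain.
    """
    domain = requested.domain_id
    later = [g for g in binlog if g.domain_id == domain and g.seq_no > requested.seq_no]

    # Normal case: the requested GTID is present in the binlog.
    if requested in binlog:
        return later

    # Proposed special case (assumption from the thread): the GTID is absent
    # from the binlog but matches the master's own gtid_slave_pos, so the
    # master was provisioned at exactly this point and there is no real hole.
    if master_slave_pos.get(domain) == requested:
        return later

    # Hole check: both an earlier and a later sequence number exist.
    has_prior = any(g.domain_id == domain and g.seq_no < requested.seq_no for g in binlog)
    if strict and has_prior and later:
        raise GtidHoleError(f"binlog is missing {requested} requested by the slave")

    return later
```

In this model, a master whose gtid_slave_pos is 0-1-100 and whose binlog starts at 0-1-101 serves a slave requesting 0-1-100, while a binlog containing 0-1-99 and 0-1-101 but no record of 0-1-100 anywhere still raises the hole error.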
          pivanof Pavel Ivanov added a comment -

          Do you think the special case where 0-1-100 is not in the binlogs but was in gtid_slave_pos, and the binlogs contain 0-2-101 and higher, should also be handled?

          knielsen Kristian Nielsen added a comment -

          > Do you think a special case when 0-1-100 is not in binlogs but it was in
          > gtid_slave_pos and binlogs have 0-2-101 and higher should also be processed?

          Yes.
          pivanof Pavel Ivanov added a comment -

          Kristian, could you say what's your priority on this? Is there an ETA?

          We really need this and I'm starting to think that maybe I need to hack something about this myself...

          knielsen Kristian Nielsen made changes -
          Status: Open [ 1 ] -> In Progress [ 3 ]

          knielsen Kristian Nielsen added a comment -

          Fix pushed to 10.0-base.

          (There were two separate bugs: one was the error in --gtid-strict-mode,
          and the other was a different bug in the case of empty binlogs on the newly
          provisioned master, as noted by the reporter in the original
          description.)
          knielsen Kristian Nielsen made changes -
          Resolution: (none) -> Fixed [ 1 ]
          Status: In Progress [ 3 ] -> Closed [ 6 ]
          serg Sergei Golubchik made changes -
          Workflow: defaullt [ 27751 ] -> MariaDB v2 [ 46649 ]
          ratzpo Rasmus Johansson (Inactive) made changes -
          Workflow: MariaDB v2 [ 46649 ] -> MariaDB v3 [ 67365 ]
          serg Sergei Golubchik made changes -
          Workflow: MariaDB v3 [ 67365 ] -> MariaDB v4 [ 146806 ]

          People

            Assignee: knielsen Kristian Nielsen
            Reporter: pivanof Pavel Ivanov