MariaDB Server / MDEV-4473

GTID doesn't work in face of master failovers

Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 10.0.2
    • Fix Version/s: 10.0.3
    • Component/s: None
    • Labels: None

    Description

      I've attached the test exposing the problem.

      What the test does: it sets up the replication topology 1->2, 1->3, executes some statements, shuts down server 3, performs a failover to make the topology 2->1, executes some more statements, rotates the binlogs, and connects server 3 as a slave to server 2. At this point server 2 hits the ASSERT at sql/sql_repl.cc:1001.

      The problem is that the GTID implementation has an inconsistent view of what a replication domain is and what data constitutes the replication domain state. E.g. the function contains_all_slave_gtid() assumes that Gtid_list contains only one gtid per domain, yet it compares seq_no only if the gtid has the same server_id as requested by the slave, which is wrong. This "wrong" is what I wanted to catch with my test, but I discovered that Gtid_list actually contains one gtid for each domain-server pair, which seems even more wrong: for how long will a non-existent old server_id be stored and passed from one Gtid_list to another on each binlog rotation?

      mysqld hits the ASSERT in the test because gtid_find_binlog_file() makes the same assumptions as contains_all_slave_gtid(), but it is written in a way that prohibits any work if a domain_id is repeated in the Gtid_list.

      I think the fix should be to remove the server_id comparisons from places like gtid_find_binlog_file() and contains_all_slave_gtid() and to make sure that Gtid_list contains only one gtid per domain. After all, server_id exists only for de-duplication of events with the same seq_no belonging to alternate futures.
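
      For illustration, here is a minimal C++ sketch of the "one gtid per domain" model proposed above. The class and names are hypothetical stand-ins, not the actual MariaDB data structures: the state is keyed by domain_id alone, and a new gtid in a domain replaces the previous entry regardless of which server_id produced it.

        #include <cstdint>
        #include <unordered_map>

        // Simplified stand-in for MariaDB's rpl_gtid triple (domain-server-seq_no).
        struct rpl_gtid {
          uint32_t domain_id;
          uint32_t server_id;
          uint64_t seq_no;
        };

        // Hypothetical "one gtid per domain" state: exactly one entry per
        // replication domain, namely the most recently logged gtid in it.
        // An old, long-gone master's server_id is simply overwritten the
        // next time any server logs a gtid in the same domain.
        class gtid_list_per_domain {
          std::unordered_map<uint32_t, rpl_gtid> last_in_domain;  // keyed by domain_id
        public:
          void update(const rpl_gtid &gtid) {
            last_in_domain[gtid.domain_id] = gtid;  // replaces any previous server_id
          }
          const rpl_gtid *find(uint32_t domain_id) const {
            auto it = last_in_domain.find(domain_id);
            return it == last_in_domain.end() ? nullptr : &it->second;
          }
        };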

      Attachments

        Activity

          knielsen Kristian Nielsen added a comment -

          Hm, that assert is wrong:

          const rpl_gtid *gtid= state->find(glev->list[i].domain_id);
          if (!gtid)
          {
            /* contains_all_slave_gtid() would have returned false if so. */
            DBUG_ASSERT(0);
            continue;
          }

          It is correct in the first iteration of the loop, but as we delete from the
          `state' hash in the loop, the condition may no longer hold in subsequent
          iterations of the loop. So the assert should just be removed.

          I will check that everything else is ok with the supplied test case (thanks
          for supplying that, made it very easy to reproduce the problem).
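
          A self-contained sketch of why that assert cannot hold, with std::unordered_map standing in for the server's HASH and made-up example values; once the first iteration deletes the domain from the state, the second lookup legitimately fails:

            #include <cstdint>
            #include <cstdio>
            #include <unordered_map>
            #include <vector>

            // Simplified stand-in for MariaDB's rpl_gtid triple (domain-server-seq_no).
            struct rpl_gtid {
              uint32_t domain_id;
              uint32_t server_id;
              uint64_t seq_no;
            };

            int main() {
              // Hypothetical Gtid_list_event with two entries in the same domain 0
              // (one per server_id, as described in the bug report).
              std::vector<rpl_gtid> glev = { {0, 1, 3}, {0, 2, 5} };

              // Slave state with one entry per domain, keyed by domain_id.
              std::unordered_map<uint32_t, rpl_gtid> state = { {0, {0, 2, 5}} };

              for (const rpl_gtid &g : glev) {
                auto it = state.find(g.domain_id);
                if (it == state.end()) {
                  // The first iteration deleted domain 0 from `state', so this
                  // lookup legitimately fails on the second iteration; asserting
                  // here would fire even though nothing is wrong.
                  std::printf("domain %u already handled, skipping\n",
                              (unsigned)g.domain_id);
                  continue;
                }
                std::printf("handling domain %u up to seq_no %llu\n",
                            (unsigned)g.domain_id,
                            (unsigned long long)it->second.seq_no);
                state.erase(it);  // delete from the state hash inside the loop
              }
              return 0;
            }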

          pivanof Pavel Ivanov added a comment -

          Note though that contains_all_slave_gtid() returns true for slave_log.00003 when server 3 tries to connect to server 2, which is wrong because some events get skipped in slave_log.00002.


          knielsen Kristian Nielsen added a comment -

          Yes, I've noticed, thanks.

          I'll fix.


          knielsen Kristian Nielsen added a comment -

          BTW, I noticed that your test has --sync_with_master in server_3, but not in server_2 (for the initial events on server 1). I assume this is just an oversight, because without it the test becomes non-deterministic (it will fail if the server 2 slave threads happen to stop before they have replicated everything from server 1).


          knielsen Kristian Nielsen added a comment -

          Ok, so there was a deeper issue here.

          Suppose binlog file X has in its Gtid_list_event: 0-1-3,0-2-5, and suppose the
          slave requests to start replicating after 0-1-3.

          In this case the bug was that master would start sending events from the start
          of X. This is wrong, because 0-2-4 and 0-2-5 are contained in X-1, and are
          needed by the slave. So these events were lost.

          On the other hand, if the slave requested 0-2-5, then it is correct to start
          sending from the beginning of binlog file X, because 0-2-5 is the last GTID
          logged in earlier binlogs. The difference is that 0-2-5 is the last of the
          GTIDs in the Gtid_list_event. The problem was that the code did not check that
          the matched GTID was the last one in the list.

          Fixed by checking if the gtid requested by slave that matches a gtid in the
          Gtid_list_event is the last event for that domain in the list. If not, go back
          to a prior binlog to ensure all needed events are sent to slave.

          I pushed the fix to 10.0-base.
          Patch: revid:knielsen@knielsen-hq.org-20130503092729-gedp152b19k5wdnj
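
          A minimal sketch of the check described in this comment, written against a simplified rpl_gtid stand-in rather than the real server code (the function name and signature are hypothetical):

            #include <cstddef>
            #include <cstdint>
            #include <vector>

            // Simplified stand-in for MariaDB's rpl_gtid triple (domain-server-seq_no).
            struct rpl_gtid {
              uint32_t domain_id;
              uint32_t server_id;
              uint64_t seq_no;
            };

            // True if it is safe to start sending from the binlog file whose
            // Gtid_list_event is `glev' to a slave that already has `slave_gtid':
            // the matched gtid must be the *last* one listed for its domain,
            // otherwise later gtids of that domain (0-2-4, 0-2-5 in the example
            // above) live in an earlier binlog and would be skipped.
            bool safe_to_start_from_this_binlog(const std::vector<rpl_gtid> &glev,
                                                const rpl_gtid &slave_gtid) {
              bool matched_is_last = false;
              for (size_t i = 0; i < glev.size(); i++) {
                if (glev[i].domain_id != slave_gtid.domain_id)
                  continue;
                // Each entry for the slave's domain overrides the previous one, so
                // after the loop this reflects the last entry for that domain.
                matched_is_last = (glev[i].server_id == slave_gtid.server_id &&
                                   glev[i].seq_no == slave_gtid.seq_no);
              }
              return matched_is_last;
            }

          With the Gtid_list 0-1-3,0-2-5 from the example, a slave at 0-1-3 gets false (the master must go back to an earlier binlog), while a slave at 0-2-5 gets true.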

          pivanof Pavel Ivanov added a comment -

          Kristian,
          So why do you think it's necessary to store gtid for each server_id in gtid_list? Why not store one gtid per domain?

          And am I right that if some server was a master, then a failover was made to another master, and then the old server ceased to exist altogether, the gtid with its server_id will exist indefinitely in the gtid_list in the binlogs of every server? Don't you think that is a problem and a way to accumulate a lot of unnecessary garbage? How will everything behave if the gtid_list contains 1000 or 10,000 different gtid records?

          pivanof Pavel Ivanov added a comment -

          OK, I see your response on maria-developers list. I'll move the discussions there.


          People

            Assignee: knielsen Kristian Nielsen
            Reporter: pivanof Pavel Ivanov
            Votes: 0
            Watchers: 3

            Dates

              Created:
              Updated:
              Resolved:
