MariaDB Server / MDEV-6903

[PATCH] gtid_slave_pos is incorrect after master crash

Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 10.0.14
    • Fix Version/s: 10.0.16
    • Component/s: Replication
    • Labels: None

    Description

      In the master-crash scenario discussed in https://mariadb.atlassian.net/browse/MDEV-6462, the slave can now recover, in the sense that the partial transaction is rolled back and the SQL thread's execution position points to the same place in the master's binlog as the IO thread's current position. However, gtid_slave_pos is still advanced and contains the GTID of a transaction that was never committed. Besides the confusion over what gtid_current_pos points to in this situation, it can also cause slaves to skip one transaction from the master's binlog when writes resume on the restarted master.
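
The skipped transaction can be illustrated with a toy model. This is a sketch under stated assumptions, not MariaDB code: it assumes the restarted master reuses the sequence number of the transaction that never committed, and that a slave connecting by GTID receives only the events after its recorded position. The function `events_after` and all GTID values are hypothetical.

```python
# Toy model only; not MariaDB code. GTIDs are written as domain-server-seq_no.

def events_after(binlog, slave_pos):
    """Return the events a slave positioned at slave_pos would still receive."""
    if slave_pos not in binlog:
        raise ValueError(f"position {slave_pos} not found in binlog")
    return binlog[binlog.index(slave_pos) + 1:]

# Assumption: before the crash the master committed 0-1-1 and 0-1-2, and a
# partial 0-1-3 reached the slave but never committed. After the restart the
# master reuses seq_no 3 for a new, different transaction.
master_binlog_after_restart = ["0-1-1", "0-1-2", "0-1-3", "0-1-4"]

correct_pos = "0-1-2"  # slave rolled back the partial transaction
buggy_pos = "0-1-3"    # slave still recorded the uncommitted GTID

received_ok = events_after(master_binlog_after_restart, correct_pos)
received_buggy = events_after(master_binlog_after_restart, buggy_pos)

print(received_ok)     # both new transactions
print(received_buggy)  # the new 0-1-3 is missing
```

With the stale position, the slave treats the new 0-1-3 as already applied, so that transaction never reaches it.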

      I'm attaching a one-line patch that fixes the problem, along with an extension to the tests added in MDEV-6462 that checks the correctness of gtid_slave_pos. The patch also adds one more test case, which showed transaction loss without the code fix. The patch is on top of the latest 10.0 branch.

      Note that this problem suggests that https://mariadb.atlassian.net/browse/MDEV-4906 wasn't fully fixed. The patch also makes me wonder: should gtid_sub_id actually be reset to 0 inside cleanup_context()? Without that, could there be other situations where gtid_slave_pos is wrong and holds the GTID of a rolled-back transaction?
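
The question about resetting gtid_sub_id can be made concrete with a small simulation. This is a hypothetical, heavily simplified model and not the server's implementation: only the names gtid_sub_id, gtid_slave_pos, and cleanup_context() come from the report, and the class `ToyApplier` and its other methods are invented for illustration. It assumes that some later code path records the pending GTID whenever gtid_sub_id is still nonzero.

```python
# Hypothetical model of slave-side state; not the MariaDB implementation.

class ToyApplier:
    def __init__(self, reset_on_cleanup):
        self.reset_on_cleanup = reset_on_cleanup
        self.gtid_sub_id = 0       # nonzero while a GTID is pending
        self.pending_gtid = None
        self.gtid_slave_pos = None

    def start_transaction(self, gtid, sub_id):
        self.pending_gtid = gtid
        self.gtid_sub_id = sub_id

    def record_position(self):
        # Assumed behavior: any path that gets here with a nonzero
        # sub_id writes the pending GTID into gtid_slave_pos.
        if self.gtid_sub_id != 0:
            self.gtid_slave_pos = self.pending_gtid
            self.gtid_sub_id = 0

    def cleanup_context(self):
        # Called after the partial transaction has been rolled back.
        if self.reset_on_cleanup:
            self.gtid_sub_id = 0
            self.pending_gtid = None


def replay(reset_on_cleanup):
    applier = ToyApplier(reset_on_cleanup)
    applier.start_transaction("0-1-2", sub_id=1)
    applier.record_position()                      # normal commit
    applier.start_transaction("0-1-3", sub_id=2)   # master crashes mid-transaction
    applier.cleanup_context()                      # partial transaction rolled back
    applier.record_position()                      # some later code path
    return applier.gtid_slave_pos


pos_without_reset = replay(reset_on_cleanup=False)
pos_with_reset = replay(reset_on_cleanup=True)
print(pos_without_reset, pos_with_reset)
```

In this model, clearing the pending state during cleanup guards every later path at once, which is the appeal of doing the reset there rather than fixing individual call sites.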

      Attachments

        Activity

          pivanof Pavel Ivanov created issue -
          elenst Elena Stepanova made changes -
          Field: Original Value -> New Value
          Fix Version/s: 10.0 [ 16000 ]
          Assignee: Kristian Nielsen [ knielsen ]
          Summary: gtid_slave_pos is incorrect after master crash -> [PATCH] gtid_slave_pos is incorrect after master crash
          elenst Elena Stepanova made changes -
          Component/s: Replication [ 10100 ]

          knielsen Kristian Nielsen added a comment - Thanks for tracking this down! I share your concern about other possible similar situations. I will check this in the code before applying the fix.
          pivanof Pavel Ivanov made changes -
          Attachment: fix_wrong_gtid_slave_pos.txt [ 34904 ]
          pivanof Pavel Ivanov added a comment - The patch I attached earlier had a test that was flaky and didn't consistently reproduce the transaction loss on the slave. I'm attaching a new one with the corrected test.
          pivanof Pavel Ivanov made changes -
          Attachment: fix_wrong_gtid_slave_pos.txt [ 35100 ]
          knielsen Kristian Nielsen made changes -
          Status: Open [ 1 ] -> In Progress [ 3 ]
          knielsen Kristian Nielsen made changes -
          Status: In Progress [ 3 ] -> Stalled [ 10000 ]
          knielsen Kristian Nielsen made changes -
          Status: Stalled [ 10000 ] -> In Progress [ 3 ]

          knielsen Kristian Nielsen added a comment - Pushed to 10.0.16. Thanks, Pavel, for tracking it down and for the test case!
          knielsen Kristian Nielsen made changes -
          Fix Version/s: 10.0.16 [ 17900 ]
          Fix Version/s: 10.0 [ 16000 ]
          Resolution: Fixed [ 1 ]
          Status: In Progress [ 3 ] -> Closed [ 6 ]
          ratzpo Rasmus Johansson (Inactive) made changes -
          Workflow: MariaDB v2 [ 55908 ] -> MariaDB v3 [ 64831 ]
          serg Sergei Golubchik made changes -
          Workflow: MariaDB v3 [ 64831 ] -> MariaDB v4 [ 148326 ]

          People

            Assignee: knielsen Kristian Nielsen
            Reporter: pivanof Pavel Ivanov
            Votes: 0
            Watchers: 2

