  1. MariaDB Server
  2. MDEV-17156

Local transactions on a Slave don't update GTID's gtid_current_pos after RESET MASTER on Slave (master_use_gtid value is not relevant)

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Not a Bug
    • Affects Version/s: 10.1.33
    • Fix Version/s: N/A
    • Component/s: Replication
    • Labels: None

    Description

      Server A = Master id=10133
      Server B = Slave id=20133

      MASTER_USE_GTID = slave_pos|current_pos (not relevant)

      Writing transactions on both A and B properly updates the global variable gtid_current_pos on the Slave (B).

      If I issue RESET MASTER on the Slave, gtid_current_pos on the Slave stops being updated.

      I am aware of Kristian's comment on https://jira.mariadb.org/browse/MDEV-10279 but I still am not convinced.

      MariaDB [test]> SHOW GLOBAL VARIABLES LIKE 'gtid%';
      +------------------------+------------+
      | Variable_name          | Value      |
      +------------------------+------------+
      | gtid_binlog_pos        |            |
      | gtid_binlog_state      |            |
      | gtid_current_pos       | 0-10133-40 |
      | gtid_domain_id         | 0          |
      | gtid_ignore_duplicates | OFF        |
      | gtid_slave_pos         | 0-10133-40 |
      | gtid_strict_mode       | OFF        |
      +------------------------+------------+
      7 rows in set (0.00 sec)
       
      MariaDB [test]> insert into a values (22);
      Query OK, 1 row affected (0.01 sec)
       
      MariaDB [test]> SHOW GLOBAL VARIABLES LIKE 'gtid%';
      +------------------------+------------+
      | Variable_name          | Value      |
      +------------------------+------------+
      | gtid_binlog_pos        | 0-20133-1  |
      | gtid_binlog_state      | 0-20133-1  |
      | gtid_current_pos       | 0-10133-40 |
      | gtid_domain_id         | 0          |
      | gtid_ignore_duplicates | OFF        |
      | gtid_slave_pos         | 0-10133-40 |
      | gtid_strict_mode       | OFF        |
      +------------------------+------------+
      7 rows in set (0.00 sec)
      

      Look what happens when the sequence number of the local transaction becomes higher than the sequence number of the latest transaction applied as Slave (gtid_slave_pos):

      Generate ~20 transactions on the Slave

      MariaDB [test]> SHOW GLOBAL VARIABLES LIKE 'gtid%';
      +------------------------+------------+
      | Variable_name          | Value      |
      +------------------------+------------+
      | gtid_binlog_pos        | 0-20133-23 |
      | gtid_binlog_state      | 0-20133-23 |
      | gtid_current_pos       | 0-10133-40 |
      | gtid_domain_id         | 0          |
      | gtid_ignore_duplicates | OFF        |
      | gtid_slave_pos         | 0-10133-40 |
      | gtid_strict_mode       | OFF        |
      +------------------------+------------+
      7 rows in set (0.00 sec)
      

      Generate another ~20 transactions on the Slave

      MariaDB [test]> SHOW GLOBAL VARIABLES LIKE 'gtid%';
      +------------------------+------------+
      | Variable_name          | Value      |
      +------------------------+------------+
      | gtid_binlog_pos        | 0-20133-40 |
      | gtid_binlog_state      | 0-20133-40 |
      | gtid_current_pos       | 0-10133-40 |
      | gtid_domain_id         | 0          |
      | gtid_ignore_duplicates | OFF        |
      | gtid_slave_pos         | 0-10133-40 |
      | gtid_strict_mode       | OFF        |
      +------------------------+------------+
      7 rows in set (0.00 sec)
      
      

      And one more...

      MariaDB [test]> SHOW GLOBAL VARIABLES LIKE 'gtid%';
      +------------------------+------------+
      | Variable_name          | Value      |
      +------------------------+------------+
      | gtid_binlog_pos        | 0-20133-41 |
      | gtid_binlog_state      | 0-20133-41 |
      | gtid_current_pos       | 0-20133-41 |
      | gtid_domain_id         | 0          |
      | gtid_ignore_duplicates | OFF        |
      | gtid_slave_pos         | 0-10133-40 |
      | gtid_strict_mode       | OFF        |
      +------------------------+------------+
      7 rows in set (0.01 sec)
      

      Note that gtid_current_pos now changes value from 0-10133-40 to 0-20133-41.

      It should be said that clearing gtid_slave_pos "solves" this, but of course that has its own consequences.

      From what I see, because the locally generated transaction has a lower sequence number, it is basically ignored until its sequence number surpasses the one contained in gtid_slave_pos.

      I don't know if this is intended behaviour.

      Manual page says: https://mariadb.com/kb/en/library/gtid/#gtid_current_pos

      [1]This variable is the GTID of the last change to the database for each replication domain. Such changes can either be master events (ie. local changes made by user or application), or replicated events originating from another master server.

      [2]For each replication domain, if the server ID of the corresponding GTID in @@gtid_binlog_pos is equal to the servers own server_id, and the sequence number is higher than the corresponding GTID in @@gtid_slave_pos, then the GTID from @@gtid_binlog_pos will be used. Otherwise the GTID from @@gtid_slave_pos will be used for that domain.

      [3]Thus, @@gtid_current_pos contains the most recent GTID executed on the server, whether this was done as a master or as a slave. This value is used as the starting point of replication when a slave is configured with CHANGE MASTER TO master_use_gtid=current_pos.

      [4]The value is read-only, but it is updated whenever a DML or DDL statement is written to the binary log and/or replicated by a slave thread.

      Paragraph [2] explains exactly this behaviour, so it seems to be documented (even if the rationale is not clear to me):
      "and the sequence number is higher than the corresponding GTID in @@gtid_slave_pos",

      but paragraph [3] says:
      "Thus, @@gtid_current_pos contains the most recent GTID executed on the server, whether this was done as a master or as a slave."

      This is not true: gtid_current_pos does not contain the most recent GTID executed on the server, at least not until the sequence number is greater than the one in gtid_slave_pos.

      I don't know whether only the documentation needs updating or whether there is something else.
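
The rule quoted in paragraph [2] can be modelled for a single replication domain with a few lines of Python. This is only an illustrative sketch of the documented rule, not server code: the names `Gtid` and `current_pos` are hypothetical, and treating an empty gtid_slave_pos as "nothing to compare against" is an assumption based on the observation above that clearing gtid_slave_pos makes the local GTID show up.

```python
from typing import NamedTuple, Optional


class Gtid(NamedTuple):
    """A MariaDB GTID of the form domain_id-server_id-seq_no (hypothetical helper)."""
    domain_id: int
    server_id: int
    seq_no: int

    @classmethod
    def parse(cls, text: str) -> "Gtid":
        domain, server, seq = (int(part) for part in text.split("-"))
        return cls(domain, server, seq)

    def __str__(self) -> str:
        return f"{self.domain_id}-{self.server_id}-{self.seq_no}"


def current_pos(binlog_pos: Optional[Gtid],
                slave_pos: Optional[Gtid],
                own_server_id: int) -> Optional[Gtid]:
    """Model of paragraph [2] for one domain.

    The binlog GTID is used only if it was generated by this server AND its
    sequence number is strictly higher than the one in the slave position.
    Assumption: an empty slave position never blocks the binlog GTID.
    """
    if (binlog_pos is not None
            and binlog_pos.server_id == own_server_id
            and (slave_pos is None or binlog_pos.seq_no > slave_pos.seq_no)):
        return binlog_pos
    return slave_pos


if __name__ == "__main__":
    # Replay the gtid_binlog_pos values from the report on the Slave (id=20133).
    slave = Gtid.parse("0-10133-40")
    for binlog in (None, "0-20133-1", "0-20133-23", "0-20133-40", "0-20133-41"):
        parsed = Gtid.parse(binlog) if binlog else None
        print(f"gtid_binlog_pos={binlog or ''!s:<11} "
              f"gtid_current_pos={current_pos(parsed, slave, 20133)}")
```

Under this model, gtid_current_pos stays at 0-10133-40 for the local sequence numbers 1, 23 and 40 and switches to 0-20133-41 only at 41, which is the sequence shown in the outputs above.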

          Activity

            Andrei Elkin (Elkin) added a comment - edited

            claudio.nanni, hello.

            Regarding
            >but paragraph [3] says:
            >"Thus, @@gtid_current_pos contains the most recent GTID executed on the server, whether this was done as a master or as a slave."

            'The most recent' is defined in terms of the GTID sequence number. As the master and slave share the domain (0), the most recent GTID was the master's 40 until the slave generated its 41st.
            I hope this clears up the confusion.

            Andrei

            PS.

            Regarding
            > MASTER_USE_GTID = slave_pos|current_pos (not relevant)
            Actually, when MASTER_USE_GTID = slave_pos it really does not matter whether the slave has gone past the master's replicated GTID. It does matter when the mode is current_pos.

            Claudio Nanni (claudio.nanni) added a comment

            > To
            > >but paragraph [3] says:
            > >"Thus, @@gtid_current_pos contains the most recent GTID executed on the server, whether this was done as a master or as a slave."
            > 'the most recent' is defined in terms of the gtid sequence number. As the master and slave share the domain (0) the most recent
            > was the master's 40 all the way until the slave has generated 41th.
            > I hope this clarifies away your confusion.

            I still think it's wrong.
            Moreover, I think this is also due to a flaw in the GTID design.
            The slaves are aware of the latest sequence number applied by the Master, but no one is aware of the latest sequence number applied by any Slave. In a simple setup, the Master doesn't know (it has no feedback mechanism for it) that the Slave generated a sequence number.
            The fact that gtid_current_pos only starts being updated when the local sequence number is higher than the latest sequence number that arrived from the Master (gtid_slave_pos) does not make sense to me.

            Read again:
            "Thus, @@gtid_current_pos contains the most recent GTID executed on the server, whether this was done as a master or as a slave."

            By definition that variable must be updated, and it makes no sense that it isn't. What's the logic?
            Moreover, this Slave may never be a slave of that Master again.

            Wouldn't it be simpler and more logical to just always update gtid_current_pos with local transactions?

            Andrei Elkin (Elkin) added a comment - edited

            gtid_current_pos is updated this way because it holds 'the most recent' value in the logical time sense.
            So it remained at 40 until the slave server itself (as a "master") generated 41, and then the clock stepped.
            The GTID positions current_pos and slave_pos are really clocks, so they can't go backward.

            Regarding gtid_current_pos specifically, KN admitted that CHANGE MASTER's MASTER_USE_GTID = current_pos has confused many people.
            On the other hand, gtid_slave_pos should be comparatively easy to understand.
            What practical purpose do you really need gtid_current_pos for?

            Claudio Nanni (claudio.nanni) added a comment

            Andrei,

            > Oth to understand gtid_slave_pos must be comparatively easy.
            >What practical purpose do you really need gtid_current_pos for?

            Yes, gtid_slave_pos would indeed be an easier choice, but MaxScale uses gtid_current_pos for its failover mechanism; maybe there is a valid reason for them to use gtid_slave_pos.

            Andrei Elkin (Elkin) added a comment

            claudio.nanni, salute.

            I suggest we sum up what the docs say and how that matches our observations.

            1. This report's synopsis states:

            gtid_current_pos is not updated until a GTID arrives with gtid_seq_no > max(d), where d is a GTID domain present in gtid_current_pos.

            Such behaviour matches the docs when 'recent' is understood in the logical time sense.

            2. Notice that gtid_binlog_state may show changes, but only because this array collects GTIDs per server. The union of its GTIDs relating to any domain in gtid_current_pos can produce a value that at most matches, or is less than, the one in the latter array.

            3. There is an unreported observation that, despite gtid_strict_mode = ON, the slave is able to update gtid_binlog_state even though those updates have gtid_seq_no < max(d) (max(d) = 41 in your example).
            This is benign by the definition of gtid_strict_mode, as the strict mode rules apply exclusively to the binlogged set of GTIDs. So once those are removed (RESET MASTER), local GTIDs with sequence numbers below 41 are accepted again.

            All in all, I don't see anything that sticks out as questionable, though I admit we may challenge the design decisions from the perspective of intuitive understanding and ease of use. For instance, I am unhappy myself to find out that gtid_strict_mode is actually "gtid_binlog_strict_mode". I really thought it should apply generally, including to --skip-log-bin or --log-slave-updates=0 slaves, which it does not.

            Regardless of our dissatisfaction with this part, the current issue is not a bug in my opinion and should be addressed by extending our documentation on the three items referred to:

            gtid_current_pos - it should be stressed that it is a per-domain array
            gtid_binlog_state - it is a per-server-id one
            gtid_strict_mode - makes sense only for a binlogging slave

            Feel free to revert the issue status, should you have more feedback.

            Cheers,

            Andrei

            Andrei Elkin (Elkin) added a comment

            The reported traces of execution match the definitions of the slave GTID state descriptors, as extensively commented on above. Specifically, gtid_current_pos is updated correctly.


            People

              Elkin Andrei Elkin
              claudio.nanni Claudio Nanni