MariaDB Server / MDEV-28609

refine gtid-strict-mode to ignore same server-id gtid from the past on semisync slave

Details

    Description

      To provide semisync master crash recovery (MDEV-21117), a semisync slave
      running in gtid strict mode was made to accept for execution transactions
      that carry its own server-id (see MDEV-27760).
      That, however, caused an out-of-order error in a circular setup on the
      server that originated the transaction.
      The error was fair under the gtid strict mode rule: in a circular
      setup the replicated transaction indeed already exists in the local binlog.

      This is fixed by making the semisync slave in gtid strict mode ignore
      those gtids that already exist in its binlog, which effectively restores
      the default same-server-id ignore policy on the semisync slave.
      At the same time the fix complies with MDEV-21117 semisync slave recovery:
      same-server-id transactions that do not exist in the local binlog are still accepted.
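
The rule described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the server's implementation: the class `SlaveModel`, the `Action` values and the representation of the binlog as a set of (domain-id, server-id, seq-no) tuples are hypothetical, and "out of order" is simplified to "seq-no not greater than the highest seq-no already binlogged in that domain".

```python
from dataclasses import dataclass, field
from enum import Enum


class Action(Enum):
    APPLY = "apply"    # execute the transaction and write it to the binlog
    IGNORE = "ignore"  # skip execution, but still advance gtid_slave_pos
    ERROR = "error"    # out-of-order (OOO) error under gtid strict mode


@dataclass
class SlaveModel:
    """Toy model of a semisync slave in gtid strict mode (hypothetical, not server code)."""
    server_id: int
    # Assumption: the local binlog is modelled as a set of (domain, server, seq) tuples.
    binlog: set = field(default_factory=set)
    # domain_id -> (server_id, seq_no), standing in for gtid_slave_pos
    slave_pos: dict = field(default_factory=dict)

    def receive(self, domain_id: int, origin_id: int, seq_no: int) -> Action:
        gtid = (domain_id, origin_id, seq_no)
        # Own gtid that already exists in the local binlog: ignore it,
        # but still record it in the slave gtid state.
        if origin_id == self.server_id and gtid in self.binlog:
            self.slave_pos[domain_id] = (origin_id, seq_no)
            return Action.IGNORE
        # Simplified strict-mode check: anything else must move the domain forward.
        last = max((s for d, _, s in self.binlog if d == domain_id), default=0)
        if seq_no <= last:
            return Action.ERROR
        # Covers the recovery case: an own gtid that is missing from the binlog.
        self.binlog.add(gtid)
        self.slave_pos[domain_id] = (origin_id, seq_no)
        return Action.APPLY
```

For example, a model with `server_id=1` whose binlog holds `(0, 1, 10)` ignores an incoming `0-1-10` (the round trip in a circular setup) and applies an incoming `0-1-11` (the recovery case), while a foreign `0-2-5` arriving afterwards is reported as out of order.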

          Activity

            Faisal Faisal Saeed (Inactive) added a comment -

            Elkin, I have updated this with the respective CS number.

            Elkin Andrei Elkin added a comment -

            Howdy, Brandon!

            Could you please assess the solution you can find in bb-10.6-andrei?
            Initially I thought to extend gtid-ignore-duplicates, but Sergei's nicer idea of
            extending the gtid strict mode has prevailed.

            Cheers,

            Andrei


            bnestere Brandon Nesterenko added a comment -

            Hi Andrei!

            The patch looks good, and I agree that extending the strict_mode
            generally is more intuitive. I left a few notes on 0b2eeb3
            for your consideration.

            Elkin Andrei Elkin added a comment - https://github.com/MariaDB/server/pull/2181
            Elkin Andrei Elkin added a comment - - edited

            The test plan for MDEV-28609 refine gtid-strict-mode to ignore same
            server-id gtid from the past on semisync slave
            -------------------------------------------------------------------

            PREAMBLE

            MDEV-21117/MDEV-27760 relaxed the acceptance of own server-id gtids (by
            default they are not accepted) when gtid strict mode is ON and the slave is semisync.
            This led to MDEV-28609, which reports that the circular semisync setup
            ceased to work, as own gtids that make the round trip hit the gtid
            strict mode Out-Of-(binlog)-Order (OOO) error.

            The MDEV-28609 fixes further relax the gtid strict mode for
            own gtids. Those that are already in the semisync slave's binlog
            no longer raise the OOO error. It is safe to simply ignore such
            gtids.
            Ignoring a gtid still updates the slave gtid state (gtid_slave_pos).

            SUGGESTED TESTING

            Take rpl.rpl_circular_semi_sync as a template.
            Set up a two-server circular semisync configuration in strict gtid mode
            and execute the following types of load (L1, L2, L3):

            L1. /* practical */ each server originates gtids with its own
            domain-id. Consider a few (up to 10) domains per server.

            There must be no clashes between transactions of different domain-ids
            with respect to their data either.

            L2. /* less practical */ fake the server_id as the template does in
            section B.

            L3. /* "not so practical" */ one common domain, with the servers generating
            transactions in coordination, as in the template test:

            ```
            --connection server_1
            INSERT INTO t1(a) VALUES (1);
            --source include/save_master_gtid.inc

            --connection server_2
            # --sync_with_master
            --source include/sync_with_master_gtid.inc
            INSERT INTO t1(a) VALUES (2);
            ```

            With the servers under load, perform the following

            PERTURBATIONs

            P1. stop and restart the slave services at random time intervals while keeping up
            the servers' client load
            P2. randomly shut down and restart the server, or crash it and run semisync slave recovery, and then restart its slave service

            and observe eventually consistent server states (gtid slave state, binlog and the data).
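
A minimal sketch of how the final consistency check might be automated once the load has stopped and both servers have caught up. The helper names and the snapshot dictionary keys are hypothetical. It assumes each snapshot was collected from one server, for instance with `SELECT @@gtid_binlog_pos` and `CHECKSUM TABLE t1`, and that both servers log the replicated transactions, so that their binlog positions are expected to converge.

```python
def parse_gtid_pos(pos: str) -> dict:
    """Parse 'domain-server-seq[,domain-server-seq...]' into {domain: (server, seq)}."""
    result = {}
    for item in pos.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate an empty position string
        domain, server, seq = (int(x) for x in item.split("-"))
        result[domain] = (server, seq)
    return result


def states_consistent(snap_a: dict, snap_b: dict) -> bool:
    """Compare two per-server snapshots (hypothetical layout) taken at rest."""
    same_binlog_pos = (
        parse_gtid_pos(snap_a["gtid_binlog_pos"])
        == parse_gtid_pos(snap_b["gtid_binlog_pos"])
    )
    same_data = snap_a["data_checksum"] == snap_b["data_checksum"]
    return same_binlog_pos and same_data
```

Parsing the positions into dictionaries makes the comparison independent of the order in which the domains are listed, which matters for load L1 with several domains per server.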

            Elkin Andrei Elkin added a comment - - edited

            Faisal, slave salve! Let me paste the fixes' summary from the commit:

            [the issue] is fixed by the commit to ignore on the gtid strict mode semisync
                slave those gtids that exist in the slave's binlog, which effectively restores
                the default same-server-id ignore policy, and
                at the same time the fixes continue to provide
                MDEV-21117/MDEV-27760 semisync slave recovery (that is, to accept the same
                server-id transactions that do not exist in local binlog).

            Informally, the user only needs `gtid_strict_mode = ON`, at least on the slave, and the semisync slave now knows which of its own server-id transactions to ignore and which to accept (the latter for the MDEV-21117 recovery), without any subsequent OOO error.


            angelique.sklavounos Angelique Sklavounos (Inactive) added a comment -

            Testing of L1, L2, L3 with P1, P2 looked good. Okay to push.

            People

              Elkin Andrei Elkin
              Elkin Andrei Elkin