Details

    Description

      When using the semi-sync protocol, track each replica's progress. The SHOW REPLICA HOSTS output should be extended with two fields: one representing the GTID state last sent to the replica, and the other representing the GTID state last ACK'd from the replica.

      Additionally, treat rpl_semi_sync_master_timeout=0 as a special case where transactions do not await an ACK, but still report progress from the ack_thread on replica reply.
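The timeout=0 semantics above can be sketched as follows. This is a simplified illustration, not the server's actual implementation; the class and method names (`SemiSyncWaiter`, `commit_wait`, `on_ack`) are hypothetical, and only the variable name `rpl_semi_sync_master_timeout` comes from the ticket.

```python
import threading

class SemiSyncWaiter:
    """Hypothetical sketch of the commit-side semi-sync wait logic."""

    def __init__(self, timeout_ms):
        self.timeout_ms = timeout_ms      # models rpl_semi_sync_master_timeout
        self.cond = threading.Condition()
        self.acked = set()                # GTIDs ACK'd by at least one replica

    def commit_wait(self, gtid):
        """Called by a committing transaction after the binlog write."""
        if self.timeout_ms == 0:
            # Special case from the description: do not block on an ACK,
            # but the ack thread still records progress when a reply
            # eventually arrives (see on_ack below).
            return True
        with self.cond:
            return self.cond.wait_for(lambda: gtid in self.acked,
                                      timeout=self.timeout_ms / 1000.0)

    def on_ack(self, gtid):
        """Called by the ack thread when a replica reply arrives."""
        with self.cond:
            self.acked.add(gtid)
            self.cond.notify_all()
```

With timeout 0 the commit returns immediately; with a non-zero timeout it blocks until the ack thread reports the GTID or the timeout expires.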

When running SHOW REPLICA HOSTS, instead of the current 4 columns:

      SHOW REPLICA HOSTS;
      Server_id	Host	Port	Master_id

we want to have two additional columns:

      Server_id  Host       Port    Master_id   Gtid_State_Sent   Gtid_State_Ack
      3          127.0.0.1  16002   1           0-1-5,1-2-10      0-1-4,1-2-7
      4          127.0.0.1  16003   1           0-1-5,1-2-10      0-1-2,1-2-3
      
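A consumer of the new columns could compare the two GTID lists per replication domain to see how far each replica's ACKs trail what was sent. A minimal sketch, assuming the `domain-server_id-seq_no` GTID format shown in the sample output (the function names are made up for illustration, and the per-domain seq_no difference is only an approximate count of outstanding transactions):

```python
def parse_gtid_state(state):
    """Parse a GTID list like '0-1-5,1-2-10' into {domain: seq_no}."""
    result = {}
    for gtid in state.split(","):
        domain, _server_id, seq_no = gtid.split("-")
        result[int(domain)] = int(seq_no)
    return result

def ack_lag(sent, acked):
    """Per-domain gap between the GTID state sent and the state ACK'd."""
    sent_map = parse_gtid_state(sent)
    ack_map = parse_gtid_state(acked)
    return {d: sent_map[d] - ack_map.get(d, 0) for d in sent_map}
```

For the first sample row above, `ack_lag("0-1-5,1-2-10", "0-1-4,1-2-7")` returns `{0: 1, 1: 3}`.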

      Additional notes:
      1. Replicas with semi-sync disabled should have empty values for Gtid_State_Ack.
2. Updating a replica's Gtid_State_Ack should be done by the ack thread, after validating that the transaction was actually sent to that replica.
      3. The new columns should always be present, regardless of rpl_semi_sync_master_enabled, as the primary can disable semi-sync dynamically, and it could still be useful to display the information.
4. Per design, when rpl_semi_sync_master_timeout=0, rpl_semi_sync_master_status, if enabled earlier, should stay ON, instead of switching off to async and incrementing the corresponding counter.
5. Per design, in order to distinguish the replica type between async, stalled semi-sync, and active semi-sync replicas, the suggestion is to add a new column Replica_type whose value depends on the state of the replica.
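Note 2 above implies the ack thread must reject ACKs for transactions it never sent to that replica. A small sketch of such a per-replica tracker, under the assumption that sent/ACK'd state is kept as a per-domain seq_no map (the class and method names are hypothetical, not the server's):

```python
class ReplicaProgress:
    """Hypothetical per-replica tracker backing the two new columns."""

    def __init__(self):
        self.gtid_sent = {}   # domain -> highest seq_no sent to this replica
        self.gtid_ack = {}    # domain -> highest seq_no ACK'd by this replica

    def record_sent(self, domain, seq_no):
        # Dump-thread side: called when a transaction is sent to the replica.
        self.gtid_sent[domain] = max(self.gtid_sent.get(domain, 0), seq_no)

    def record_ack(self, domain, seq_no):
        # Ack-thread side: per note 2, only accept ACKs for transactions
        # that were actually sent to this replica.
        if seq_no <= self.gtid_sent.get(domain, 0):
            self.gtid_ack[domain] = max(self.gtid_ack.get(domain, 0), seq_no)
            return True
        return False
```

An ACK for a seq_no beyond what was sent is dropped, so Gtid_State_Ack can never run ahead of Gtid_State_Sent.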

          Activity

bnestere Brandon Nesterenko added a comment:

            Hi knielsen and Elkin!

            I've put some final touches on the previously started patch, and it is ready for one of your reviews: PR-3288 (note this is a new PR that I've opened).

bnestere Brandon Nesterenko added a comment:

            Hi Roel!

            This is ready to be tested in branch preview-11.6-MDEV-21322.
Roel Roel Van de Paar added a comment (edited):

            https://github.com/MariaDB/server/commit/11d7bd2ed3322279e14068bdf8a913626900f4ac
            https://github.com/MariaDB/server/pull/1427 > https://github.com/MariaDB/server/pull/2374 > https://github.com/MariaDB/server/pull/3288
            https://lists.mariadb.org/hyperkitty/list/developers@lists.mariadb.org/thread/62YVCBGSC23PDTPHSEBU4LH74LZVGJD7/

bnestere Brandon Nesterenko added a comment:

            Pulling from 11.6, as there is some unnecessary overhead in the implementation, and we can instead reconsider a better design (or decide if we even want this feature at all).
JackSlater Jack added a comment:

            Amazing. Five years later, the patchset has passed through a myriad of hands, and now we think that, maybe, the feature is not required.

            In the meantime, everybody with a master/slave setup must keep a large set of binlogs, most of which are useless, because there is no way to know when a binlog has been applied to the replica. So we must keep something like a day or two of binlogs and bet that the apply will be done by then (even if the slave went into maintenance somehow).
            If the bet is right, we only consume lots of storage for nothing.
            If the bet is wrong, replication fails, the slave is no longer usable, and we must recreate it from scratch (which is a pain).

            Postgres has its own issues, but damn, replication there is so much easier and cheaper.

People

              bnestere Brandon Nesterenko
              anel Anel Husakovic
              Votes: 1
              Watchers: 15

