Details

    • Type: Task
    • Status: Stalled
    • Priority: Major
    • Resolution: Unresolved

    Description

      The patch has been ported to 10.1.
      You can use it under the new (also called "3-clause") BSD license.
      I have done internal benchmarks, but you are most welcome to do some too.

      DESCRIPTION:
      No slave left behind

      This patch implements master throttling based on slave lag,
      aka "no slave left behind". The core feature works as follows:
      1) The semi-sync reply is amended to also report back the SQL-thread
      position (aka exec position).
      2) Transactions are not removed from the "active-transaction-list"
      in the semi-sync master plugin until at least one slave has reported
      that it has executed the transaction. The slave lag can then
      be estimated from how long the oldest transaction has been
      lingering in the active-transaction-list.
      3) Client threads are forced to wait before commit until the slave lag
      has decreased to an acceptable value.
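
      The three steps above can be modelled in a few lines. This is a minimal Python sketch, not the patch's C++ code: the names Trx, ActiveTrxList, on_commit, on_slave_reply and must_wait are hypothetical, binlog positions are assumed to be (file name, offset) tuples whose file names sort in binlog order, the unit of the lag limit is assumed, and time is passed in explicitly instead of being read from a clock.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Trx:
    log_pos: tuple       # (binlog file name, offset); assumed to compare in binlog order
    commit_time: float   # when the transaction committed on the master


class ActiveTrxList:
    """Hypothetical model of the master-side bookkeeping; not the plugin's code."""

    def __init__(self, max_slave_lag):
        # Plays the role of rpl_semi_sync_master_max_slave_lag (unit assumed).
        self.max_slave_lag = max_slave_lag
        self.trxs = deque()  # oldest transaction on the left

    def on_commit(self, log_pos, now):
        self.trxs.append(Trx(log_pos, now))

    def on_slave_reply(self, exec_pos):
        # Steps 1 and 2: the reply carries the slave's exec position; every
        # transaction at or before it has been executed and can be dropped.
        while self.trxs and self.trxs[0].log_pos <= exec_pos:
            self.trxs.popleft()

    def estimated_slave_lag(self, now):
        # Step 2: lag = how long the oldest unexecuted transaction has lingered.
        return now - self.trxs[0].commit_time if self.trxs else 0.0

    def must_wait(self, now):
        # Step 3: a committing client is held back while the lag is too high.
        return self.estimated_slave_lag(now) > self.max_slave_lag


# Usage: two commits, then the slave reports having executed the first one.
lst = ActiveTrxList(max_slave_lag=2.0)
lst.on_commit(("bin.000001", 100), now=0.0)
lst.on_commit(("bin.000001", 200), now=1.0)
lag_before = lst.estimated_slave_lag(now=3.0)   # oldest trx committed at 0.0
lst.on_slave_reply(("bin.000001", 100))
lag_after = lst.estimated_slave_lag(now=3.0)    # oldest remaining trx committed at 1.0
```

      In the sketch a waiting client simply re-checks must_wait; a real implementation would block on a condition variable and give up after rpl_semi_sync_master_slave_lag_wait_timeout, which is not modelled here.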

      The following variables are introduced on the master:

      • rpl_semi_sync_master_max_slave_lag (global)
      • rpl_semi_sync_master_slave_lag_wait_timeout (session)

      The following status variables are introduced on the master:

      • rpl_semi_sync_master_slave_lag_wait_sessions
      • rpl_semi_sync_master_estimated_slave_lag
      • rpl_semi_sync_master_trx_slave_lag_wait_time
      • rpl_semi_sync_master_trx_slave_lag_wait_num
      • rpl_semi_sync_master_avg_trx_slave_lag_wait_time

      The following variable is introduced on the slave:

      • rpl_semi_sync_slave_lag_enabled (global)

      In addition to this, two optimizations that decrease the overhead of
      semi-sync are introduced:
      1) When a transaction is about to be sent to a slave, the master checks
      whether it should be semi-synced. Rather than semi-syncing each
      transaction (which is what is done currently), the code skips
      semi-syncing a transaction if newer transactions have already been
      committed. Since this could delay semi-syncing indefinitely, a cap is
      set using two new master variables:

      • rpl_semi_sync_master_max_unacked_event_bytes (global)
      • rpl_semi_sync_master_max_unacked_event_count (global)

      2) rpl_semi_sync_master_group_commit, which makes the semi-sync
      plugin only semi-sync the last transaction in a group commit.
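
      Both optimizations boil down to a per-transaction "request an ACK or not" decision. The sketch below is hypothetical Python, not the patch's code: AckPolicy, request_ack and group_commit_ack_flags are invented names, and the cap semantics (counters include the transaction being sent, and reaching a cap with ">=" forces an ACK) are assumptions. It relies on the fact that in semi-sync an ACK for a later binlog position also releases transactions committed before it.

```python
from dataclasses import dataclass


@dataclass
class AckPolicy:
    """Hypothetical sketch of optimization 1; names and cap semantics are assumed."""
    max_unacked_bytes: int   # role of rpl_semi_sync_master_max_unacked_event_bytes
    max_unacked_count: int   # role of rpl_semi_sync_master_max_unacked_event_count
    unacked_bytes: int = 0
    unacked_count: int = 0

    def request_ack(self, trx_bytes, newer_trx_committed):
        """Return True if this transaction should be sent as semi-sync (ACK requested)."""
        self.unacked_bytes += trx_bytes
        self.unacked_count += 1
        cap_reached = (self.unacked_bytes >= self.max_unacked_bytes
                       or self.unacked_count >= self.max_unacked_count)
        if newer_trx_committed and not cap_reached:
            # Skip: an ACK for a later position also covers this transaction.
            return False
        # Newest transaction, or a cap forces an ACK now; start counting afresh.
        self.unacked_bytes = 0
        self.unacked_count = 0
        return True


def group_commit_ack_flags(group_size):
    """Optimization 2: only the last transaction of a group commit is semi-synced."""
    return [i == group_size - 1 for i in range(group_size)]


policy = AckPolicy(max_unacked_bytes=1000, max_unacked_count=3)
# Three small transactions, each with newer ones already committed behind it:
# the first two are skipped, the third hits the count cap and requests an ACK.
decisions = [policy.request_ack(100, newer_trx_committed=True) for _ in range(3)]
```

      The caps bound how far the master can run ahead of the last requested ACK, which is what keeps the skipping from postponing semi-sync indefinitely.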

          Activity

            jonaso Jonas Oreland added a comment -

            A comment: I have not tested/considered the parallel slave applier,
            but if the get_master_log_pos function that I wrote works with the
            parallel slave applier, it should work. Feedback welcome.

            knielsen Kristian Nielsen added a comment -

            > a comment is that I have not tested/considered parallel slave applier. but
            > if the get_master_log_pos-function that I wrote works with parallel slave
            > applier, it should work.

            I think it should work. The rli->group_master_log_name and
            rli->group_master_log_pos fields are also updated in parallel replication.

            The update happens out-of-order, though. Especially when using multiple
            replication domains and GTID, one domain can be quite a bit ahead of
            another. So "no slave left behind" will use the position of the
            most-ahead worker thread to tell how far the slave has progressed, not the
            position of the most-behind worker. That seems fine, I think.
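
            To illustrate the consequence described above, here is a hypothetical Python sketch (not patch code) that reuses the description's lag estimate, the age of the oldest transaction not yet reported as executed. It assumes a single binlog file, so positions are plain integer offsets, and two workers in different replication domains whose positions are made up for the example. Reporting the most-ahead worker's position drops more entries from the list, so the estimated lag comes out lower than with the most-behind worker.

```python
def estimated_lag(active, exec_pos, now):
    """active: (log_pos, commit_time) pairs in binlog order, oldest first.
    Entries at or before exec_pos count as executed; the lag is the age
    of the oldest remaining entry."""
    remaining = [commit_time for log_pos, commit_time in active if log_pos > exec_pos]
    return now - remaining[0] if remaining else 0.0


# Single binlog file assumed, so positions are plain integer offsets.
ACTIVE = [(100, 0.0), (200, 1.0), (300, 2.0), (400, 3.0), (500, 4.0)]
# Hypothetical last-executed positions of two workers in different domains.
WORKERS = {"domain 1": 400, "domain 2": 150}

lag_most_ahead = estimated_lag(ACTIVE, max(WORKERS.values()), now=5.0)
lag_most_behind = estimated_lag(ACTIVE, min(WORKERS.values()), now=5.0)
```

            Here lag_most_ahead is 1.0 and lag_most_behind is 4.0, so throttling driven by the most-ahead position is the more lenient choice when one domain falls behind.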
            knielsen Kristian Nielsen added a comment - Review sent on maria-developers@: https://lists.launchpad.net/maria-developers/msg08575.html
            Elkin Andrei Elkin added a comment -

            julien.fritsch The status is correct. The best time to review it would be when we are back in the semi-sync context.

            bnestere Brandon Nesterenko added a comment -

            The patch relies heavily on legacy semi-sync code. Moving to Stalled, as rebasing would require significant work. Also, considering MDEV-19140 and the issues linked to it, the design of this patch should be reconsidered in the scope of the grand goal.

            People

              bnestere Brandon Nesterenko
              jonaso Jonas Oreland
              Votes: 6
              Watchers: 20
