Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-33551

Semi-sync Wait Point AFTER_COMMIT Slow on Workloads with Heavy Concurrency

Details

    Description

      Semi-sync Wait Point AFTER_COMMIT Slow on Workloads with Heavy Concurrency

      When using semi-sync replication with rpl_semi_sync_master_wait_point=AFTER_COMMIT, the performance of the primary can significantly reduce to about 10% of AFTER_SYNC's performance for workloads with many concurrent users executing transactions. See the attached graph, where the "Async" bar reports the performance of a primary using asynchronous replication, the "Semi-sync AFTER_COMMIT" and "Semi-sync AFTER_SYNC" reports the performance of the primary using the respective rpl_semi_sync_master_wait_point variable, and "Semi-sync GROUP_ACK" is a prototype of the work outlined by MDEV-33491.

      It can be seen that the performance of the AFTER_COMMIT mode of semi-sync is many times worse than that of the AFTER_SYNC mode. This is because all connections on the primary share the same cond_wait variable/mutex pair, so any time an ACK is recieved from a replica, all waiting connections are awoken to check if the ACK was for itself, which is done in a mutual exclusion zone. See the attached Semi-sync Group Ack Proposal PDF for more details.

      Instead, each connection should use its own condition variable (e.g. THD::COND_wakeup_ready), and the ACK receiver thread should only signal connections which have been ACKed for wakeup.

      Attachments

        Issue Links

          Activity

            The solution Kristian and I come up with is:

            • Let the group-leader thread handle also the AFTER_COMMIT, in which case there is only one thread waiting for the ACK.
            • Instead of requesting an ACK for each commit, only request an ACK for the whole group.
              Why not going with the this solution? It will be much faster and use less bandwidth (as there is only one ACK per group).
            monty Michael Widenius added a comment - The solution Kristian and I come up with is: Let the group-leader thread handle also the AFTER_COMMIT, in which case there is only one thread waiting for the ACK. Instead of requesting an ACK for each commit, only request an ACK for the whole group. Why not going with the this solution? It will be much faster and use less bandwidth (as there is only one ACK per group).
            bnestere Brandon Nesterenko added a comment - - edited

            Hi knielsen!

            This is ready for review: PR 3089.

            The following graph shows the speedup with this patch (the MDEV_33551_AFTER_COMMIT bar):

            bnestere Brandon Nesterenko added a comment - - edited Hi knielsen ! This is ready for review: PR 3089 . The following graph shows the speedup with this patch (the MDEV_33551_AFTER_COMMIT bar):

            monty the semi-sync group ACK work is tracked by MDEV-33491 (with its description updated per our last discussion)

            bnestere Brandon Nesterenko added a comment - monty the semi-sync group ACK work is tracked by MDEV-33491 (with its description updated per our last discussion)

            knielsen, I've addressed your review comments from the mailing list, and force-pushed to PR 3089 with the latest updates. Can you give it another review?

            bnestere Brandon Nesterenko added a comment - knielsen , I've addressed your review comments from the mailing list, and force-pushed to PR 3089 with the latest updates. Can you give it another review?
            bnestere Brandon Nesterenko added a comment - - edited

            Pushed into 10.6 as 75c7c6dc3.

            Merge conflict observed in 10.11 with a fix in branch bb-10.11-MDEV-33551-mergefix.

            Looks like another merge conflict may exist in sql/sql_class.h in 11.X with chunk:

               bool check_slave_ignored_db_with_error(const Lex_ident_db &db) const;
             
              /*
                Indicates if this thread is suspended due to awaiting an ACK from a
                replica. True if suspended, false otherwise.
             
                Note that this variable is protected by Repl_semi_sync_master::LOCK_binlog
              */
              bool is_awaiting_semisync_ack;
            

            This patch removed the is_awaiting_semisync_ack variable, so to resolve, just remove that variable.

            Otherwise no merge issues should be present.

            Final performance figures as follows:

            In summary, this patch changes the way that semi-sync connections wait for ACKs as follows. Previously, when an ACK would arrive from a replica, all user connection threads currently awaiting an ACK for its transaction would be awoken, but each thread could only check if the ACK was for it one at a time. With the patch, instead of waking up all threads awaiting ACKs, only those which are relevant to the received ACK are awoken.

            As an example, say there are 500 transactions on the primary server awaiting an ACK. Previously, when an ACK would come in for the next transaction in the binlog, all 500 threads would be signalled to wake up and check if that ACK was for it, yet each thread could only check one at a time (not in parallel), and only one of the 500 could move on (the other 499 would go back to waiting for their respective ACK). However, with the patch, now only the thread associated with that ACK (or potentially any earlier binlogged transactions) are awoken.

            bnestere Brandon Nesterenko added a comment - - edited Pushed into 10.6 as 75c7c6dc3 . Merge conflict observed in 10.11 with a fix in branch bb-10.11-MDEV-33551-mergefix . Looks like another merge conflict may exist in sql/sql_class.h in 11.X with chunk: bool check_slave_ignored_db_with_error(const Lex_ident_db &db) const;   /* Indicates if this thread is suspended due to awaiting an ACK from a replica. True if suspended, false otherwise.   Note that this variable is protected by Repl_semi_sync_master::LOCK_binlog */ bool is_awaiting_semisync_ack; This patch removed the is_awaiting_semisync_ack variable, so to resolve, just remove that variable. Otherwise no merge issues should be present. Final performance figures as follows: In summary, this patch changes the way that semi-sync connections wait for ACKs as follows. Previously, when an ACK would arrive from a replica, all user connection threads currently awaiting an ACK for its transaction would be awoken, but each thread could only check if the ACK was for it one at a time. With the patch, instead of waking up all threads awaiting ACKs, only those which are relevant to the received ACK are awoken. As an example, say there are 500 transactions on the primary server awaiting an ACK. Previously, when an ACK would come in for the next transaction in the binlog, all 500 threads would be signalled to wake up and check if that ACK was for it, yet each thread could only check one at a time (not in parallel), and only one of the 500 could move on (the other 499 would go back to waiting for their respective ACK). However, with the patch, now only the thread associated with that ACK (or potentially any earlier binlogged transactions) are awoken.

            People

              bnestere Brandon Nesterenko
              bnestere Brandon Nesterenko
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.