Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-4506

MWL#184: Parallel replication of group-committed transactions


    • Type: Task
    • Status: Closed (View Workflow)
    • Priority: Critical
    • Resolution: Fixed
    • Fix Version/s: 10.0.5
    • Component/s: None
    • Labels:


      This task is the old idea from here:


      Current state:

      The basic things have been implemented in a working prototype. Initial
      benchmarks are promising:


      The main missing things are to fix up remaining part of the existing
      replication code to correctly handle things like stopping the slave, error
      handling, and relay-log.info updates.


      The idea behind this task is to detect on the master that two or more
      transactions are independent and can be replicated in parallel on the

      This uses binlog group commit. When two transactions are group-committed
      together, they are known to be non-conflicting (otherwise one of them would
      have had to wait for the other to commit first and release its row locks).
      The basic idea is the same as the MySQL 5.7 feature


      The GTID event attached to every event group in the binlog is extended with a
      64-bit number, the group commit id. The id is incremented after each group
      commit. Thus, on the slave, if two transactions have the same group commit id,
      we can execute them in parallel.

      A new flag FL_GROUP_COMMIT_ID is introduced for the GTID event. The group
      commit id is only present if this flag is set (no group commit id is assigned
      when the group commit contains only a single transaction). When the group
      commit id is present, the size of the GTID event is increased by 2 bytes
      (there were already 6 unused bytes present to preserve backwards compatibility
      with MySQL slaves).

      If a slave connects that does not understand the new feature (older MariaDB or
      any Oracle MySQL), backwards compatibility is ensured, since in this case the
      GTID event is replaced transparently with a BEGIN event, as normal.

      Additionally, if two transactions have GTIDs with different replication domain
      ids, this means that the user has marked that they are independent and can be
      executed in parallel on a slave. Domain ids are already implemented in the
      global transaction ID feature, MDEV-26. This task extends it by actually
      executing such transactions in parallel on the slave, if requested by the


      The parallel replication is enabled by setting @@slave_parallel_threads to a
      value of 2 or greater. The variable can be changed dynamically (without
      restarting the server), however all slave threads must be stopped first with
      STOP SLAVE. When parallel replication is enabled, a pool of that many worker
      threads is set up. Each worker thread can potentially execute a binlog event
      in parallel with all other workers, meaning that the value of
      @@slave_parallel_threads gives the maximum potential parallelism of the slave.

      When parallel replication is enabled, the SQL thread no longer executes binlog
      events directly. Instead, each event is queued for one of the worker threads
      to execute. If a transaction is found to be independent of already executing
      transactions, a new worker thread is allocated from the pool, if available,
      and the transaction is queued for parallel execution. Otherwise, the
      transaction is queued for serial executing in an already allocated worker
      thread, and/or it is marked so that the worker thread will wait for all prior
      dependent transactions to complete before starting execution.

      When two transactions with the same replication domain id are replicated in
      parallel (because they share the same commit id), they are still forced to be
      committed in the same order on the slave as on the master. This is important
      for correctness. Without having same commit order, it would be possible for an
      application to see transaction A committed while B was not on one server,
      while on another server B was seen as committed but not A. Preserving commit
      order is also necessary to make promotion of a slave to a new master work;
      this requires that the binlog order is the same on all involved servers.

      When two consecutive transactions A and B are executed in parallel on the
      slave and must preserve commit order, transaction B is marked internally as
      having to wait for A to commit first. If B happens to reach the COMMIT step
      first, it waits for A to commit first. The binlog group commit code is
      extended to take this waiting into account. Thus, when A later gets ready to
      (group) commit, it checks if any transactions are waiting. When it finds B, it
      puts both A itself and B on the group commit list. This ensures that binlog
      group commit is still possible on the slave even though the commits are
      actually serialised, which is an important optimisation when sync_binlog=1.

      When two transactions are replicated in parallel because of different
      replication domain id, there is no requirement to preserve the commit order.
      The different domain ids is an explicit declaration by the user that the
      transactions are independent from the point of view of the application and may
      be re-ordered without violating correctness. Parallel replication of
      transactions in different replication domains is only enabled when using
      global transaction ID (master_use_gtid=slave_pos|current_pos). The current
      GTID position is maintained per domain id, so promotion of slave to master
      works regardless of binlog order.

      Parallel replication can be used both with and without global transaction ID

      Use cases:

      Using the binlog group commit technique, parallel replication can speed up
      replication on the slave automatically, without any other visible changes or
      limitations for applications. The impact of this will however depend on how
      many transactions end up in the same group commit.

      If the commit step on the master is slow (eg. --sync-binlog=1 and
      --innodb-flush-logs-at-trx-commit=1 with slow disk system) and number of
      transactions per second is high, then there will generally be many
      transactions in each group commit, and high opportunity for parallelism on the
      slaves. The average number of transactions per binlog group commit can be
      found by dividing binlog_commits with binlog_group_commits (from SHOW STATUS).

      But if there are fewer but larger transactions, and/or the commit step is fast
      (battery-backed up RAID controller), then there may be too little opportunity
      for parallelism on the slave. To get more transactions included in each group
      commit, the variables @@binlog_commit_wait_count and @@binlog_commit_wait_usec
      can be set. When set, the commit step will wait at most
      @@binlog_commit_wait_usec microseconds for at least @@binlog_commit_wait_count
      transactions to group commit together. This may also decrease disk pressure on
      the master (by reducing number of syncs to disk during commit). However, if
      there is not enough parallelism in the load on the master, setting these
      variables too high may significantly decrease throughput on the master, as the
      database sits idle waiting for commits to arrive that never do.

      When using parallel replication together with global transaction ID, more
      opportunities for parallelism on the slave can be explicitly declared by the
      user/application by setting the @@gtid_domain_id. When the domain id is set
      different for two transactions, this is an explicit statement that those two
      transactions can not conflict, and that the application can tolerate them
      being committed in any order on a slave. This can then be used on slaves for
      parallel replication, if enabled with @@slave_parallel_threads.

      One use case is for long-running queries like a big ALTER TABLE or something
      like that:

          SET @@gtid_domain_id=1;
          ALTER TABLE my_big_table ...

      Normally, such a long-running ALTER TABLE would block subsequent events from
      replicating, causing significant delay in replication. By assigning that
      statement to a different replication domain, other events (in default domain
      0) can then freely replicate in parallel. It is then the responsibility of the
      user to ensure that the ALTER statement has time to complete on all slaves
      before any queries are done on the master that depends on the new table.

      Another use case is to allow two independent applications to replicate
      independently of each other. For example, they might use each their own schema
      and never touch the schema of the other.

      In such cases, the user can set @@gtid_domain_id=1 in one application and
      @@gtid_domain_id=2 in the other, and the transactions from the two
      applications will then be able to replicate in parallel on slaves. This is
      similar to the multi-threaded slave feature of MySQL 5.6; however note that it
      is not limited to using different schema. Any transactions can be marked for
      independent parallel replication, even against the same table, as long as the
      user/application ensures that the transactions are in fact independent.

      Changing @@gtid_domain_id currently requires SUPER privilege, which limits
      this technique to some degree. The reason for the SUPER requirement is that
      unlimited change of @@gtid_domain_id can easily break replication.


          Issue Links



              • Assignee:
                knielsen Kristian Nielsen
                knielsen Kristian Nielsen
              • Votes:
                0 Vote for this issue
                9 Start watching this issue


                • Created: