[MDEV-4506] MWL#184: Parallel replication of group-committed transactions - Jira

Details

Type: Task
Status: Closed (View Workflow)
Priority: Critical
Resolution: Fixed
Fix Version/s: 10.0.5
Component/s: None
Labels:
None

Description

This task is the old idea from here:

http://askmonty.org/worklog/Server-RawIdeaBin/?tid=184

Current state:

The basic things have been implemented in a working prototype. Initial
benchmarks are promising:

https://lists.launchpad.net/maria-developers/msg05869.html

The main missing things are to fix up remaining part of the existing
replication code to correctly handle things like stopping the slave, error
handling, and relay-log.info updates.

Design:

The idea behind this task is to detect on the master that two or more
transactions are independent and can be replicated in parallel on the
slave.

This uses binlog group commit. When two transactions are group-committed
together, they are known to be non-conflicting (otherwise one of them would
have had to wait for the other to commit first and release its row locks).
The basic idea is the same as the MySQL 5.7 feature
slave-parallel-type=LOGICAL_CLOCK:

http://dev.mysql.com/doc/refman/5.7/en/replication-options-slave.html#sysvar_slave_parallel_type

The GTID event attached to every event group in the binlog is extended with a
64-bit number, the group commit id. The id is incremented after each group
commit. Thus, on the slave, if two transactions have the same group commit id,
we can execute them in parallel.

A new flag FL_GROUP_COMMIT_ID is introduced for the GTID event. The group
commit id is only present if this flag is set (no group commit id is assigned
when the group commit contains only a single transaction). When the group
commit id is present, the size of the GTID event is increased by 2 bytes
(there were already 6 unused bytes present to preserve backwards compatibility
with MySQL slaves).

If a slave connects that does not understand the new feature (older MariaDB or
any Oracle MySQL), backwards compatibility is ensured, since in this case the
GTID event is replaced transparently with a BEGIN event, as normal.

Additionally, if two transactions have GTIDs with different replication domain
ids, this means that the user has marked that they are independent and can be
executed in parallel on a slave. Domain ids are already implemented in the
global transaction ID feature, ~~MDEV-26~~. This task extends it by actually
executing such transactions in parallel on the slave, if requested by the
user.

Implementation:

The parallel replication is enabled by setting @@slave_parallel_threads to a
value of 2 or greater. The variable can be changed dynamically (without
restarting the server), however all slave threads must be stopped first with
STOP SLAVE. When parallel replication is enabled, a pool of that many worker
threads is set up. Each worker thread can potentially execute a binlog event
in parallel with all other workers, meaning that the value of
@@slave_parallel_threads gives the maximum potential parallelism of the slave.

When parallel replication is enabled, the SQL thread no longer executes binlog
events directly. Instead, each event is queued for one of the worker threads
to execute. If a transaction is found to be independent of already executing
transactions, a new worker thread is allocated from the pool, if available,
and the transaction is queued for parallel execution. Otherwise, the
transaction is queued for serial executing in an already allocated worker
thread, and/or it is marked so that the worker thread will wait for all prior
dependent transactions to complete before starting execution.

When two transactions with the same replication domain id are replicated in
parallel (because they share the same commit id), they are still forced to be
committed in the same order on the slave as on the master. This is important
for correctness. Without having same commit order, it would be possible for an
application to see transaction A committed while B was not on one server,
while on another server B was seen as committed but not A. Preserving commit
order is also necessary to make promotion of a slave to a new master work;
this requires that the binlog order is the same on all involved servers.

When two consecutive transactions A and B are executed in parallel on the
slave and must preserve commit order, transaction B is marked internally as
having to wait for A to commit first. If B happens to reach the COMMIT step
first, it waits for A to commit first. The binlog group commit code is
extended to take this waiting into account. Thus, when A later gets ready to
(group) commit, it checks if any transactions are waiting. When it finds B, it
puts both A itself and B on the group commit list. This ensures that binlog
group commit is still possible on the slave even though the commits are
actually serialised, which is an important optimisation when sync_binlog=1.

When two transactions are replicated in parallel because of different
replication domain id, there is no requirement to preserve the commit order.
The different domain ids is an explicit declaration by the user that the
transactions are independent from the point of view of the application and may
be re-ordered without violating correctness. Parallel replication of
transactions in different replication domains is only enabled when using
global transaction ID (master_use_gtid=slave_pos|current_pos). The current
GTID position is maintained per domain id, so promotion of slave to master
works regardless of binlog order.

Parallel replication can be used both with and without global transaction ID
enabled.

Use cases:

Using the binlog group commit technique, parallel replication can speed up
replication on the slave automatically, without any other visible changes or
limitations for applications. The impact of this will however depend on how
many transactions end up in the same group commit.

If the commit step on the master is slow (eg. --sync-binlog=1 and
--innodb-flush-logs-at-trx-commit=1 with slow disk system) and number of
transactions per second is high, then there will generally be many
transactions in each group commit, and high opportunity for parallelism on the
slaves. The average number of transactions per binlog group commit can be
found by dividing binlog_commits with binlog_group_commits (from SHOW STATUS).

But if there are fewer but larger transactions, and/or the commit step is fast
(battery-backed up RAID controller), then there may be too little opportunity
for parallelism on the slave. To get more transactions included in each group
commit, the variables @@binlog_commit_wait_count and @@binlog_commit_wait_usec
can be set. When set, the commit step will wait at most
@@binlog_commit_wait_usec microseconds for at least @@binlog_commit_wait_count
transactions to group commit together. This may also decrease disk pressure on
the master (by reducing number of syncs to disk during commit). However, if
there is not enough parallelism in the load on the master, setting these
variables too high may significantly decrease throughput on the master, as the
database sits idle waiting for commits to arrive that never do.

When using parallel replication together with global transaction ID, more
opportunities for parallelism on the slave can be explicitly declared by the
user/application by setting the @@gtid_domain_id. When the domain id is set
different for two transactions, this is an explicit statement that those two
transactions can not conflict, and that the application can tolerate them
being committed in any order on a slave. This can then be used on slaves for
parallel replication, if enabled with @@slave_parallel_threads.

One use case is for long-running queries like a big ALTER TABLE or something
like that:

    SET @@gtid_domain_id=1;

    ALTER TABLE my_big_table ...

Normally, such a long-running ALTER TABLE would block subsequent events from
replicating, causing significant delay in replication. By assigning that
statement to a different replication domain, other events (in default domain
0) can then freely replicate in parallel. It is then the responsibility of the
user to ensure that the ALTER statement has time to complete on all slaves
before any queries are done on the master that depends on the new table.

Another use case is to allow two independent applications to replicate
independently of each other. For example, they might use each their own schema
and never touch the schema of the other.

In such cases, the user can set @@gtid_domain_id=1 in one application and
@@gtid_domain_id=2 in the other, and the transactions from the two
applications will then be able to replicate in parallel on slaves. This is
similar to the multi-threaded slave feature of MySQL 5.6; however note that it
is not limited to using different schema. Any transactions can be marked for
independent parallel replication, even against the same table, as long as the
user/application ensures that the transactions are in fact independent.

Changing @@gtid_domain_id currently requires SUPER privilege, which limits
this technique to some degree. The reason for the SUPER requirement is that
unlimited change of @@gtid_domain_id can easily break replication.

Attachments

Issue Links

relates to

MDEV-5184 Valgrind warnings (Conditional jump or move depends on uninitialised value) on slave with slave-parallel-threads > 0

Closed

MDEV-5189 Replication failure and subsequent assertion failure on concurrent DML flow with slave-parallel-threads > 1 with gtid_domain_id per thread

Closed

MDEV-5195 Slave crashes in rpt_handle_event on executing check_warnings and FLUSH LOGS

Closed

MDEV-5196 Server hangs or assertion `!thd->wait_for_commit_ptr' fails on MASTER_POS_WAIT with slave-parallel-threads > 0

Closed

MDEV-5206 Slave with parallel threads is overly optimistic about its master log position after FLUSH LOGS

Closed

MDEV-5207 Assertion `pos_in_file == info->end_of_file' fails on restart of a slave with parallel threads after master switches on binlog_checksum

Closed

MDEV-5217 Assorted failures with parallel replication on concurrent DML/DDL flow and periodic FLUSH LOGS

Closed

MDEV-5219 Slave status does not always show SQL Error/Errno for slave with parallel threads

Closed

MDEV-5262 Missing retry after temp error in parallel replication

Closed

MDEV-5291 Slave transaction retry on temporary error leaves dangling error in SHOW SLAVE STATUS

Closed

MDEV-5296 No status information available for parallel replication

Open

MDEV-5363 Make parallel replication waits killable

Closed

(7 relates to)

Activity

Ascending order - Click to sort in descending order

Jeremy Cole added a comment - 2013-09-27 21:12

Kristian, can you comment on whether we should expect any problems combining parallel replication with GTID strict mode?

Jeremy Cole added a comment - 2013-09-27 21:12 Kristian, can you comment on whether we should expect any problems combining parallel replication with GTID strict mode?

Kristian Nielsen added a comment - 2013-09-27 21:21

> Kristian, can you comment on whether we should expect any problems combining
> parallel replication with GTID strict mode?

On the contrary, using GTID, and GTID strict mode, is the recommended way to
work with parallel replication.

Kristian Nielsen added a comment - 2013-09-27 21:21 > Kristian, can you comment on whether we should expect any problems combining > parallel replication with GTID strict mode? On the contrary, using GTID, and GTID strict mode, is the recommended way to work with parallel replication.

Kristian Nielsen added a comment - 2013-11-07 10:32

The code is pushed into 10.0.5 (though it's far from complete)

Kristian Nielsen added a comment - 2013-11-07 10:32 The code is pushed into 10.0.5 (though it's far from complete)

MariaDB Server

MWL#184: Parallel replication of group-committed transactions

Details

Description

Attachments

Issue Links

Activity

People

Dates

Git Integration