Details

    Description

      Idea

      The purpose of this task is to create an easy-to-use facility for setting up a
      new MariaDB replication slave.

      Setting up a new slave currently involves: 1) installing MariaDB with the
      initial database; 2) pointing the slave to the master with CHANGE MASTER TO;
      3) copying the initial data from the master to the slave; and 4) starting the
      slave with START SLAVE. The idea is to automate step (3), which currently has
      to be done manually.
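
      For reference, a rough sketch of the manual procedure being automated (the
      host name, credentials, and GTID position below are purely illustrative):

        -- Step 2: point the new slave at the master.
        CHANGE MASTER TO
            MASTER_HOST='master.example.com',
            MASTER_USER='repl',
            MASTER_PASSWORD='...',
            MASTER_USE_GTID=slave_pos;

        -- Step 3 (the manual part this task automates): copy the initial data,
        -- e.g. by restoring a dump taken on the master, then set the slave's
        -- GTID position to match that dump.
        SET GLOBAL gtid_slave_pos = '0-1-12345';

        -- Step 4: start replication from that point.
        START SLAVE;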

      The syntax could be something as simple as

      LOAD DATA FROM MASTER

      This would connect to the currently configured master, load a snapshot of all
      the data on it, and leave the slave position at the point of the snapshot,
      ready for START SLAVE to continue replication from that point.
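
      With the proposed command, a slave-side session might look like the following
      (the exact syntax is still open; the host and credentials are illustrative,
      and GTID mode is assumed, as required by the restrictions below):

        CHANGE MASTER TO
            MASTER_HOST='master.example.com',
            MASTER_USER='repl',
            MASTER_PASSWORD='...',
            MASTER_USE_GTID=slave_pos;

        -- Proposed: stream a snapshot of all data from the configured master and
        -- leave the slave position at the point of the snapshot.
        LOAD DATA FROM MASTER;

        -- Continue normal replication from that point.
        START SLAVE;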

      Implementation:

      The idea is to do this without blocking the master, in a way that works for
      any storage engine. It relies on row-based replication being used between the
      master and the slave.

      At the start of LOAD DATA FROM MASTER, the slave will enter a special
      provisioning mode. It will start replicating events from the master at the
      master's current position.

      The master dump thread will send binlog events to the slave as normal. In
      addition, it will interleave a dump of all the data contained in the master's
      tables, views, and stored functions. Whenever the dump thread would normally
      go to sleep waiting for more data to arrive in the binlog, it will instead
      send another chunk of data in the binlog stream for the slave to apply.

      A "chunk of data" can be:

      • A CREATE OR REPLACE TABLE / VIEW / PROCEDURE / FUNCTION statement
      • A range of N rows (N=100, for example). Each successive chunk will do a
        range scan on the primary key from the end position of the last chunk.

      Sending data in small chunks avoids the need for long-lived table locks or
      transactions that could adversely affect master performance.
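
      As an illustration, each chunk roughly corresponds to a key-range read like
      the following on the master (the table and column names are hypothetical, and
      the implementation would read rows internally rather than through SQL):

        -- First chunk: the 100 rows with the smallest primary key values.
        SELECT * FROM some_table ORDER BY pk LIMIT 100;

        -- Each later chunk resumes the range scan just after the last key sent,
        -- so no lock or transaction ever needs to span the whole table.
        SELECT * FROM some_table WHERE pk > @last_pk_sent ORDER BY pk LIMIT 100;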

      The slave will connect in GTID mode. The master will send dumped chunks in a
      separate domain id, allowing the slave to process chunks in parallel with
      normal data.
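
      For example, the dumped chunks could be binlogged under their own replication
      domain while normal traffic stays in the default domain, and the slave could
      then apply the two domains in parallel (the specific domain id and the use of
      a session-level gtid_domain_id here are assumptions, not a settled design):

        -- Master side, conceptually: tag provisioning chunks with their own domain.
        SET SESSION gtid_domain_id = 9999;

        -- Slave side: GTID mode plus parallel apply across replication domains.
        CHANGE MASTER TO MASTER_USE_GTID = slave_pos;
        SET GLOBAL slave_parallel_threads = 4;
        START SLAVE;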

      During the provisioning, all normal replication events from the master will
      arrive on the slave, and the slave will attempt to apply them locally. Some of
      these events will fail to apply, since the affected table or row may not yet
      have been loaded. In the provisioning mode, all such errors will be silently
      ignored. Proper locking (or a suitable isolation level, for example) must be
      used on the master when fetching chunks, to ensure that every update to a row
      ends up applied correctly on the slave, either as part of a chunk or through a
      later row event.
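
      One way to obtain that guarantee (an assumption, not a decided design) is to
      read each chunk under shared row locks inside a short transaction, so that the
      chunk is serialized against concurrent updates at a well-defined point in the
      binlog stream:

        BEGIN;
        -- A concurrent UPDATE either commits before this read (and is already
        -- reflected in the chunk) or commits after it (and its row event then
        -- follows the chunk in the binlog stream).
        SELECT * FROM some_table
            WHERE pk > @last_pk_sent ORDER BY pk LIMIT 100
            LOCK IN SHARE MODE;
        COMMIT;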

      In order to make the first version of this feature feasible to implement in a
      reasonable amount of time, it should impose a number of restrictions (which
      could be relaxed in a later version of the feature):

      • Give up with an error if the slave is not configured for GTID mode
        (MASTER_USE_GTID != NO).
      • Give up with an error if the slave receives any event binlogged in
        statement-based format (so the master must be running in row-based
        replication mode, and no DDL must be done while the provisioning is
        running).
      • Give up with an error if the master has a table without a primary key
        (a query to spot such tables in advance is sketched after this list).
      • Secondary indexes will be enabled during the provisioning; this means that
        tables with large secondary indexes could be expensive to provision.
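
      A query along these lines (a sketch, not part of the design) can be used to
      spot tables that would trip the primary-key restriction before starting the
      provisioning:

        SELECT t.table_schema, t.table_name
        FROM information_schema.tables t
        LEFT JOIN information_schema.table_constraints c
            ON c.table_schema = t.table_schema
            AND c.table_name = t.table_name
            AND c.constraint_type = 'PRIMARY KEY'
        WHERE t.table_type = 'BASE TABLE'
            AND c.constraint_name IS NULL
            AND t.table_schema NOT IN
                ('mysql', 'information_schema', 'performance_schema');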

          Activity

             stephane@skysql.com VAROQUI Stephane added a comment -

             I really don't think it's a matter of weeks, because parallel
             replication is faster than a mysqldump restore and can saturate any
             bandwidth. Moreover, after 6 years MDEV-11675 is still not progressing;
             a variation of this worklog applied to any non-blocking DDL would have
             covered MDEV-11675 in many cases, just by producing a table copy and an
             atomic rename à la Percona Toolkit, and no one would have to live with
             day-long replication delays because of DDL.

             stephane@skysql.com VAROQUI Stephane added a comment - edited

             This task, if well designed, could enable any blocking DDL to be done
             without blocking the slave: a table copy can be created, rows injected
             in chunks in another replication domain, and the copy renamed in the
             main domain at the end.

            rjasdfiii Rick James added a comment -

            I like the feature, but think there are some details to work out.

            • TRIGGERS will not be sent, correct?
            • Does the default, hidden, InnoDB PK qualify as a PK in your rule about what to do if a table does not have a PK?
            • For the rows that fail to apply during provisioning, how will they be applied later? Won't this lead to applying changes out of order? Think about 2 UPDATEs to the same row, but the first is not applied because the row has not arrived yet. Hence the second update is applied first. And what about transactions where some of the rows exist? I think the 'real' replication stream cannot be applied until all the data has arrived.
            • The provisioning stream will (potentially) be too large to keep in relay logs, hence the logs may need to be purged. Then, non-applied regular replication items should not be interspersed with that stream. Either have a separate logging mechanism, or copy the regular stream over as the provisioning is purged. (OK, the first version may require lots of disk on the Replica.)
            • How does Galera's pluggable SST address the above issues?
             monty Michael Widenius added a comment - edited

             The current solution for this is to use MaxScale 22.08, which can do an
             automatic rebuild of a server. More information is available in
             MXS-2542.

             knielsen Kristian Nielsen added a comment -

             It's completely fair not to prioritise this, but why close it? It's a
             very valuable task, and a feature that anyone would expect to find in a
             replication system, if it weren't that we've gotten used to not having
             it for so many years with mysql/mariadb replication.


            People

               Assignee: Kristian Nielsen (knielsen)
               Reporter: Kristian Nielsen (knielsen)
               Votes: 31
               Watchers: 40

