Details

    Description

      Idea

      The purpose of this task is to create an easy-to-use facility for setting up a
      new MariaDB replication slave.

      Setting up a new slave currently involves: 1) installing MariaDB with an
      initial database; 2) pointing the slave to the master with CHANGE MASTER TO;
      3) copying the initial data from the master to the slave; and 4) starting the
      slave with START SLAVE. The idea is to automate step (3), which currently
      needs to be done manually.

      The syntax could be something as simple as

      LOAD DATA FROM MASTER

      This would connect to the currently configured master, load a snapshot of all
      the data on the master, and leave the slave position at the point of the
      snapshot, ready for START SLAVE to continue replication from that point.

      Implementation:

      The idea is to do this without blocking the master, in a way that works for
      any storage engine. It will rely on row-based replication being used between
      the master and the slave.

      At the start of LOAD DATA FROM MASTER, the slave will enter a special
      provisioning mode. It will start replicating events from the master at the
      master's current position.

      The master dump thread will send binlog events to the slave as normal. But in
      addition, it will interleave a dump of all the data on the master contained in
      tables, views, or stored functions. Whenever the dump thread would normally go
      to sleep waiting for more data to arrive in the binlog, the dump thread will
      instead send another chunk of data in the binlog stream for the slave to apply.
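
      The interleaving rule above can be sketched as a small simulation. This is
      only an illustration, not server code: the function name interleave, the
      tick-based model of binlog arrival, and the string placeholders for events
      and chunks are all assumptions made for the example.

```python
from collections import deque

def interleave(binlog_ticks, chunks):
    """Return the items in the order a dump thread would send them.

    binlog_ticks[i] is the list of binlog events that became available at
    tick i (hypothetical model). An empty list is a moment where the dump
    thread would normally sleep; here it sends one dump chunk instead.
    """
    pending_chunks = deque(chunks)
    sent = []
    for new_events in binlog_ticks:
        if new_events:
            # Normal replication traffic always goes out first.
            sent.extend(("binlog", ev) for ev in new_events)
        elif pending_chunks:
            # Binlog is idle: use the gap to send one chunk of the dump.
            sent.append(("chunk", pending_chunks.popleft()))
    # No more binlog activity in this model: drain the remaining chunks.
    while pending_chunks:
        sent.append(("chunk", pending_chunks.popleft()))
    return sent

if __name__ == "__main__":
    for item in interleave([["e1"], [], ["e2", "e3"], []], ["c1", "c2", "c3"]):
        print(item)
```

      In this model the dump never delays normal events; it only fills the time
      the dump thread would otherwise spend waiting.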

      A "chunk of data" can be:

      • A CREATE OR REPLACE TABLE / VIEW / PROCEDURE / FUNCTION
      • A range of N rows (N=100, for example). Each successive chunk will do a
        range scan on the primary key from the end position of the last chunk.

      Sending data in small chunks avoids the need for long-lived table locks or
      transactions that could adversely affect master performance.
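
      A minimal sketch of the row-range chunking, assuming a single-column integer
      primary key. SQLite (via Python's sqlite3 module) stands in for the master
      purely so the example is self-contained; the table t, its columns, and the
      helper names iter_chunks and make_demo_db are hypothetical.

```python
import sqlite3

def iter_chunks(conn, chunk_size=100):
    """Yield lists of (id, val) rows from table t, at most chunk_size each.

    Every chunk is its own short query that resumes the primary-key range
    scan just after the last key of the previous chunk, so no long-lived
    lock or transaction is held between chunks.
    """
    last_id = None
    while True:
        if last_id is None:
            cur = conn.execute(
                "SELECT id, val FROM t ORDER BY id LIMIT ?", (chunk_size,))
        else:
            cur = conn.execute(
                "SELECT id, val FROM t WHERE id > ? ORDER BY id LIMIT ?",
                (last_id, chunk_size))
        rows = cur.fetchall()
        if not rows:
            return
        yield rows
        last_id = rows[-1][0]  # end position of this chunk

def make_demo_db(row_count):
    """Create an in-memory demo table with ids 1..row_count."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, val TEXT)")
    conn.executemany("INSERT INTO t (id, val) VALUES (?, ?)",
                     [(i, f"row{i}") for i in range(1, row_count + 1)])
    conn.commit()
    return conn

if __name__ == "__main__":
    demo = make_demo_db(250)
    for chunk in iter_chunks(demo, chunk_size=100):
        print(len(chunk), "rows, ids", chunk[0][0], "to", chunk[-1][0])
```

      Resuming from the last key rather than using an OFFSET keeps the cost of
      each chunk independent of how far the scan has progressed.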

      The slave will connect in GTID mode. The master will send dumped chunks in a
      separate domain id, allowing the slave to process chunks in parallel with
      normal data.

      During the provisioning, all normal replication events from the master will
      arrive on the slave, and the slave will attempt to apply them locally. Some of
      these events will fail to apply, since the affected table or row may not yet
      have been loaded. In provisioning mode, all such errors will be silently
      ignored. Proper locking (a suitable isolation level, for example) must be
      used on the master when fetching chunks, to ensure that the updates to any
      row will always be applied correctly on the slave, either as part of a chunk
      or in a later row event.
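
      The following toy simulation illustrates why ignoring such errors can still
      leave the slave consistent. It rests on assumptions that go beyond the text
      above: tables are modelled as Python dicts keyed by primary key, each chunk
      is read atomically at its position in the event stream (standing in for the
      "proper locking" requirement), and chunk rows overwrite any row already
      present on the slave. All names are hypothetical.

```python
def apply_row_event(slave, event):
    """Apply one row event on the slave; return False if it failed."""
    kind, pk, value = event
    if kind == "insert":
        slave[pk] = value
        return True
    if pk not in slave:
        # Row not loaded yet: in provisioning mode this error is ignored.
        return False
    if kind == "update":
        slave[pk] = value
    else:  # "delete"
        del slave[pk]
    return True

def send_chunk(master, slave, last_pk, chunk_size):
    """Copy the next key range to the slave; return its last key or None."""
    keys = sorted(k for k in master if last_pk is None or k > last_pk)
    keys = keys[:chunk_size]
    for k in keys:
        slave[k] = master[k]  # assumed overwrite semantics for chunk rows
    return keys[-1] if keys else None

def provision(master, workload, chunk_size=2):
    """Run the workload; None entries are idle moments used for chunks."""
    slave, ignored, last_pk = {}, 0, None
    for op in workload:
        if op is None:
            end = send_chunk(master, slave, last_pk, chunk_size)
            last_pk = end if end is not None else last_pk
            continue
        kind, pk, value = op
        if kind == "delete":
            del master[pk]
        else:
            master[pk] = value
        if not apply_row_event(slave, op):
            ignored += 1
    # Workload finished: drain the rest of the dump.
    while True:
        end = send_chunk(master, slave, last_pk, chunk_size)
        if end is None:
            return slave, ignored
        last_pk = end

if __name__ == "__main__":
    master = {1: "a", 2: "b", 3: "c", 4: "d", 5: "e"}
    workload = [("update", 4, "d2"), None, ("delete", 5, None),
                ("insert", 9, "z"), ("update", 1, "a2"), None]
    slave, ignored = provision(master, workload)
    print(slave == master, ignored)
```

      In this run the update to row 4 and the delete of row 5 fail on the slave
      because those rows have not been loaded yet; the later chunks read the
      master after those changes, so the slave still ends up identical.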

      In order to make the first version of this feature feasible to implement in a
      reasonable amount of time, it should set a number of reasonable restrictions
      (which could be relaxed in a later version of the feature):

      • Give up with an error if the slave is not configured for GTID mode
        (that is, if MASTER_USE_GTID=NO).
      • Give up with an error if the slave receives any event in statement-based
        binlogging (so the master must be running in row-based replication mode,
        and no DDL may be executed while the provisioning is running).
      • Give up with an error if the master has a table without primary key.
      • Secondary indexes will be enabled during the provisioning; this means that
        tables with large secondary indexes could be expensive to provision.
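
      The error-raising restrictions could be expressed as simple checks, sketched
      below. This is an illustration only: ProvisioningError, check_before_start
      and check_event are hypothetical names, and the inputs (the MASTER_USE_GTID
      setting as a string, a mapping from table name to primary-key columns, and a
      flag for statement-based events) are assumed to be available in this form.

```python
class ProvisioningError(Exception):
    """Raised when a first-version restriction is violated (hypothetical)."""

def check_before_start(master_use_gtid, primary_keys):
    """Validate the restrictions that can be checked before provisioning.

    master_use_gtid: the slave's MASTER_USE_GTID setting, e.g. "slave_pos".
    primary_keys: maps table name -> list of primary-key column names.
    """
    if master_use_gtid.strip().lower() == "no":
        raise ProvisioningError("slave is not configured for GTID mode")
    missing = sorted(t for t, cols in primary_keys.items() if not cols)
    if missing:
        raise ProvisioningError(
            "tables without primary key: " + ", ".join(missing))

def check_event(is_statement_based):
    """Validate one incoming event while provisioning is running."""
    if is_statement_based:
        raise ProvisioningError(
            "statement-based event received during provisioning")

if __name__ == "__main__":
    check_before_start("slave_pos", {"t1": ["id"]})  # passes silently
    try:
        check_before_start("slave_pos", {"t1": ["id"], "t2": []})
    except ProvisioningError as err:
        print("error:", err)
```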

          Activity

            knielsen Kristian Nielsen created issue -
            serg Sergei Golubchik made changes -
            Field Original Value New Value
            Labels gtid replication gsoc15 gtid replication
            ratzpo Rasmus Johansson (Inactive) made changes -
            Workflow MariaDB v2 [ 59330 ] MariaDB v3 [ 65661 ]
            serg Sergei Golubchik made changes -
            serg Sergei Golubchik made changes -
            Fix Version/s 10.2 [ 14601 ]
            serg Sergei Golubchik made changes -
            Labels gsoc15 gtid replication gsoc15 gsoc17 gtid replication
            serg Sergei Golubchik made changes -
            serg Sergei Golubchik made changes -
            Labels gsoc15 gsoc17 gtid replication gsoc15 gsoc17 gsoc18 gtid replication
            danblack Daniel Black made changes -
            ratzpo Rasmus Johansson (Inactive) made changes -
            Labels gsoc15 gsoc17 gsoc18 gtid replication gsoc15 gsoc17 gsoc18 gsoc19 gtid replication
            serg Sergei Golubchik made changes -
            Assignee Andrei Elkin [ elkin ]
            GeoffMontee Geoff Montee (Inactive) made changes -
            GeoffMontee Geoff Montee (Inactive) made changes -
            GeoffMontee Geoff Montee (Inactive) made changes -
            GeoffMontee Geoff Montee (Inactive) made changes -
            GeoffMontee Geoff Montee (Inactive) made changes -
            stephane@skysql.com VAROQUI Stephane made changes -
            GeoffMontee Geoff Montee (Inactive) made changes -
            Elkin Andrei Elkin made changes -
            Labels gsoc15 gsoc17 gsoc18 gsoc19 gtid replication gsoc15 gsoc17 gsoc18 gsoc19 gtid replication sst
            stephane@skysql.com VAROQUI Stephane made changes -
            julien.fritsch Julien Fritsch made changes -
            Priority Major [ 3 ] Critical [ 2 ]
            stephane@skysql.com VAROQUI Stephane made changes -
            stephane@skysql.com VAROQUI Stephane made changes -
            serg Sergei Golubchik made changes -
            Fix Version/s 10.7 [ 24805 ]
            ralf.gebhardt Ralf Gebhardt made changes -
            Due Date 2021-09-14
            serg Sergei Golubchik made changes -
            Priority Critical [ 2 ] Major [ 3 ]
            ralf.gebhardt Ralf Gebhardt made changes -
            Due Date 2021-09-14
            ralf.gebhardt Ralf Gebhardt made changes -
            Fix Version/s 10.8 [ 26121 ]
            Fix Version/s 10.7 [ 24805 ]
            serg Sergei Golubchik made changes -
            Priority Major [ 3 ] Critical [ 2 ]
            serg Sergei Golubchik made changes -
            Assignee Andrei Elkin [ elkin ] Robert Bindar [ robertbindar ]
            serg Sergei Golubchik made changes -
            Workflow MariaDB v3 [ 65661 ] MariaDB v4 [ 130306 ]
            serg Sergei Golubchik made changes -
            Fix Version/s 10.9 [ 26905 ]
            Fix Version/s 10.8 [ 26121 ]
            Elkin Andrei Elkin made changes -
            Assignee Robert Bindar [ robertbindar ] Andrei Elkin [ elkin ]
            ralf.gebhardt Ralf Gebhardt made changes -
            Assignee Andrei Elkin [ elkin ]
            ralf.gebhardt Ralf Gebhardt made changes -
            Fix Version/s 10.9 [ 26905 ]
            markus makela markus makela made changes -
            markus makela markus makela made changes -
            toddstoffel Todd Stoffel (Inactive) made changes -
            toddstoffel Todd Stoffel (Inactive) made changes -
            illuusio Tuukka Pasanen made changes -
            Assignee Tuukka Pasanen [ JIRAUSER49166 ]
            illuusio Tuukka Pasanen made changes -
            Assignee Tuukka Pasanen [ JIRAUSER49166 ]
            ralf.gebhardt Ralf Gebhardt made changes -
            Priority Critical [ 2 ] Minor [ 4 ]
            manjot Manjot Singh (Inactive) made changes -
            Fix Version/s N/A [ 14700 ]
            Resolution Won't Fix [ 2 ]
            Status Open [ 1 ] Closed [ 6 ]
            knielsen Kristian Nielsen made changes -
            Assignee Kristian Nielsen [ knielsen ]
            serg Sergei Golubchik made changes -
            Resolution Won't Fix [ 2 ]
            Status Closed [ 6 ] Stalled [ 10000 ]
            serg Sergei Golubchik made changes -
            Status Stalled [ 10000 ] Open [ 1 ]
            danblack Daniel Black made changes -
            Issue Type Task [ 3 ] New Feature [ 2 ]
            danblack Daniel Black made changes -
            JIraAutomate JiraAutomate made changes -
            Priority Minor [ 4 ] Major [ 3 ]
            vlad.radu Vlad Radu made changes -
            Labels gsoc15 gsoc17 gsoc18 gsoc19 gtid replication sst foundation gsoc15 gsoc17 gsoc18 gsoc19 gtid replication sst
            ParadoxV5 Jimmy Hú made changes -

            People

              knielsen Kristian Nielsen
              knielsen Kristian Nielsen
              Votes: 31
              Watchers: 40