Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-10863

parallel replication tries to continue from wrong position after stopping and restarting slave threads

Details

    Description

      When doing STOP SLAVE SQL_THREAD and START SLAVE SQL_THREAD repeatedly parallel replication sometimes fails on START with a duplicate key error. SHOW SLAVE STATUS then always shows a relay log position at the beginning of a log file, e.g. "Relay_Log_Pos: 4". This only seems to happen if a large transaction is split across more than one relay log file.

      How to reproduce:

      Master config:

      [mysqld]
      server-id=1
      log-bin=master-bin
      binlog-format=statement
      

      Slave config:

      [mysqld]
      server-id=2
      slave_parallel_threads=20
      max_relay_log_size=98304
      

      Create replication user on the master:

      CREATE USER 'repl'@'%' IDENTIFIED BY 'password';
      GRANT ALL PRIVILEGES ON *.* TO 'repl'@'%';
      RESET MASTER;
      

      Start slave:

      SET GLOBAL gtid_slave_pos='0-1-2';
      CHANGE MASTER TO Master_host='mymasterip', Master_user='repl', Master_password='password', master_use_gtid=slave_pos;
      START SLAVE;
      

      On the master: create a simple PHP script "test.php" to populate a table using given domain ID (modulo 3) and table name in transactions of 1000 rows each (transactions will take about 40K binlog space each, so talking about half of the max_relaylog_size which makes hitting a transaction split very likely)

      <?php
       
      $domain_id = $argv[1]%3 + 1;
      $table     = $argv[2];
       
      echo "SET gtid_domain_id=$domain_id;";
       
      echo "DROP TABLE IF EXISTS test.$table;";
      echo "CREATE TABLE test.$table(id int primary key, msg varchar(100));"; 
       
      $n=1;
       
      while (true) {
        echo "BEGIN;\n"; 
        for ($i = 1; $i < 1000; $i++) {
          echo "INSERT INTO test.$table VALUES($n, MD5(RAND()));\n";
          $n++;
        }
        echo "COMMIT;\n";
      }
      

      Then run several instances of it in parallel:

      for a in $(seq 20); do (php test.php $a t$a | mysql &) ; done
      

      Now on the slave run this script to repeatedly stop and restart the SQL thread:

      ( sudo mysql <<< "select version(); start slave sql_thread"; while date; do
      sudo mysql <<< 'stop slave sql_thread; show slave status\G'
      gsp_stopped="$(sudo mysql <<< 'SELECT @@GLOBAL.gtid_slave_pos;')"; echo "$gsp_stopped"; 
      sudo mysql <<< 'start slave sql_thread;'; sleep 5;
      sss="$(sudo mysql <<< 'show slave status\G')"
      gsp="$(sudo mysql <<< 'SELECT @@GLOBAL.gtid_slave_pos')"; echo "$sss"; echo "$gsp"; 
      if test "$(awk '$1 == "Slave_SQL_Running:"{print $2}' <<< "$sss")" == "Yes"; then echo; echo "sleeping..."; sleep 5; else echo "things are broken..."; break; fi; done; ) | tee test.log
      

      This usually fails with a duplicate key error after a small number of iterations already

      Attachments

        1. binlogs.tar.gz
          599 kB
        2. test.log
          14 kB

        Issue Links

          Activity

            hholzgra Hartmut Holzgraefe created issue -
            hholzgra Hartmut Holzgraefe made changes -
            Field Original Value New Value
            elenst Elena Stepanova made changes -
            Assignee Lixun Peng [ plinux ]
            elenst Elena Stepanova made changes -
            Fix Version/s 10.0 [ 16000 ]
            hholzgra Hartmut Holzgraefe made changes -
            Description When doing STOP SLAVE SQL_THREAD and START SLAVE SQL_THREAD repeatedly parallel replication sometimes fails on START with a duplicate key error. SHOW SLAVE STATUS then always shows a relay log position at the beginning of a log file, e.g. "Relay_Log_Pos: 4". This only seems to happen if a large transaction is split across more than one relay log file.

            How to reproduce:

            Master config:

            {noformat}
            [mysqld]
            server-id=1
            log-bin=master-bin
            binlog-format=statement
            {noformat}

            Slave config:

            {noformat}
            [mysqld]
            server-id=2
            slave_parallel_threads = 20;
            max_relaylog_size=98304;
            {noformat}

            Create replication user on the master:

            {noformat}
            CREATE USER 'repl'@'%' IDENTIFIED BY 'password';
            GRANT ALL PRIVILEGES ON *.* TO 'repl'@'%';
            RESET MASTER;
            {noformat}

            Start slave:

            {noformat}
            SET GLOBAL gtid_slave_pos='0-1-2';
            CHANGE MASTER TO Master_host='mymasterip', Master_user='repl', Master_password='password', master_use_gtid=slave_pos;
            START SLAVE;
            {noformat}

            On the master: create a simple PHP script "test.php" to populate a table using given domain ID (modulo 3) and table name in transactions of 1000 rows each (transactions will take about 40K binlog space each, so talking about half of the max_relaylog_size which makes hitting a transaction split very likely)

            {noformat}
            <?php

            $domain_id = $argv[1]%3 + 1;
            $table = $argv[2];

            echo "SET gtid_domain_id=$domain_id;";

            echo "DROP TABLE IF EXISTS test.$table;";
            echo "CREATE TABLE test.$table(id int primary key, msg varchar(100));";

            $n=1;

            while (true) {
              echo "BEGIN;\n";
              for ($i = 1; $i < 1000; $i++) {
                echo "INSERT INTO test.$table VALUES($n, MD5(RAND()));\n";
                $n++;
              }
              echo "COMMIT;\n";
            }
            {{noformat}}

            Then run several instances of it in parallel:

            {{noformat}}
            for a in $(seq 20); do (php test.php $a t$a | mysql &) ; done
            {{noformat}}

            Now on the slave run this script to repeatedly stop and restart the SQL thread:

            {{noformat}}
            ( sudo mysql <<< "select version(); start slave sql_thread"; while date; do
            sudo mysql <<< 'stop slave sql_thread; show slave status\G'
            gsp_stopped="$(sudo mysql <<< 'SELECT @@GLOBAL.gtid_slave_pos;')"; echo "$gsp_stopped";
            sudo mysql <<< 'start slave sql_thread;'; sleep 5;
            sss="$(sudo mysql <<< 'show slave status\G')"
            gsp="$(sudo mysql <<< 'SELECT @@GLOBAL.gtid_slave_pos')"; echo "$sss"; echo "$gsp";
            if test "$(awk '$1 == "Slave_SQL_Running:"{print $2}' <<< "$sss")" == "Yes"; then echo; echo "sleeping..."; sleep 5; else echo "things are broken..."; break; fi; done; ) | tee test.log
            {{noformat}}

            This usually fails with a duplicate key error after a small number of iterations already
            When doing STOP SLAVE SQL_THREAD and START SLAVE SQL_THREAD repeatedly parallel replication sometimes fails on START with a duplicate key error. SHOW SLAVE STATUS then always shows a relay log position at the beginning of a log file, e.g. "Relay_Log_Pos: 4". This only seems to happen if a large transaction is split across more than one relay log file.

            How to reproduce:

            Master config:

            {noformat}
            [mysqld]
            server-id=1
            log-bin=master-bin
            binlog-format=statement
            {noformat}

            Slave config:

            {noformat}
            [mysqld]
            server-id=2
            slave_parallel_threads = 20;
            max_relaylog_size=98304;
            {noformat}

            Create replication user on the master:

            {noformat}
            CREATE USER 'repl'@'%' IDENTIFIED BY 'password';
            GRANT ALL PRIVILEGES ON *.* TO 'repl'@'%';
            RESET MASTER;
            {noformat}

            Start slave:

            {noformat}
            SET GLOBAL gtid_slave_pos='0-1-2';
            CHANGE MASTER TO Master_host='mymasterip', Master_user='repl', Master_password='password', master_use_gtid=slave_pos;
            START SLAVE;
            {noformat}

            On the master: create a simple PHP script "test.php" to populate a table using given domain ID (modulo 3) and table name in transactions of 1000 rows each (transactions will take about 40K binlog space each, so talking about half of the max_relaylog_size which makes hitting a transaction split very likely)

            {{noformat}}
            <?php

            $domain_id = $argv[1]%3 + 1;
            $table = $argv[2];

            echo "SET gtid_domain_id=$domain_id;";

            echo "DROP TABLE IF EXISTS test.$table;";
            echo "CREATE TABLE test.$table(id int primary key, msg varchar(100));";

            $n=1;

            while (true) {
              echo "BEGIN;\n";
              for ($i = 1; $i < 1000; $i++) {
                echo "INSERT INTO test.$table VALUES($n, MD5(RAND()));\n";
                $n++;
              }
              echo "COMMIT;\n";
            }
            {{noformat}}

            Then run several instances of it in parallel:

            {{noformat}}
            for a in $(seq 20); do (php test.php $a t$a | mysql &) ; done
            {{noformat}}

            Now on the slave run this script to repeatedly stop and restart the SQL thread:

            {{noformat}}
            ( sudo mysql <<< "select version(); start slave sql_thread"; while date; do
            sudo mysql <<< 'stop slave sql_thread; show slave status\G'
            gsp_stopped="$(sudo mysql <<< 'SELECT @@GLOBAL.gtid_slave_pos;')"; echo "$gsp_stopped";
            sudo mysql <<< 'start slave sql_thread;'; sleep 5;
            sss="$(sudo mysql <<< 'show slave status\G')"
            gsp="$(sudo mysql <<< 'SELECT @@GLOBAL.gtid_slave_pos')"; echo "$sss"; echo "$gsp";
            if test "$(awk '$1 == "Slave_SQL_Running:"{print $2}' <<< "$sss")" == "Yes"; then echo; echo "sleeping..."; sleep 5; else echo "things are broken..."; break; fi; done; ) | tee test.log
            {{noformat}}

            This usually fails with a duplicate key error after a small number of iterations already
            hholzgra Hartmut Holzgraefe made changes -
            Description When doing STOP SLAVE SQL_THREAD and START SLAVE SQL_THREAD repeatedly parallel replication sometimes fails on START with a duplicate key error. SHOW SLAVE STATUS then always shows a relay log position at the beginning of a log file, e.g. "Relay_Log_Pos: 4". This only seems to happen if a large transaction is split across more than one relay log file.

            How to reproduce:

            Master config:

            {noformat}
            [mysqld]
            server-id=1
            log-bin=master-bin
            binlog-format=statement
            {noformat}

            Slave config:

            {noformat}
            [mysqld]
            server-id=2
            slave_parallel_threads = 20;
            max_relaylog_size=98304;
            {noformat}

            Create replication user on the master:

            {noformat}
            CREATE USER 'repl'@'%' IDENTIFIED BY 'password';
            GRANT ALL PRIVILEGES ON *.* TO 'repl'@'%';
            RESET MASTER;
            {noformat}

            Start slave:

            {noformat}
            SET GLOBAL gtid_slave_pos='0-1-2';
            CHANGE MASTER TO Master_host='mymasterip', Master_user='repl', Master_password='password', master_use_gtid=slave_pos;
            START SLAVE;
            {noformat}

            On the master: create a simple PHP script "test.php" to populate a table using given domain ID (modulo 3) and table name in transactions of 1000 rows each (transactions will take about 40K binlog space each, so talking about half of the max_relaylog_size which makes hitting a transaction split very likely)

            {{noformat}}
            <?php

            $domain_id = $argv[1]%3 + 1;
            $table = $argv[2];

            echo "SET gtid_domain_id=$domain_id;";

            echo "DROP TABLE IF EXISTS test.$table;";
            echo "CREATE TABLE test.$table(id int primary key, msg varchar(100));";

            $n=1;

            while (true) {
              echo "BEGIN;\n";
              for ($i = 1; $i < 1000; $i++) {
                echo "INSERT INTO test.$table VALUES($n, MD5(RAND()));\n";
                $n++;
              }
              echo "COMMIT;\n";
            }
            {{noformat}}

            Then run several instances of it in parallel:

            {{noformat}}
            for a in $(seq 20); do (php test.php $a t$a | mysql &) ; done
            {{noformat}}

            Now on the slave run this script to repeatedly stop and restart the SQL thread:

            {{noformat}}
            ( sudo mysql <<< "select version(); start slave sql_thread"; while date; do
            sudo mysql <<< 'stop slave sql_thread; show slave status\G'
            gsp_stopped="$(sudo mysql <<< 'SELECT @@GLOBAL.gtid_slave_pos;')"; echo "$gsp_stopped";
            sudo mysql <<< 'start slave sql_thread;'; sleep 5;
            sss="$(sudo mysql <<< 'show slave status\G')"
            gsp="$(sudo mysql <<< 'SELECT @@GLOBAL.gtid_slave_pos')"; echo "$sss"; echo "$gsp";
            if test "$(awk '$1 == "Slave_SQL_Running:"{print $2}' <<< "$sss")" == "Yes"; then echo; echo "sleeping..."; sleep 5; else echo "things are broken..."; break; fi; done; ) | tee test.log
            {{noformat}}

            This usually fails with a duplicate key error after a small number of iterations already
            When doing STOP SLAVE SQL_THREAD and START SLAVE SQL_THREAD repeatedly parallel replication sometimes fails on START with a duplicate key error. SHOW SLAVE STATUS then always shows a relay log position at the beginning of a log file, e.g. "Relay_Log_Pos: 4". This only seems to happen if a large transaction is split across more than one relay log file.

            How to reproduce:

            Master config:

            {noformat}
            [mysqld]
            server-id=1
            log-bin=master-bin
            binlog-format=statement
            {noformat}

            Slave config:

            {noformat}
            [mysqld]
            server-id=2
            slave_parallel_threads = 20;
            max_relaylog_size=98304;
            {noformat}

            Create replication user on the master:

            {noformat}
            CREATE USER 'repl'@'%' IDENTIFIED BY 'password';
            GRANT ALL PRIVILEGES ON *.* TO 'repl'@'%';
            RESET MASTER;
            {noformat}

            Start slave:

            {noformat}
            SET GLOBAL gtid_slave_pos='0-1-2';
            CHANGE MASTER TO Master_host='mymasterip', Master_user='repl', Master_password='password', master_use_gtid=slave_pos;
            START SLAVE;
            {noformat}

            On the master: create a simple PHP script "test.php" to populate a table using given domain ID (modulo 3) and table name in transactions of 1000 rows each (transactions will take about 40K binlog space each, so talking about half of the max_relaylog_size which makes hitting a transaction split very likely)

            {noformat}
            <?php

            $domain_id = $argv[1]%3 + 1;
            $table = $argv[2];

            echo "SET gtid_domain_id=$domain_id;";

            echo "DROP TABLE IF EXISTS test.$table;";
            echo "CREATE TABLE test.$table(id int primary key, msg varchar(100));";

            $n=1;

            while (true) {
              echo "BEGIN;\n";
              for ($i = 1; $i < 1000; $i++) {
                echo "INSERT INTO test.$table VALUES($n, MD5(RAND()));\n";
                $n++;
              }
              echo "COMMIT;\n";
            }
            {noformat}

            Then run several instances of it in parallel:

            {noformat}
            for a in $(seq 20); do (php test.php $a t$a | mysql &) ; done
            {noformat}

            Now on the slave run this script to repeatedly stop and restart the SQL thread:

            {noformat}
            ( sudo mysql <<< "select version(); start slave sql_thread"; while date; do
            sudo mysql <<< 'stop slave sql_thread; show slave status\G'
            gsp_stopped="$(sudo mysql <<< 'SELECT @@GLOBAL.gtid_slave_pos;')"; echo "$gsp_stopped";
            sudo mysql <<< 'start slave sql_thread;'; sleep 5;
            sss="$(sudo mysql <<< 'show slave status\G')"
            gsp="$(sudo mysql <<< 'SELECT @@GLOBAL.gtid_slave_pos')"; echo "$sss"; echo "$gsp";
            if test "$(awk '$1 == "Slave_SQL_Running:"{print $2}' <<< "$sss")" == "Yes"; then echo; echo "sleeping..."; sleep 5; else echo "things are broken..."; break; fi; done; ) | tee test.log
            {noformat}

            This usually fails with a duplicate key error after a small number of iterations already
            hholzgra Hartmut Holzgraefe made changes -
            Attachment test.log [ 42648 ]
            hholzgra Hartmut Holzgraefe made changes -
            Attachment binlogs.tar.gz [ 42649 ]
            alvinr Alvin Richards (Inactive) made changes -
            Labels bdc
            hholzgra Hartmut Holzgraefe made changes -
            Description When doing STOP SLAVE SQL_THREAD and START SLAVE SQL_THREAD repeatedly parallel replication sometimes fails on START with a duplicate key error. SHOW SLAVE STATUS then always shows a relay log position at the beginning of a log file, e.g. "Relay_Log_Pos: 4". This only seems to happen if a large transaction is split across more than one relay log file.

            How to reproduce:

            Master config:

            {noformat}
            [mysqld]
            server-id=1
            log-bin=master-bin
            binlog-format=statement
            {noformat}

            Slave config:

            {noformat}
            [mysqld]
            server-id=2
            slave_parallel_threads = 20;
            max_relaylog_size=98304;
            {noformat}

            Create replication user on the master:

            {noformat}
            CREATE USER 'repl'@'%' IDENTIFIED BY 'password';
            GRANT ALL PRIVILEGES ON *.* TO 'repl'@'%';
            RESET MASTER;
            {noformat}

            Start slave:

            {noformat}
            SET GLOBAL gtid_slave_pos='0-1-2';
            CHANGE MASTER TO Master_host='mymasterip', Master_user='repl', Master_password='password', master_use_gtid=slave_pos;
            START SLAVE;
            {noformat}

            On the master: create a simple PHP script "test.php" to populate a table using given domain ID (modulo 3) and table name in transactions of 1000 rows each (transactions will take about 40K binlog space each, so talking about half of the max_relaylog_size which makes hitting a transaction split very likely)

            {noformat}
            <?php

            $domain_id = $argv[1]%3 + 1;
            $table = $argv[2];

            echo "SET gtid_domain_id=$domain_id;";

            echo "DROP TABLE IF EXISTS test.$table;";
            echo "CREATE TABLE test.$table(id int primary key, msg varchar(100));";

            $n=1;

            while (true) {
              echo "BEGIN;\n";
              for ($i = 1; $i < 1000; $i++) {
                echo "INSERT INTO test.$table VALUES($n, MD5(RAND()));\n";
                $n++;
              }
              echo "COMMIT;\n";
            }
            {noformat}

            Then run several instances of it in parallel:

            {noformat}
            for a in $(seq 20); do (php test.php $a t$a | mysql &) ; done
            {noformat}

            Now on the slave run this script to repeatedly stop and restart the SQL thread:

            {noformat}
            ( sudo mysql <<< "select version(); start slave sql_thread"; while date; do
            sudo mysql <<< 'stop slave sql_thread; show slave status\G'
            gsp_stopped="$(sudo mysql <<< 'SELECT @@GLOBAL.gtid_slave_pos;')"; echo "$gsp_stopped";
            sudo mysql <<< 'start slave sql_thread;'; sleep 5;
            sss="$(sudo mysql <<< 'show slave status\G')"
            gsp="$(sudo mysql <<< 'SELECT @@GLOBAL.gtid_slave_pos')"; echo "$sss"; echo "$gsp";
            if test "$(awk '$1 == "Slave_SQL_Running:"{print $2}' <<< "$sss")" == "Yes"; then echo; echo "sleeping..."; sleep 5; else echo "things are broken..."; break; fi; done; ) | tee test.log
            {noformat}

            This usually fails with a duplicate key error after a small number of iterations already
            When doing STOP SLAVE SQL_THREAD and START SLAVE SQL_THREAD repeatedly parallel replication sometimes fails on START with a duplicate key error. SHOW SLAVE STATUS then always shows a relay log position at the beginning of a log file, e.g. "Relay_Log_Pos: 4". This only seems to happen if a large transaction is split across more than one relay log file.

            How to reproduce:

            Master config:

            {noformat}
            [mysqld]
            server-id=1
            log-bin=master-bin
            binlog-format=statement
            {noformat}

            Slave config:

            {noformat}
            [mysqld]
            server-id=2
            slave_parallel_threads=20
            max_relaylog_size=98304
            {noformat}

            Create replication user on the master:

            {noformat}
            CREATE USER 'repl'@'%' IDENTIFIED BY 'password';
            GRANT ALL PRIVILEGES ON *.* TO 'repl'@'%';
            RESET MASTER;
            {noformat}

            Start slave:

            {noformat}
            SET GLOBAL gtid_slave_pos='0-1-2';
            CHANGE MASTER TO Master_host='mymasterip', Master_user='repl', Master_password='password', master_use_gtid=slave_pos;
            START SLAVE;
            {noformat}

            On the master: create a simple PHP script "test.php" to populate a table using given domain ID (modulo 3) and table name in transactions of 1000 rows each (transactions will take about 40K binlog space each, so talking about half of the max_relaylog_size which makes hitting a transaction split very likely)

            {noformat}
            <?php

            $domain_id = $argv[1]%3 + 1;
            $table = $argv[2];

            echo "SET gtid_domain_id=$domain_id;";

            echo "DROP TABLE IF EXISTS test.$table;";
            echo "CREATE TABLE test.$table(id int primary key, msg varchar(100));";

            $n=1;

            while (true) {
              echo "BEGIN;\n";
              for ($i = 1; $i < 1000; $i++) {
                echo "INSERT INTO test.$table VALUES($n, MD5(RAND()));\n";
                $n++;
              }
              echo "COMMIT;\n";
            }
            {noformat}

            Then run several instances of it in parallel:

            {noformat}
            for a in $(seq 20); do (php test.php $a t$a | mysql &) ; done
            {noformat}

            Now on the slave run this script to repeatedly stop and restart the SQL thread:

            {noformat}
            ( sudo mysql <<< "select version(); start slave sql_thread"; while date; do
            sudo mysql <<< 'stop slave sql_thread; show slave status\G'
            gsp_stopped="$(sudo mysql <<< 'SELECT @@GLOBAL.gtid_slave_pos;')"; echo "$gsp_stopped";
            sudo mysql <<< 'start slave sql_thread;'; sleep 5;
            sss="$(sudo mysql <<< 'show slave status\G')"
            gsp="$(sudo mysql <<< 'SELECT @@GLOBAL.gtid_slave_pos')"; echo "$sss"; echo "$gsp";
            if test "$(awk '$1 == "Slave_SQL_Running:"{print $2}' <<< "$sss")" == "Yes"; then echo; echo "sleeping..."; sleep 5; else echo "things are broken..."; break; fi; done; ) | tee test.log
            {noformat}

            This usually fails with a duplicate key error after a small number of iterations already
            hholzgra Hartmut Holzgraefe made changes -
            Description When doing STOP SLAVE SQL_THREAD and START SLAVE SQL_THREAD repeatedly parallel replication sometimes fails on START with a duplicate key error. SHOW SLAVE STATUS then always shows a relay log position at the beginning of a log file, e.g. "Relay_Log_Pos: 4". This only seems to happen if a large transaction is split across more than one relay log file.

            How to reproduce:

            Master config:

            {noformat}
            [mysqld]
            server-id=1
            log-bin=master-bin
            binlog-format=statement
            {noformat}

            Slave config:

            {noformat}
            [mysqld]
            server-id=2
            slave_parallel_threads=20
            max_relaylog_size=98304
            {noformat}

            Create replication user on the master:

            {noformat}
            CREATE USER 'repl'@'%' IDENTIFIED BY 'password';
            GRANT ALL PRIVILEGES ON *.* TO 'repl'@'%';
            RESET MASTER;
            {noformat}

            Start slave:

            {noformat}
            SET GLOBAL gtid_slave_pos='0-1-2';
            CHANGE MASTER TO Master_host='mymasterip', Master_user='repl', Master_password='password', master_use_gtid=slave_pos;
            START SLAVE;
            {noformat}

            On the master: create a simple PHP script "test.php" to populate a table using given domain ID (modulo 3) and table name in transactions of 1000 rows each (transactions will take about 40K binlog space each, so talking about half of the max_relaylog_size which makes hitting a transaction split very likely)

            {noformat}
            <?php

            $domain_id = $argv[1]%3 + 1;
            $table = $argv[2];

            echo "SET gtid_domain_id=$domain_id;";

            echo "DROP TABLE IF EXISTS test.$table;";
            echo "CREATE TABLE test.$table(id int primary key, msg varchar(100));";

            $n=1;

            while (true) {
              echo "BEGIN;\n";
              for ($i = 1; $i < 1000; $i++) {
                echo "INSERT INTO test.$table VALUES($n, MD5(RAND()));\n";
                $n++;
              }
              echo "COMMIT;\n";
            }
            {noformat}

            Then run several instances of it in parallel:

            {noformat}
            for a in $(seq 20); do (php test.php $a t$a | mysql &) ; done
            {noformat}

            Now on the slave run this script to repeatedly stop and restart the SQL thread:

            {noformat}
            ( sudo mysql <<< "select version(); start slave sql_thread"; while date; do
            sudo mysql <<< 'stop slave sql_thread; show slave status\G'
            gsp_stopped="$(sudo mysql <<< 'SELECT @@GLOBAL.gtid_slave_pos;')"; echo "$gsp_stopped";
            sudo mysql <<< 'start slave sql_thread;'; sleep 5;
            sss="$(sudo mysql <<< 'show slave status\G')"
            gsp="$(sudo mysql <<< 'SELECT @@GLOBAL.gtid_slave_pos')"; echo "$sss"; echo "$gsp";
            if test "$(awk '$1 == "Slave_SQL_Running:"{print $2}' <<< "$sss")" == "Yes"; then echo; echo "sleeping..."; sleep 5; else echo "things are broken..."; break; fi; done; ) | tee test.log
            {noformat}

            This usually fails with a duplicate key error after a small number of iterations already
            When doing STOP SLAVE SQL_THREAD and START SLAVE SQL_THREAD repeatedly parallel replication sometimes fails on START with a duplicate key error. SHOW SLAVE STATUS then always shows a relay log position at the beginning of a log file, e.g. "Relay_Log_Pos: 4". This only seems to happen if a large transaction is split across more than one relay log file.

            How to reproduce:

            Master config:

            {noformat}
            [mysqld]
            server-id=1
            log-bin=master-bin
            binlog-format=statement
            {noformat}

            Slave config:

            {noformat}
            [mysqld]
            server-id=2
            slave_parallel_threads=20
            max_relay_log_size=98304
            {noformat}

            Create replication user on the master:

            {noformat}
            CREATE USER 'repl'@'%' IDENTIFIED BY 'password';
            GRANT ALL PRIVILEGES ON *.* TO 'repl'@'%';
            RESET MASTER;
            {noformat}

            Start slave:

            {noformat}
            SET GLOBAL gtid_slave_pos='0-1-2';
            CHANGE MASTER TO Master_host='mymasterip', Master_user='repl', Master_password='password', master_use_gtid=slave_pos;
            START SLAVE;
            {noformat}

            On the master: create a simple PHP script "test.php" to populate a table using given domain ID (modulo 3) and table name in transactions of 1000 rows each (transactions will take about 40K binlog space each, so talking about half of the max_relaylog_size which makes hitting a transaction split very likely)

            {noformat}
            <?php

            $domain_id = $argv[1]%3 + 1;
            $table = $argv[2];

            echo "SET gtid_domain_id=$domain_id;";

            echo "DROP TABLE IF EXISTS test.$table;";
            echo "CREATE TABLE test.$table(id int primary key, msg varchar(100));";

            $n=1;

            while (true) {
              echo "BEGIN;\n";
              for ($i = 1; $i < 1000; $i++) {
                echo "INSERT INTO test.$table VALUES($n, MD5(RAND()));\n";
                $n++;
              }
              echo "COMMIT;\n";
            }
            {noformat}

            Then run several instances of it in parallel:

            {noformat}
            for a in $(seq 20); do (php test.php $a t$a | mysql &) ; done
            {noformat}

            Now on the slave run this script to repeatedly stop and restart the SQL thread:

            {noformat}
            ( sudo mysql <<< "select version(); start slave sql_thread"; while date; do
            sudo mysql <<< 'stop slave sql_thread; show slave status\G'
            gsp_stopped="$(sudo mysql <<< 'SELECT @@GLOBAL.gtid_slave_pos;')"; echo "$gsp_stopped";
            sudo mysql <<< 'start slave sql_thread;'; sleep 5;
            sss="$(sudo mysql <<< 'show slave status\G')"
            gsp="$(sudo mysql <<< 'SELECT @@GLOBAL.gtid_slave_pos')"; echo "$sss"; echo "$gsp";
            if test "$(awk '$1 == "Slave_SQL_Running:"{print $2}' <<< "$sss")" == "Yes"; then echo; echo "sleeping..."; sleep 5; else echo "things are broken..."; break; fi; done; ) | tee test.log
            {noformat}

            This usually fails with a duplicate key error after a small number of iterations already
            knielsen Kristian Nielsen made changes -
            Labels bdc parallelslave replication
            plinux Lixun Peng made changes -
            Status Open [ 1 ] In Progress [ 3 ]
            knielsen Kristian Nielsen made changes -
            Assignee Lixun Peng [ plinux ] Kristian Nielsen [ knielsen ]
            knielsen Kristian Nielsen made changes -
            Fix Version/s 10.0.29 [ 22312 ]
            Fix Version/s 10.1.20 [ 22112 ]
            Fix Version/s 10.0 [ 16000 ]
            Resolution Fixed [ 1 ]
            Status In Progress [ 3 ] Closed [ 6 ]
            serg Sergei Golubchik made changes -
            Workflow MariaDB v3 [ 77172 ] MariaDB v4 [ 150947 ]
            mariadb-jira-automation Jira Automation (IT) made changes -
            Zendesk Related Tickets 128896

            People

              knielsen Kristian Nielsen
              hholzgra Hartmut Holzgraefe
              Votes:
              2 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.