Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-10863

parallel replication tries to continue from wrong position after stopping and restarting slave threads

    XMLWordPrintable

Details

    Description

      When doing STOP SLAVE SQL_THREAD and START SLAVE SQL_THREAD repeatedly parallel replication sometimes fails on START with a duplicate key error. SHOW SLAVE STATUS then always shows a relay log position at the beginning of a log file, e.g. "Relay_Log_Pos: 4". This only seems to happen if a large transaction is split across more than one relay log file.

      How to reproduce:

      Master config:

      [mysqld]
      server-id=1
      log-bin=master-bin
      binlog-format=statement
      

      Slave config:

      [mysqld]
      server-id=2
      slave_parallel_threads=20
      max_relay_log_size=98304
      

      Create replication user on the master:

      CREATE USER 'repl'@'%' IDENTIFIED BY 'password';
      GRANT ALL PRIVILEGES ON *.* TO 'repl'@'%';
      RESET MASTER;
      

      Start slave:

      SET GLOBAL gtid_slave_pos='0-1-2';
      CHANGE MASTER TO Master_host='mymasterip', Master_user='repl', Master_password='password', master_use_gtid=slave_pos;
      START SLAVE;
      

      On the master: create a simple PHP script "test.php" to populate a table using given domain ID (modulo 3) and table name in transactions of 1000 rows each (transactions will take about 40K binlog space each, so talking about half of the max_relaylog_size which makes hitting a transaction split very likely)

      <?php
       
      $domain_id = $argv[1]%3 + 1;
      $table     = $argv[2];
       
      echo "SET gtid_domain_id=$domain_id;";
       
      echo "DROP TABLE IF EXISTS test.$table;";
      echo "CREATE TABLE test.$table(id int primary key, msg varchar(100));"; 
       
      $n=1;
       
      while (true) {
        echo "BEGIN;\n"; 
        for ($i = 1; $i < 1000; $i++) {
          echo "INSERT INTO test.$table VALUES($n, MD5(RAND()));\n";
          $n++;
        }
        echo "COMMIT;\n";
      }
      

      Then run several instances of it in parallel:

      for a in $(seq 20); do (php test.php $a t$a | mysql &) ; done
      

      Now on the slave run this script to repeatedly stop and restart the SQL thread:

      ( sudo mysql <<< "select version(); start slave sql_thread"; while date; do
      sudo mysql <<< 'stop slave sql_thread; show slave status\G'
      gsp_stopped="$(sudo mysql <<< 'SELECT @@GLOBAL.gtid_slave_pos;')"; echo "$gsp_stopped"; 
      sudo mysql <<< 'start slave sql_thread;'; sleep 5;
      sss="$(sudo mysql <<< 'show slave status\G')"
      gsp="$(sudo mysql <<< 'SELECT @@GLOBAL.gtid_slave_pos')"; echo "$sss"; echo "$gsp"; 
      if test "$(awk '$1 == "Slave_SQL_Running:"{print $2}' <<< "$sss")" == "Yes"; then echo; echo "sleeping..."; sleep 5; else echo "things are broken..."; break; fi; done; ) | tee test.log
      

      This usually fails with a duplicate key error after a small number of iterations already

      Attachments

        1. binlogs.tar.gz
          599 kB
        2. test.log
          14 kB

        Issue Links

          Activity

            People

              knielsen Kristian Nielsen
              hholzgra Hartmut Holzgraefe
              Votes:
              2 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.