  MariaDB Server / MDEV-10863

parallel replication tries to continue from wrong position after stopping and restarting slave threads

Details

    Description

      When doing STOP SLAVE SQL_THREAD and START SLAVE SQL_THREAD repeatedly, parallel replication sometimes fails on START with a duplicate key error. SHOW SLAVE STATUS then always shows a relay log position at the beginning of a log file, e.g. "Relay_Log_Pos: 4". This only seems to happen when a large transaction is split across more than one relay log file.
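
      For reference, a minimal way to spot the failure from the SQL client (illustrative, not part of the original report; 1062 is the duplicate-key error code):

      SHOW SLAVE STATUS\G
      -- symptom: Slave_SQL_Running: No, Last_SQL_Errno: 1062, and
      -- Relay_Log_Pos: 4, i.e. the very start of a relay log file
      SELECT @@GLOBAL.gtid_slave_pos;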

      How to reproduce:

      Master config:

      [mysqld]
      server-id=1
      log-bin=master-bin
      binlog-format=statement
      

      Slave config:

      [mysqld]
      server-id=2
      slave_parallel_threads=20
      max_relay_log_size=98304
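
      A quick sanity check that the slave picked these settings up (not part of the original instructions, just a verification step):

      SHOW GLOBAL VARIABLES LIKE 'slave_parallel_threads';
      SHOW GLOBAL VARIABLES LIKE 'max_relay_log_size';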
      

      Create replication user on the master:

      CREATE USER 'repl'@'%' IDENTIFIED BY 'password';
      GRANT ALL PRIVILEGES ON *.* TO 'repl'@'%';
      RESET MASTER;
      

      Start slave:

      SET GLOBAL gtid_slave_pos='0-1-2';
      CHANGE MASTER TO Master_host='mymasterip', Master_user='repl', Master_password='password', master_use_gtid=slave_pos;
      START SLAVE;
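
      Before starting the load, it may be worth verifying that both replication threads came up (again just an illustrative check):

      SHOW SLAVE STATUS\G
      -- expect Slave_IO_Running: Yes and Slave_SQL_Running: Yes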
      

      On the master: create a simple PHP script "test.php" that populates the given table, using a domain ID derived from the first argument (modulo 3), in transactions of roughly 1,000 rows each. Each transaction takes about 40K of binlog space, i.e. about half of max_relay_log_size, which makes hitting a transaction split across relay log files very likely:

      <?php
      // Emits an endless stream of SQL on stdout, meant to be piped into mysql.
      // Arg 1: instance number (mapped to gtid_domain_id 1..3), arg 2: table name.

      $domain_id = $argv[1]%3 + 1;
      $table     = $argv[2];

      echo "SET gtid_domain_id=$domain_id;";

      echo "DROP TABLE IF EXISTS test.$table;";
      echo "CREATE TABLE test.$table(id int primary key, msg varchar(100));";

      $n=1;

      // one transaction of 999 INSERTs per loop iteration, forever
      while (true) {
        echo "BEGIN;\n";
        for ($i = 1; $i < 1000; $i++) {
          echo "INSERT INTO test.$table VALUES($n, MD5(RAND()));\n";
          $n++;
        }
        echo "COMMIT;\n";
      }
      

      Then run several instances of it in parallel:

      for a in $(seq 20); do (php test.php $a t$a | mysql &) ; done
      

      Now on the slave run this script to repeatedly stop and restart the SQL thread:

      ( sudo mysql <<< "select version(); start slave sql_thread"
        while date; do
          # stop the SQL thread and note the GTID position it stopped at
          sudo mysql <<< 'stop slave sql_thread; show slave status\G'
          gsp_stopped="$(sudo mysql <<< 'SELECT @@GLOBAL.gtid_slave_pos;')"; echo "$gsp_stopped"
          # restart the SQL thread and give it a few seconds to run (or break)
          sudo mysql <<< 'start slave sql_thread;'; sleep 5
          sss="$(sudo mysql <<< 'show slave status\G')"
          gsp="$(sudo mysql <<< 'SELECT @@GLOBAL.gtid_slave_pos')"; echo "$sss"; echo "$gsp"
          if test "$(awk '$1 == "Slave_SQL_Running:"{print $2}' <<< "$sss")" == "Yes"
          then echo; echo "sleeping..."; sleep 5
          else echo "things are broken..."; break
          fi
        done ) | tee test.log
      

      This usually fails with a duplicate key error after only a few iterations.

      Attachments

        1. binlogs.tar.gz
          599 kB
        2. test.log
          14 kB


          Activity

            We assumed that this might already have been fixed with

            http://lists.askmonty.org/pipermail/commits/2016-January/008828.html

            but it turns out that it isn't ...

            hholzgra Hartmut Holzgraefe added a comment

            Full gtid_% variables on slave after running into duplicate key error with attached logs:

            MariaDB [test]> show variables like "gt%";
            +------------------------+-------------------------------+
            | Variable_name          | Value                         |
            +------------------------+-------------------------------+
            | gtid_binlog_pos        | 0-2-2116                      |
            | gtid_binlog_state      | 0-2-2116                      |
            | gtid_current_pos       | 1-1-10,0-2-2116,3-1-10,2-1-22 |
            | gtid_domain_id         | 0                             |
            | gtid_ignore_duplicates | OFF                           |
            | gtid_seq_no            | 0                             |
            | gtid_slave_pos         | 1-1-10,0-1-2,3-1-10,2-1-22    |
            | gtid_strict_mode       | OFF                           |
            +------------------------+-------------------------------+
            8 rows in set (0.00 sec)
            

            Already replicated tables and their sizes:

            MariaDB [test]> show tables;
            +----------------+
            | Tables_in_test |
            +----------------+
            | t1             |
            | t13            |
            | t16            |
            | t2             |
            | t3             |
            | t4             |
            | t7             |
            | t8             |
            +----------------+
            8 rows in set (0.00 sec)
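
            (The per-table counts below were presumably gathered with a query of this shape, shown here for t1:)

            SELECT count(*), max(id) FROM test.t1;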
             
            table t1
            +----------+---------+
            | count(*) | max(id) |
            +----------+---------+
            |     1998 |    1998 |
            +----------+---------+
             
            table t2
            +----------+---------+
            | count(*) | max(id) |
            +----------+---------+
            |      999 |     999 |
            +----------+---------+
             
            table t3
            +----------+---------+
            | count(*) | max(id) |
            +----------+---------+
            |     2997 |    2997 |
            +----------+---------+
             
            table t4
            +----------+---------+
            | count(*) | max(id) |
            +----------+---------+
            |     4995 |    4995 |
            +----------+---------+
             
            table t7
            +----------+---------+
            | count(*) | max(id) |
            +----------+---------+
            |     2997 |    2997 |
            +----------+---------+
             
            table t8
            +----------+---------+
            | count(*) | max(id) |
            +----------+---------+
            |        0 |    NULL |
            +----------+---------+
             
            table t13
            +----------+---------+
            | count(*) | max(id) |
            +----------+---------+
            |        0 |    NULL |
            +----------+---------+
             
            table t16
            +----------+---------+
            | count(*) | max(id) |
            +----------+---------+
            |        0 |    NULL |
            +----------+---------+
             
            
            

            hholzgra Hartmut Holzgraefe added a comment

            The correct position to continue from in hartmut-3510-relay-bin.000038 would have been 81695:

            # at 81695
            #160922  0:19:20 server id 1  end_log_pos 277188        GTID 2-1-23 cid=15432534
            /*!100001 SET @@session.gtid_domain_id=2*//*!*/;
            /*!100001 SET @@session.server_id=1*//*!*/;
            /*!100001 SET @@session.gtid_seq_no=23*//*!*/;
            BEGIN
            

            which matches the information for domain 2 in gtid_current_pos / gtid_slave_pos which shows GTID 2-1-22 as last applied for this domain.

            I don't fully understand the code behind this yet, but my educated guess is that the last applied transaction, GTID 2-1-22, is searched for first, and only then 2-1-23. As the beginning of transaction 2-1-22 was in the previous, already purged, relay log hartmut-3510-relay-bin.000037, it can't be found in the current relay log, so the position is never set correctly; it just remains at the beginning of .000038, and replication wrongfully continues from there. (Or at least mysqlbinlog doesn't show any trace of GTID 2-1-22; I'm not sure whether its output is complete in that respect though.)
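
            One way to cross-check this from the server itself, rather than with mysqlbinlog, is to list the events a relay log file actually contains, e.g.:

            SHOW RELAYLOG EVENTS IN 'hartmut-3510-relay-bin.000038' FROM 4;
            -- GTID 2-1-23 appears at position 81695, with no trace of a
            -- GTID event for 2-1-22 anywhere before it in this file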

            hholzgra Hartmut Holzgraefe added a comment

            This is a summary I wrote elsewhere last week. Unfortunately recent tests have shown that the proposed cause in the first paragraph can't be the complete story. More about this in the next comment ...

            — Original Text —

            When a transaction is spread across a relay log boundary,
            the parallel replication relay log repositioning logic may
            end up positioning into the middle of the transaction, and
            the GTID "we've already processed this transaction" logic
            doesn't kick in, as there's no GTID information at the
            beginning of the relay log file, only at the beginning of
            the next transaction.

            Now, as far as I can tell, we have three ways to solve this:

            1) Change the relay log copying logic so that transactions
            are never split up, similar to how this is handled on the
            master side with binlogs. With this, max relay log size is
            no longer a strict limit and relay logs may become as big
            as needed for large transactions. This doesn't cause any
            new file system space issues though, as right now all relay
            log parts that cover a large transaction can't be purged
            until transaction commit anyway.

            It will require either a) some extra logic in the IO thread
            to make it transaction aware, especially to deal with FLUSH
            RELAY LOGS, or b) copying binlogs to relay logs on a 1:1
            file-by-file basis, totally ignoring FLUSH requests.

            2) Change the repositioning logic. Right now it simply
            repositions to the beginning of the first not completely
            processed relay log, assuming that already processed
            changes will be filtered out and not re-run by the GTID
            "already seen" logic. This fails for events that carry no
            GTID, as the GTID was already given in the previous relay
            log file.

            The repositioning logic could be extended to not simply
            rewind to the beginning of the relay log, but to then roll
            forward over events until seeing a new transaction start
            with an explicit GTID, and take the position of that event
            as the new start position.

            3) Change the GTID filtering logic to completely ignore
            binlog/relay log events for which the GTID is not known,
            either only in parallel mode, or maybe even in general.

            hholzgra Hartmut Holzgraefe added a comment

            Further testing has shown that the "how to reproduce" instructions from above only lead to duplicate key errors on sufficiently capable hardware. E.g. I can reproduce this very quickly when replicating from my desktop (6 cores) to my laptop (4 cores), or when replicating between my two public web servers (8 cores each, or actually 4 real ones plus hyperthreading).

            When trying the same steps with two Virtualbox VMs as master and slave on my 6-core machine, giving each VM 3 virtual cores, the test can run for hours without failing at all.

            So this doesn't only seem to be about relay log splitting at the "wrong" place, but there also seems to be some race condition that needs to be hit ...

            hholzgra Hartmut Holzgraefe added a comment

            Ouch, as Hartmut says this is the following code in handle_slave_sql():

              /*
                Load the starting GTID position, so that we can skip already applied
                GTIDs when we restart the SQL thread. And set the start position in
                the relay log back to a known safe place to start (prior to any not
                yet applied transaction in any domain).
              */
              rli->restart_gtid_pos.load(rpl_global_gtid_slave_state, NULL, 0);
              if ((ir= rli->inuse_relaylog_list))
              {
                rpl_gtid *gtid= ir->relay_log_state;
                uint32 count= ir->relay_log_state_count;
                while (count > 0)
                {
                  process_gtid_for_restart_pos(rli, gtid);
                  ++gtid;
                  --count;
                }
                strmake_buf(rli->group_relay_log_name, ir->name);
                rli->group_relay_log_pos= BIN_LOG_HEADER_SIZE;
                rli->relay_log_state.load(ir->relay_log_state, ir->relay_log_state_count);
              }
            

            So it always restarts at the beginning of a relay log, which can easily be
            in the middle of an event group. That must be just plainly wrong.

            Probably proposed solution (3) should be the right fix. It would be nice to
            set a better restart position that does not point inside an event group, but
            I am not sure we have that information available. There might not even be
            any GTID event at all in a given relay log file.

            knielsen Kristian Nielsen added a comment

            Kristian: do you have an idea though why this only seems to be reproducible on sufficiently powerful hardware (e.g. only when replicating between real machines, not between Virtualbox VMs)?

            If it were only about the relay log being split in the middle of an event group, I'd expect it to happen regardless of "machine power" and to depend only on event group sizes vs. max_relay_log_size?

            hholzgra Hartmut Holzgraefe added a comment

            It's hard to say why you could not reproduce it; your test setup reproduced
            easily for me just on my laptop.

            Here is a mysql-test-run testcase that seems to reproduce it reliably.

            • Kristian.

            --source include/have_innodb.inc
            --let $rpl_topology=1->2
            --source include/rpl_init.inc
             
            # Test various aspects of parallel replication.
             
            --connection server_2
            SET @old_parallel_threads=@@GLOBAL.slave_parallel_threads;
            --source include/stop_slave.inc
            SET GLOBAL slave_parallel_threads=10;
            SET @old_max_relay= @@GLOBAL.max_relay_log_size;
            SET GLOBAL max_relay_log_size = 4096;
            CHANGE MASTER TO master_use_gtid=slave_pos;
            --source include/start_slave.inc
             
            --connection server_1
            ALTER TABLE mysql.gtid_slave_pos ENGINE=InnoDB;
            CREATE TABLE t1 (a int PRIMARY KEY, b VARCHAR(100)) ENGINE=InnoDB;
            INSERT INTO t1 VALUES (1, "a");
            --save_master_pos
             
            --connection server_2
            --sync_with_master
             
            --echo *** Create a long transaction that will span a relay log file. ***
            --connection server_1
             
            # Add some transactions in separate domains, that will cause the need to
            # have a multi-valued restart position in the relay log for the SQL thread.
            SET @old_domain= @@gtid_domain_id;
            SET gtid_domain_id=10;
            INSERT INTO t1 VALUES (10000, "domain 10");
            SET gtid_domain_id=20;
            INSERT INTO t1 VALUES (20000, "domain 20");
            SET gtid_domain_id=@old_domain;
             
            BEGIN;
            --echo [lots of inserts omitted]
            --disable_query_log
            --let $count = 500
            while ($count) {
              eval INSERT INTO t1 VALUES (1000+$count, REPEAT("hulubulu??!?", 8));
              dec $count;
            }
            --enable_query_log
            COMMIT;
             
            --save_master_pos
             
            --connection server_2
            --sync_with_master
             
            --connection server_1
            # Now do another one, to make the inuse_relaylog proceed to somewhere inside
            # the first large transaction.
             
            BEGIN;
            --echo [lots of inserts omitted]
            --disable_query_log
            --let $count = 500
            while ($count) {
              eval INSERT INTO t1 VALUES (2000+$count, REPEAT("hulubulu??!?", 8));
              dec $count;
            }
            --enable_query_log
            COMMIT;
             
            --save_master_pos
             
            --connection server_2
            --sync_with_master
             
             
            # Stop and restart the SQL thread only.
            # The bug was that the SQL thread would restart at the start
            # of a relay log file, which could be in the middle of an event group.
            # This way, part of that event group could be wrongly re-applied.
             
            --source include/stop_slave_sql.inc
            START SLAVE SQL_THREAD;
            --source include/wait_for_slave_to_start.inc
             
             
            --connection server_1
            INSERT INTO t1 VALUES (100001, "More stuffs.");
            INSERT INTO t1 VALUES (100002, "And even more");
            --save_master_pos
             
            --connection server_2
            --sync_with_master
            SELECT * FROM t1 WHERE a >= 100000 ORDER BY a;
             
             
            # Clean up.
            --connection server_2
            --source include/stop_slave.inc
            SET GLOBAL slave_parallel_threads=@old_parallel_threads;
            SET GLOBAL max_relay_log_size= @old_max_relay;
            --source include/start_slave.inc
             
            --connection server_1
            DROP TABLE t1;
             
            --source include/rpl_end.inc
            

            knielsen Kristian Nielsen added a comment

            Fix:

            http://lists.askmonty.org/pipermail/commits/2016-November/010061.html

            Lixun, do you want to review the patch before I push it into 10.0?

            • Kristian.
            knielsen Kristian Nielsen added a comment

            Looks good; I ran my test setup for almost two days without any replication breaks.

            hholzgra Hartmut Holzgraefe added a comment
            plinux Lixun Peng added a comment -

            knielsen,
            I checked your patch; the main idea is to ignore the first part without a beginning GTID event in the first relay log, right?

            I'm not sure this logic will always be right under parallel replication.

            Do you think keeping the previous relay log file would be a better way?
            I mean, if Event[0] is split across relay log File[0] and File[1], and the GTID has moved on to Event[1], should we keep File[0] until Event[1] has been applied? Then parallel replication can always find the start position in an existing relay log file.


            > I checked your patch; the main idea is to ignore the first part without a
            > beginning GTID event in the first relay log, right?

            The way the code works is, it scans over the earlier GTIDs in the relay log
            file. If the GTID was already applied, it sets the GTID_SKIP_TRANSACTION (or
            GTID_SKIP_STANDALONE) flag, so that the associated events are skipped.

            But the code starts scanning at the beginning of a relay log, which can be
            in the middle of an event group that was already applied. So it does not see
            the GTID for the first event group, thus the gtid_skip_flag needs to be
            explicitly initialised to GTID_SKIP_TRANSACTION.

            We can be sure that any GTID from an earlier relay log was already applied,
            so that skipping it is correct. This is ensured by the inuse_relaylog
            mechanism. See struct inuse_relaylog in sql/rpl_rli.h.

            An alternative might be to keep track of the position of the first GTID (or
            other event outside of an event group), and then start scanning from that
            point. Then we need to handle also the case where a relay log file contains
            no GTID at all because both the start and the end of a big transaction is
            outside that file. I am not sure that is any simpler or better.

            > I'm not sure this logic will always be right under parallel replication.

            Yes. This code which handles restarting the SQL thread is really
            complicated, and I was always very unhappy about it. There have been a
            number of bugs related to it. I would certainly welcome any suggestions for
            a simpler solution.

            I think the better overall approach would be that the SQL thread was able to
            correctly start in the relay log file from a GTID position, using a
            mechanism similar to how this works when mysql_binlog_send() runs on the
            master. Then, if relay log recovery was implemented on the slave similar to
            binlog recovery on the master, we would also be able to re-use the relay log
            after a slave server restart, without having to re-fetch it from the
            master. But that is a much bigger change, probably too much for 10.0.

            (And it would still be necessary to handle partial event groups at the start
            of the relay log.)

            > Do you think keeping the previous relay log file would be a better way? I
            > mean, if Event[0] is split across relay log File[0] and File[1], and the
            > GTID has moved on to Event[1], should we keep File[0] until Event[1] has
            > been applied? Then parallel replication can always find the start position
            > in an existing relay log file.

            I don't see how this will help. There might still be another earlier
            Event[-1] in File[0] that was split and does not have its beginning GTID in
            File[0]. No matter what, I think we need to be able to handle the
            possibility that there is a partial event group at the start of a relay log
            file without the starting GTID.

            Thanks,

            • Kristian.
            knielsen Kristian Nielsen added a comment
            plinux Lixun Peng added a comment -

            Hi Kristian,

            I think you can merge your code into the master branch.

            Thanks,
            Lixun


            > I think you can merge your code into the master branch.

            Ok, will do.

            • Kristian.
            knielsen Kristian Nielsen added a comment

            Pushed to 10.0.

            Thanks to Hartmut for an excellent bug report: detailed analysis, and a
            stand-alone testcase reproducing a complex failure scenario.

            knielsen Kristian Nielsen added a comment

            Actually most of it was originally done by Jean-François; the only part really done by me was the PHP generator script that made hitting the situation described by J-F much more likely ...

            hholzgra Hartmut Holzgraefe added a comment

            People

              Assignee: knielsen Kristian Nielsen
              Reporter: hholzgra Hartmut Holzgraefe
              Votes: 2
              Watchers: 7
