Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-33578

Replica severely lags on XA ROLLBACK when deleting rows from a non-transactional table while another table receives an INSERT inside the same XA transaction

Details

    Description

      # Use RBR, standard m/s setup
      CREATE TABLE t1 (c INT) ENGINE=MyISAM;  # Or MEMORY,Aria
      CREATE TABLE t2 (c INT);  # Engine choice does not matter
      INSERT t2 SELECT SEQ FROM seq_1_to_200000;
      XA START 'x';
      DELETE FROM t2;
      INSERT INTO t1 VALUES(1);
      XA END 'x';
      XA ROLLBACK 'x';
      SHOW WARNINGS;  # Warning: 1196: Some non-transactional changed tables couldn't be rolled back
      

      Leads to:

      11.4.0 9b1ea6904965dd345478dedd80e181ad54c767da (Optimized)

                    Slave_IO_Running: Yes
                   Slave_SQL_Running: Yes
      ...
               Seconds_Behind_Master: 1213
      ...
             Slave_SQL_Running_State: Waiting for room in worker thread event queue
      

      And

      11.4.0 9b1ea6904965dd345478dedd80e181ad54c767da (Optimized)

      2024-03-02  8:30:07 8 [Note] The slave is applying a ROW event on behalf of a DELETE statement on table t2 and is currently taking a considerable amount of time (61 seconds). This is due to the fact that it is scanning the table while looking up records to be processed. Consider adding a primary key (or unique key) to the table to improve performance.
      

      The DELETE application continues for 20+ minutes: (after which I stopped it)

      11.4.0 9b1ea6904965dd345478dedd80e181ad54c767da (Optimized)

      11.4.0-opt>SHOW PROCESSLIST;
      +----+-------------+-----------+------+--------------+------+--------------------------------------------------------+------------------+----------+
      | Id | User        | Host      | db   | Command      | Time | State                                                  | Info             | Progress |
      +----+-------------+-----------+------+--------------+------+--------------------------------------------------------+------------------+----------+
      |  6 | system user |           | NULL | Slave_IO     | 1219 | Waiting for master to send event                       | NULL             |    0.000 |
      |  9 | system user |           | NULL | Slave_worker | 1215 | Waiting for work from SQL thread                       | NULL             |    0.000 |
      |  8 | system user |           | NULL | Slave_worker | 1219 | Waiting for work from SQL thread                       | NULL             |    0.000 |
      | 10 | system user |           | NULL | Slave_worker | 1215 | Waiting for work from SQL thread                       | NULL             |    0.000 |
      | 11 | system user |           | test | Slave_worker |   21 | Delete_rows_log_event::find_row(-1) on table `t2`      | NULL             |    0.000 |
      |  7 | system user |           | NULL | Slave_SQL    | 1219 | Slave has read all relay log; waiting for more updates | NULL             |    0.000 |
      | 12 | root        | localhost | test | Query        |    0 | starting                                               | SHOW PROCESSLIST |    0.000 |
      +----+-------------+-----------+------+--------------+------+--------------------------------------------------------+------------------+----------+
      7 rows in set (0.000 sec)
      

      Findings and questions:

      • The Delete_rows_log_event::find_row(-1) time cycles over and over from 0 to 8 seconds at the start. Later on the rounds become longer and we see 0 to 24 seconds for example. (what is happening?)
      • While adding a PRIMARY KEY indeed helps with the DELETE for t2 (no wait), the issue does not happen when the INSERT INTO t1 VALUES(1); is not present even without adding a key - which in a way disqualifies the error message (i.e. the issue can be resolved without adding a key, or again - another issue is at play here, and it is related to the INSERT).
      • Why does the XA rollback need to fail on the master (leaving 200k records in t2, even when it is InnoDB)?
      • There is a performance issue in that the slave "work" (and a 20 minute+ slave lockup for a single relatively small transaction) is unnecessary and can be avoided, especially given that the DELETE could not be actioned on the primary and/or can be avoided when the INSERT is not present (and the tables are not related).

      Attachments

        Activity

          Roel Roel Van de Paar created issue -
          Roel Roel Van de Paar made changes -
          Field Original Value New Value
          Description {code:sql}
          # Use RBR, standard m/s setup
          CREATE TABLE t1 (c INT) ENGINE=MyISAM; # Or MEMORY,Aria
          CREATE TABLE t2 (c INT); # Engine choice does not matter
          INSERT t2 SELECT SEQ FROM seq_1_to_200000;
          XA START 'x';
          DELETE FROM t2;
          INSERT INTO t1 VALUES(1);
          XA END 'x';
          XA ROLLBACK 'x';
          SHOW WARNINGS; # Warning: 1196: Some non-transactional changed tables couldn't be rolled back
          {code}
          Leads to:
          {noformat:title=11.4.0 9b1ea6904965dd345478dedd80e181ad54c767da (Optimized)}
                        Slave_IO_Running: Yes
                       Slave_SQL_Running: Yes
          ...
                   Seconds_Behind_Master: 1213
          ...
                 Slave_SQL_Running_State: Waiting for room in worker thread event queue
          {noformat}
          And
          {noformat:title=11.4.0 9b1ea6904965dd345478dedd80e181ad54c767da (Optimized)}
          2024-03-02 8:30:07 8 [Note] The slave is applying a ROW event on behalf of a DELETE statement on table t2 and is currently taking a considerable amount of time (61 seconds). This is due to the fact that it is scanning the table while looking up records to be processed. Consider adding a primary key (or unique key) to the table to improve performance.
          {noformat}
          The {{DELETE}} application continues for 20+ minutes: (after which I stopped it)
          {noformat:title=11.4.0 9b1ea6904965dd345478dedd80e181ad54c767da (Optimized)}
          11.4.0-opt>SHOW PROCESSLIST;
          +----+-------------+-----------+------+--------------+------+--------------------------------------------------------+------------------+----------+
          | Id | User | Host | db | Command | Time | State | Info | Progress |
          +----+-------------+-----------+------+--------------+------+--------------------------------------------------------+------------------+----------+
          | 6 | system user | | NULL | Slave_IO | 1219 | Waiting for master to send event | NULL | 0.000 |
          | 9 | system user | | NULL | Slave_worker | 1215 | Waiting for work from SQL thread | NULL | 0.000 |
          | 8 | system user | | NULL | Slave_worker | 1219 | Waiting for work from SQL thread | NULL | 0.000 |
          | 10 | system user | | NULL | Slave_worker | 1215 | Waiting for work from SQL thread | NULL | 0.000 |
          | 11 | system user | | test | Slave_worker | 21 | Delete_rows_log_event::find_row(-1) on table `t2` | NULL | 0.000 |
          | 7 | system user | | NULL | Slave_SQL | 1219 | Slave has read all relay log; waiting for more updates | NULL | 0.000 |
          | 12 | root | localhost | test | Query | 0 | starting | SHOW PROCESSLIST | 0.000 |
          +----+-------------+-----------+------+--------------+------+--------------------------------------------------------+------------------+----------+
          7 rows in set (0.000 sec)
          {noformat}
          Findings:
          * The {{Delete_rows_log_event::find_row(-1)}} time cycles over and over from 0 to 8 seconds at the start. Later on the rounds become longer and we see 0 to 24 seconds for example. (what is happening?)
          * While adding a PRIMARY KEY indeed helps with the {{DELETE}} for t2 (no wait), the issue does not happen when the {{INSERT INTO t1 VALUES(1);}} is not present *even without adding a key* - which in a way disqualifies the error message (i.e. the issue can be resolved without adding a key, or again - another issue is at play here, and it is related to the INSERT).
          * On a sidenote, why does the XA rollback need to fail on the master (leaving 200k records in t2)?
          * There is a performance issue in that the slave "work" (and a 20 minute+ slave lockup for a single relatively small transaction) is unnecessary and can be avoided, especially given that the {{DELETE}} could not be actioned on the primary and/or can be avoided when the INSERT is not present (and the tables are not related).
          {code:sql}
          # Use RBR, standard m/s setup
          CREATE TABLE t1 (c INT) ENGINE=MyISAM; # Or MEMORY,Aria
          CREATE TABLE t2 (c INT); # Engine choice does not matter
          INSERT t2 SELECT SEQ FROM seq_1_to_200000;
          XA START 'x';
          DELETE FROM t2;
          INSERT INTO t1 VALUES(1);
          XA END 'x';
          XA ROLLBACK 'x';
          SHOW WARNINGS; # Warning: 1196: Some non-transactional changed tables couldn't be rolled back
          {code}
          Leads to:
          {noformat:title=11.4.0 9b1ea6904965dd345478dedd80e181ad54c767da (Optimized)}
                        Slave_IO_Running: Yes
                       Slave_SQL_Running: Yes
          ...
                   Seconds_Behind_Master: 1213
          ...
                 Slave_SQL_Running_State: Waiting for room in worker thread event queue
          {noformat}
          And
          {noformat:title=11.4.0 9b1ea6904965dd345478dedd80e181ad54c767da (Optimized)}
          2024-03-02 8:30:07 8 [Note] The slave is applying a ROW event on behalf of a DELETE statement on table t2 and is currently taking a considerable amount of time (61 seconds). This is due to the fact that it is scanning the table while looking up records to be processed. Consider adding a primary key (or unique key) to the table to improve performance.
          {noformat}
          The {{DELETE}} application continues for 20+ minutes: (after which I stopped it)
          {noformat:title=11.4.0 9b1ea6904965dd345478dedd80e181ad54c767da (Optimized)}
          11.4.0-opt>SHOW PROCESSLIST;
          +----+-------------+-----------+------+--------------+------+--------------------------------------------------------+------------------+----------+
          | Id | User | Host | db | Command | Time | State | Info | Progress |
          +----+-------------+-----------+------+--------------+------+--------------------------------------------------------+------------------+----------+
          | 6 | system user | | NULL | Slave_IO | 1219 | Waiting for master to send event | NULL | 0.000 |
          | 9 | system user | | NULL | Slave_worker | 1215 | Waiting for work from SQL thread | NULL | 0.000 |
          | 8 | system user | | NULL | Slave_worker | 1219 | Waiting for work from SQL thread | NULL | 0.000 |
          | 10 | system user | | NULL | Slave_worker | 1215 | Waiting for work from SQL thread | NULL | 0.000 |
          | 11 | system user | | test | Slave_worker | 21 | Delete_rows_log_event::find_row(-1) on table `t2` | NULL | 0.000 |
          | 7 | system user | | NULL | Slave_SQL | 1219 | Slave has read all relay log; waiting for more updates | NULL | 0.000 |
          | 12 | root | localhost | test | Query | 0 | starting | SHOW PROCESSLIST | 0.000 |
          +----+-------------+-----------+------+--------------+------+--------------------------------------------------------+------------------+----------+
          7 rows in set (0.000 sec)
          {noformat}
          Findings:
          * The {{Delete_rows_log_event::find_row(-1)}} time cycles over and over from 0 to 8 seconds at the start. Later on the rounds become longer and we see 0 to 24 seconds for example. (what is happening?)
          * While adding a PRIMARY KEY indeed helps with the {{DELETE}} for t2 (no wait), the issue does not happen when the {{INSERT INTO t1 VALUES(1);}} is not present *even without adding a key* - which in a way disqualifies the error message (i.e. the issue can be resolved without adding a key, or again - another issue is at play here, and it is related to the INSERT).
          * Why does the XA rollback need to fail on the master (leaving 200k records in t2)? The warning is also not correct as the same warning happens when t2 is InnoDB.
          * There is a performance issue in that the slave "work" (and a 20 minute+ slave lockup for a single relatively small transaction) is unnecessary and can be avoided, especially given that the {{DELETE}} could not be actioned on the primary and/or can be avoided when the INSERT is not present (and the tables are not related).
          Roel Roel Van de Paar added a comment - - edited

          MTR Testcase

          -- source include/master-slave.inc
          -- source include/have_innodb.inc
          -- source include/have_sequence.inc
          CREATE TABLE t1 (c INT) ENGINE=MyISAM;  # Or MEMORY,Aria
          CREATE TABLE t2 (c INT) ENGINE=InnoDB;  # Engine choice does not matter
          INSERT t2 SELECT SEQ FROM seq_1_to_200000;
          XA START 'x';
          DELETE FROM t2;
          INSERT INTO t1 VALUES(1);
          XA END 'x';
          XA ROLLBACK 'x';  # Warning: 1196: Some non-transactional changed tables couldn't be rolled back
          --sync_slave_with_master
          --source include/rpl_end.inc
          

          It will show [pass] but then semi-hang, and terminate on the --sync_slave_with_master after 300 seconds:

          11.4.0 9b1ea6904965dd345478dedd80e181ad54c767da (Optimized)

          CURRENT_TEST: main.test
          analyze: sync_with_master
          mysqltest: At line 13: sync_slave_with_master failed: 'select master_pos_wait('master-bin.000001', 2009383, 300, '')' returned -1 indicating timeout after 300 seconds
          

          CLI to MTR replication slave before MTR terminates:

          11.4.0 9b1ea6904965dd345478dedd80e181ad54c767da (Optimized)

          $ /test/MD060224-mariadb-11.4.0-linux-x86_64-opt/bin/mariadb -uroot -P16001
          WARNING: option --ssl-verify-server-cert is disabled, because of an insecure passwordless login.
          Welcome to the MariaDB monitor.  Commands end with ; or \g.
          Your MariaDB connection id is 11
          Server version: 11.4.0-MariaDB-log MariaDB Server
           
          Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.
           
          Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
           
          MariaDB [(none)]> SHOW PROCESSLIST;
          +----+-------------+-----------------+------+-----------+------+------------------------------------------------------+---------------------------------------------------------------+----------+
          | Id | User        | Host            | db   | Command   | Time | State                                                | Info                                                          | Progress |
          +----+-------------+-----------------+------+-----------+------+------------------------------------------------------+---------------------------------------------------------------+----------+
          |  5 | root        | localhost:46154 | test | Sleep     |  260 |                                                      | NULL                                                          |    0.000 |
          |  6 | system user |                 | NULL | Slave_IO  |  260 | Waiting for master to send event                     | NULL                                                          |    0.000 |
          |  7 | system user |                 | test | Slave_SQL |   17 | Delete_rows_log_event::find_row(-1) on table `t2`    | DELETE FROM t2                                                |    0.000 |
          |  8 | root        | localhost:46166 | test | Query     |  256 | Waiting for the slave SQL thread to advance position | select master_pos_wait('master-bin.000001', 2009383, 300, '') |    0.000 |
          |  9 | root        | localhost:46174 | test | Sleep     |  260 |                                                      | NULL                                                          |    0.000 |
          | 11 | root        | localhost:21434 | NULL | Query     |    0 | starting                                             | SHOW PROCESSLIST                                              |    0.000 |
          +----+-------------+-----------------+------+-----------+------+------------------------------------------------------+---------------------------------------------------------------+----------+
          6 rows in set (0.000 sec)
          

          Roel Roel Van de Paar added a comment - - edited MTR Testcase -- source include/master-slave.inc -- source include/have_innodb.inc -- source include/have_sequence.inc CREATE TABLE t1 (c INT ) ENGINE=MyISAM; # Or MEMORY,Aria CREATE TABLE t2 (c INT ) ENGINE=InnoDB; # Engine choice does not matter INSERT t2 SELECT SEQ FROM seq_1_to_200000; XA START 'x' ; DELETE FROM t2; INSERT INTO t1 VALUES (1); XA END 'x' ; XA ROLLBACK 'x' ; # Warning: 1196: Some non-transactional changed tables couldn't be rolled back --sync_slave_with_master --source include/rpl_end.inc It will show [pass] but then semi-hang, and terminate on the --sync_slave_with_master after 300 seconds: 11.4.0 9b1ea6904965dd345478dedd80e181ad54c767da (Optimized) CURRENT_TEST: main.test analyze: sync_with_master mysqltest: At line 13: sync_slave_with_master failed: 'select master_pos_wait('master-bin.000001', 2009383, 300, '')' returned -1 indicating timeout after 300 seconds CLI to MTR replication slave before MTR terminates: 11.4.0 9b1ea6904965dd345478dedd80e181ad54c767da (Optimized) $ /test/MD060224-mariadb-11.4.0-linux-x86_64-opt/bin/mariadb -uroot -P16001 WARNING: option --ssl-verify-server-cert is disabled, because of an insecure passwordless login. Welcome to the MariaDB monitor. Commands end with ; or \g. Your MariaDB connection id is 11 Server version: 11.4.0-MariaDB-log MariaDB Server   Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.   Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.   MariaDB [(none)]> SHOW PROCESSLIST; +----+-------------+-----------------+------+-----------+------+------------------------------------------------------+---------------------------------------------------------------+----------+ | Id | User | Host | db | Command | Time | State | Info | Progress | +----+-------------+-----------------+------+-----------+------+------------------------------------------------------+---------------------------------------------------------------+----------+ | 5 | root | localhost:46154 | test | Sleep | 260 | | NULL | 0.000 | | 6 | system user | | NULL | Slave_IO | 260 | Waiting for master to send event | NULL | 0.000 | | 7 | system user | | test | Slave_SQL | 17 | Delete_rows_log_event::find_row(-1) on table `t2` | DELETE FROM t2 | 0.000 | | 8 | root | localhost:46166 | test | Query | 256 | Waiting for the slave SQL thread to advance position | select master_pos_wait('master-bin.000001', 2009383, 300, '') | 0.000 | | 9 | root | localhost:46174 | test | Sleep | 260 | | NULL | 0.000 | | 11 | root | localhost:21434 | NULL | Query | 0 | starting | SHOW PROCESSLIST | 0.000 | +----+-------------+-----------------+------+-----------+------+------------------------------------------------------+---------------------------------------------------------------+----------+ 6 rows in set (0.000 sec)
          Roel Roel Van de Paar made changes -
          Description {code:sql}
          # Use RBR, standard m/s setup
          CREATE TABLE t1 (c INT) ENGINE=MyISAM; # Or MEMORY,Aria
          CREATE TABLE t2 (c INT); # Engine choice does not matter
          INSERT t2 SELECT SEQ FROM seq_1_to_200000;
          XA START 'x';
          DELETE FROM t2;
          INSERT INTO t1 VALUES(1);
          XA END 'x';
          XA ROLLBACK 'x';
          SHOW WARNINGS; # Warning: 1196: Some non-transactional changed tables couldn't be rolled back
          {code}
          Leads to:
          {noformat:title=11.4.0 9b1ea6904965dd345478dedd80e181ad54c767da (Optimized)}
                        Slave_IO_Running: Yes
                       Slave_SQL_Running: Yes
          ...
                   Seconds_Behind_Master: 1213
          ...
                 Slave_SQL_Running_State: Waiting for room in worker thread event queue
          {noformat}
          And
          {noformat:title=11.4.0 9b1ea6904965dd345478dedd80e181ad54c767da (Optimized)}
          2024-03-02 8:30:07 8 [Note] The slave is applying a ROW event on behalf of a DELETE statement on table t2 and is currently taking a considerable amount of time (61 seconds). This is due to the fact that it is scanning the table while looking up records to be processed. Consider adding a primary key (or unique key) to the table to improve performance.
          {noformat}
          The {{DELETE}} application continues for 20+ minutes: (after which I stopped it)
          {noformat:title=11.4.0 9b1ea6904965dd345478dedd80e181ad54c767da (Optimized)}
          11.4.0-opt>SHOW PROCESSLIST;
          +----+-------------+-----------+------+--------------+------+--------------------------------------------------------+------------------+----------+
          | Id | User | Host | db | Command | Time | State | Info | Progress |
          +----+-------------+-----------+------+--------------+------+--------------------------------------------------------+------------------+----------+
          | 6 | system user | | NULL | Slave_IO | 1219 | Waiting for master to send event | NULL | 0.000 |
          | 9 | system user | | NULL | Slave_worker | 1215 | Waiting for work from SQL thread | NULL | 0.000 |
          | 8 | system user | | NULL | Slave_worker | 1219 | Waiting for work from SQL thread | NULL | 0.000 |
          | 10 | system user | | NULL | Slave_worker | 1215 | Waiting for work from SQL thread | NULL | 0.000 |
          | 11 | system user | | test | Slave_worker | 21 | Delete_rows_log_event::find_row(-1) on table `t2` | NULL | 0.000 |
          | 7 | system user | | NULL | Slave_SQL | 1219 | Slave has read all relay log; waiting for more updates | NULL | 0.000 |
          | 12 | root | localhost | test | Query | 0 | starting | SHOW PROCESSLIST | 0.000 |
          +----+-------------+-----------+------+--------------+------+--------------------------------------------------------+------------------+----------+
          7 rows in set (0.000 sec)
          {noformat}
          Findings:
          * The {{Delete_rows_log_event::find_row(-1)}} time cycles over and over from 0 to 8 seconds at the start. Later on the rounds become longer and we see 0 to 24 seconds for example. (what is happening?)
          * While adding a PRIMARY KEY indeed helps with the {{DELETE}} for t2 (no wait), the issue does not happen when the {{INSERT INTO t1 VALUES(1);}} is not present *even without adding a key* - which in a way disqualifies the error message (i.e. the issue can be resolved without adding a key, or again - another issue is at play here, and it is related to the INSERT).
          * Why does the XA rollback need to fail on the master (leaving 200k records in t2)? The warning is also not correct as the same warning happens when t2 is InnoDB.
          * There is a performance issue in that the slave "work" (and a 20 minute+ slave lockup for a single relatively small transaction) is unnecessary and can be avoided, especially given that the {{DELETE}} could not be actioned on the primary and/or can be avoided when the INSERT is not present (and the tables are not related).
          {code:sql}
          # Use RBR, standard m/s setup
          CREATE TABLE t1 (c INT) ENGINE=MyISAM; # Or MEMORY,Aria
          CREATE TABLE t2 (c INT); # Engine choice does not matter
          INSERT t2 SELECT SEQ FROM seq_1_to_200000;
          XA START 'x';
          DELETE FROM t2;
          INSERT INTO t1 VALUES(1);
          XA END 'x';
          XA ROLLBACK 'x';
          SHOW WARNINGS; # Warning: 1196: Some non-transactional changed tables couldn't be rolled back
          {code}
          Leads to:
          {noformat:title=11.4.0 9b1ea6904965dd345478dedd80e181ad54c767da (Optimized)}
                        Slave_IO_Running: Yes
                       Slave_SQL_Running: Yes
          ...
                   Seconds_Behind_Master: 1213
          ...
                 Slave_SQL_Running_State: Waiting for room in worker thread event queue
          {noformat}
          And
          {noformat:title=11.4.0 9b1ea6904965dd345478dedd80e181ad54c767da (Optimized)}
          2024-03-02 8:30:07 8 [Note] The slave is applying a ROW event on behalf of a DELETE statement on table t2 and is currently taking a considerable amount of time (61 seconds). This is due to the fact that it is scanning the table while looking up records to be processed. Consider adding a primary key (or unique key) to the table to improve performance.
          {noformat}
          The {{DELETE}} application continues for 20+ minutes: (after which I stopped it)
          {noformat:title=11.4.0 9b1ea6904965dd345478dedd80e181ad54c767da (Optimized)}
          11.4.0-opt>SHOW PROCESSLIST;
          +----+-------------+-----------+------+--------------+------+--------------------------------------------------------+------------------+----------+
          | Id | User | Host | db | Command | Time | State | Info | Progress |
          +----+-------------+-----------+------+--------------+------+--------------------------------------------------------+------------------+----------+
          | 6 | system user | | NULL | Slave_IO | 1219 | Waiting for master to send event | NULL | 0.000 |
          | 9 | system user | | NULL | Slave_worker | 1215 | Waiting for work from SQL thread | NULL | 0.000 |
          | 8 | system user | | NULL | Slave_worker | 1219 | Waiting for work from SQL thread | NULL | 0.000 |
          | 10 | system user | | NULL | Slave_worker | 1215 | Waiting for work from SQL thread | NULL | 0.000 |
          | 11 | system user | | test | Slave_worker | 21 | Delete_rows_log_event::find_row(-1) on table `t2` | NULL | 0.000 |
          | 7 | system user | | NULL | Slave_SQL | 1219 | Slave has read all relay log; waiting for more updates | NULL | 0.000 |
          | 12 | root | localhost | test | Query | 0 | starting | SHOW PROCESSLIST | 0.000 |
          +----+-------------+-----------+------+--------------+------+--------------------------------------------------------+------------------+----------+
          7 rows in set (0.000 sec)
          {noformat}
          Findings and questions:
          * The {{Delete_rows_log_event::find_row(-1)}} time cycles over and over from 0 to 8 seconds at the start. Later on the rounds become longer and we see 0 to 24 seconds for example. (what is happening?)
          * While adding a PRIMARY KEY indeed helps with the {{DELETE}} for t2 (no wait), the issue does not happen when the {{INSERT INTO t1 VALUES(1);}} is not present *even without adding a key* - which in a way disqualifies the error message (i.e. the issue can be resolved without adding a key, or again - another issue is at play here, and it is related to the INSERT).
          * Why does the XA rollback need to fail on the master (leaving 200k records in t2)? The warning is also not correct as the same warning happens when t2 is InnoDB.
          * There is a performance issue in that the slave "work" (and a 20 minute+ slave lockup for a single relatively small transaction) is unnecessary and can be avoided, especially given that the {{DELETE}} could not be actioned on the primary and/or can be avoided when the INSERT is not present (and the tables are not related).
          Roel Roel Van de Paar made changes -
          Labels performance replication RBR performance replication
          Roel Roel Van de Paar made changes -
          Fix Version/s 10.4 [ 22408 ]
          Fix Version/s 10.5 [ 23123 ]
          Fix Version/s 10.6 [ 24028 ]
          Fix Version/s 10.11 [ 27614 ]
          Fix Version/s 11.0 [ 28320 ]
          Fix Version/s 11.1 [ 28549 ]
          Fix Version/s 11.2 [ 28603 ]
          Fix Version/s 11.3 [ 28565 ]
          Affects Version/s 10.4 [ 22408 ]
          Affects Version/s 10.5 [ 23123 ]
          Affects Version/s 10.6 [ 24028 ]
          Affects Version/s 10.11 [ 27614 ]
          Affects Version/s 11.0 [ 28320 ]
          Affects Version/s 11.1 [ 28549 ]
          Affects Version/s 11.2 [ 28603 ]
          Affects Version/s 11.3 [ 28565 ]
          Affects Version/s 11.5 [ 29506 ]
          Roel Roel Van de Paar made changes -
          Description {code:sql}
          # Use RBR, standard m/s setup
          CREATE TABLE t1 (c INT) ENGINE=MyISAM; # Or MEMORY,Aria
          CREATE TABLE t2 (c INT); # Engine choice does not matter
          INSERT t2 SELECT SEQ FROM seq_1_to_200000;
          XA START 'x';
          DELETE FROM t2;
          INSERT INTO t1 VALUES(1);
          XA END 'x';
          XA ROLLBACK 'x';
          SHOW WARNINGS; # Warning: 1196: Some non-transactional changed tables couldn't be rolled back
          {code}
          Leads to:
          {noformat:title=11.4.0 9b1ea6904965dd345478dedd80e181ad54c767da (Optimized)}
                        Slave_IO_Running: Yes
                       Slave_SQL_Running: Yes
          ...
                   Seconds_Behind_Master: 1213
          ...
                 Slave_SQL_Running_State: Waiting for room in worker thread event queue
          {noformat}
          And
          {noformat:title=11.4.0 9b1ea6904965dd345478dedd80e181ad54c767da (Optimized)}
          2024-03-02 8:30:07 8 [Note] The slave is applying a ROW event on behalf of a DELETE statement on table t2 and is currently taking a considerable amount of time (61 seconds). This is due to the fact that it is scanning the table while looking up records to be processed. Consider adding a primary key (or unique key) to the table to improve performance.
          {noformat}
          The {{DELETE}} application continues for 20+ minutes: (after which I stopped it)
          {noformat:title=11.4.0 9b1ea6904965dd345478dedd80e181ad54c767da (Optimized)}
          11.4.0-opt>SHOW PROCESSLIST;
          +----+-------------+-----------+------+--------------+------+--------------------------------------------------------+------------------+----------+
          | Id | User | Host | db | Command | Time | State | Info | Progress |
          +----+-------------+-----------+------+--------------+------+--------------------------------------------------------+------------------+----------+
          | 6 | system user | | NULL | Slave_IO | 1219 | Waiting for master to send event | NULL | 0.000 |
          | 9 | system user | | NULL | Slave_worker | 1215 | Waiting for work from SQL thread | NULL | 0.000 |
          | 8 | system user | | NULL | Slave_worker | 1219 | Waiting for work from SQL thread | NULL | 0.000 |
          | 10 | system user | | NULL | Slave_worker | 1215 | Waiting for work from SQL thread | NULL | 0.000 |
          | 11 | system user | | test | Slave_worker | 21 | Delete_rows_log_event::find_row(-1) on table `t2` | NULL | 0.000 |
          | 7 | system user | | NULL | Slave_SQL | 1219 | Slave has read all relay log; waiting for more updates | NULL | 0.000 |
          | 12 | root | localhost | test | Query | 0 | starting | SHOW PROCESSLIST | 0.000 |
          +----+-------------+-----------+------+--------------+------+--------------------------------------------------------+------------------+----------+
          7 rows in set (0.000 sec)
          {noformat}
          Findings and questions:
          * The {{Delete_rows_log_event::find_row(-1)}} time cycles over and over from 0 to 8 seconds at the start. Later on the rounds become longer and we see 0 to 24 seconds for example. (what is happening?)
          * While adding a PRIMARY KEY indeed helps with the {{DELETE}} for t2 (no wait), the issue does not happen when the {{INSERT INTO t1 VALUES(1);}} is not present *even without adding a key* - which in a way disqualifies the error message (i.e. the issue can be resolved without adding a key, or again - another issue is at play here, and it is related to the INSERT).
          * Why does the XA rollback need to fail on the master (leaving 200k records in t2)? The warning is also not correct as the same warning happens when t2 is InnoDB.
          * There is a performance issue in that the slave "work" (and a 20 minute+ slave lockup for a single relatively small transaction) is unnecessary and can be avoided, especially given that the {{DELETE}} could not be actioned on the primary and/or can be avoided when the INSERT is not present (and the tables are not related).
          {code:sql}
          # Use RBR, standard m/s setup
          CREATE TABLE t1 (c INT) ENGINE=MyISAM; # Or MEMORY,Aria
          CREATE TABLE t2 (c INT); # Engine choice does not matter
          INSERT t2 SELECT SEQ FROM seq_1_to_200000;
          XA START 'x';
          DELETE FROM t2;
          INSERT INTO t1 VALUES(1);
          XA END 'x';
          XA ROLLBACK 'x';
          SHOW WARNINGS; # Warning: 1196: Some non-transactional changed tables couldn't be rolled back
          {code}
          Leads to:
          {noformat:title=11.4.0 9b1ea6904965dd345478dedd80e181ad54c767da (Optimized)}
                        Slave_IO_Running: Yes
                       Slave_SQL_Running: Yes
          ...
                   Seconds_Behind_Master: 1213
          ...
                 Slave_SQL_Running_State: Waiting for room in worker thread event queue
          {noformat}
          And
          {noformat:title=11.4.0 9b1ea6904965dd345478dedd80e181ad54c767da (Optimized)}
          2024-03-02 8:30:07 8 [Note] The slave is applying a ROW event on behalf of a DELETE statement on table t2 and is currently taking a considerable amount of time (61 seconds). This is due to the fact that it is scanning the table while looking up records to be processed. Consider adding a primary key (or unique key) to the table to improve performance.
          {noformat}
          The {{DELETE}} application continues for 20+ minutes: (after which I stopped it)
          {noformat:title=11.4.0 9b1ea6904965dd345478dedd80e181ad54c767da (Optimized)}
          11.4.0-opt>SHOW PROCESSLIST;
          +----+-------------+-----------+------+--------------+------+--------------------------------------------------------+------------------+----------+
          | Id | User | Host | db | Command | Time | State | Info | Progress |
          +----+-------------+-----------+------+--------------+------+--------------------------------------------------------+------------------+----------+
          | 6 | system user | | NULL | Slave_IO | 1219 | Waiting for master to send event | NULL | 0.000 |
          | 9 | system user | | NULL | Slave_worker | 1215 | Waiting for work from SQL thread | NULL | 0.000 |
          | 8 | system user | | NULL | Slave_worker | 1219 | Waiting for work from SQL thread | NULL | 0.000 |
          | 10 | system user | | NULL | Slave_worker | 1215 | Waiting for work from SQL thread | NULL | 0.000 |
          | 11 | system user | | test | Slave_worker | 21 | Delete_rows_log_event::find_row(-1) on table `t2` | NULL | 0.000 |
          | 7 | system user | | NULL | Slave_SQL | 1219 | Slave has read all relay log; waiting for more updates | NULL | 0.000 |
          | 12 | root | localhost | test | Query | 0 | starting | SHOW PROCESSLIST | 0.000 |
          +----+-------------+-----------+------+--------------+------+--------------------------------------------------------+------------------+----------+
          7 rows in set (0.000 sec)
          {noformat}
          Findings and questions:
          * The {{Delete_rows_log_event::find_row(-1)}} time cycles over and over from 0 to 8 seconds at the start. Later on the rounds become longer and we see 0 to 24 seconds for example. (what is happening?)
          * While adding a PRIMARY KEY indeed helps with the {{DELETE}} for t2 (no wait), the issue does not happen when the {{INSERT INTO t1 VALUES(1);}} is not present *even without adding a key* - which in a way disqualifies the error message (i.e. the issue can be resolved without adding a key, or again - another issue is at play here, and it is related to the INSERT).
          * Why does the XA rollback need to fail on the master (leaving 200k records in t2, even when it is InnoDB)?
          * There is a performance issue in that the slave "work" (and a 20 minute+ slave lockup for a single relatively small transaction) is unnecessary and can be avoided, especially given that the {{DELETE}} could not be actioned on the primary and/or can be avoided when the INSERT is not present (and the tables are not related).
          Roel Roel Van de Paar made changes -
          Summary Replica severely lags on XA rollback when deleting rows from a non-transactional table while another table receives an INSERT inside the same XA transaction Replica severely lags on XA ROLLBACK when deleting rows from a non-transactional table while another table receives an INSERT inside the same XA transaction
          Roel Roel Van de Paar made changes -
          julien.fritsch Julien Fritsch made changes -
          Fix Version/s 11.0 [ 28320 ]
          Fix Version/s 11.3 [ 28565 ]
          julien.fritsch Julien Fritsch made changes -
          Fix Version/s 10.4 [ 22408 ]
          julien.fritsch Julien Fritsch made changes -
          Fix Version/s 11.1 [ 28549 ]
          julien.fritsch Julien Fritsch made changes -
          Fix Version/s 11.2(EOL) [ 28603 ]

          People

            Elkin Andrei Elkin
            Roel Roel Van de Paar
            Votes:
            1 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

              Created:
              Updated:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.