Details
-
Bug
-
Status: Open (View Workflow)
-
Major
-
Resolution: Unresolved
-
10.4(EOL), 10.5, 10.6, 10.11, 11.0(EOL), 11.1(EOL), 11.2(EOL), 11.3(EOL), 11.4, 11.5(EOL)
Description
# Use RBR, standard m/s setup |
CREATE TABLE t1 (c INT) ENGINE=MyISAM; # Or MEMORY,Aria |
CREATE TABLE t2 (c INT); # Engine choice does not matter |
INSERT t2 SELECT SEQ FROM seq_1_to_200000; |
XA START 'x'; |
DELETE FROM t2; |
INSERT INTO t1 VALUES(1); |
XA END 'x'; |
XA ROLLBACK 'x'; |
SHOW WARNINGS; # Warning: 1196: Some non-transactional changed tables couldn't be rolled back |
Leads to:
11.4.0 9b1ea6904965dd345478dedd80e181ad54c767da (Optimized) |
Slave_IO_Running: Yes
|
Slave_SQL_Running: Yes
|
...
|
Seconds_Behind_Master: 1213
|
...
|
Slave_SQL_Running_State: Waiting for room in worker thread event queue
|
And
11.4.0 9b1ea6904965dd345478dedd80e181ad54c767da (Optimized) |
2024-03-02 8:30:07 8 [Note] The slave is applying a ROW event on behalf of a DELETE statement on table t2 and is currently taking a considerable amount of time (61 seconds). This is due to the fact that it is scanning the table while looking up records to be processed. Consider adding a primary key (or unique key) to the table to improve performance.
|
The DELETE application continues for 20+ minutes: (after which I stopped it)
11.4.0 9b1ea6904965dd345478dedd80e181ad54c767da (Optimized) |
11.4.0-opt>SHOW PROCESSLIST;
|
+----+-------------+-----------+------+--------------+------+--------------------------------------------------------+------------------+----------+
|
| Id | User | Host | db | Command | Time | State | Info | Progress |
|
+----+-------------+-----------+------+--------------+------+--------------------------------------------------------+------------------+----------+
|
| 6 | system user | | NULL | Slave_IO | 1219 | Waiting for master to send event | NULL | 0.000 |
|
| 9 | system user | | NULL | Slave_worker | 1215 | Waiting for work from SQL thread | NULL | 0.000 |
|
| 8 | system user | | NULL | Slave_worker | 1219 | Waiting for work from SQL thread | NULL | 0.000 |
|
| 10 | system user | | NULL | Slave_worker | 1215 | Waiting for work from SQL thread | NULL | 0.000 |
|
| 11 | system user | | test | Slave_worker | 21 | Delete_rows_log_event::find_row(-1) on table `t2` | NULL | 0.000 |
|
| 7 | system user | | NULL | Slave_SQL | 1219 | Slave has read all relay log; waiting for more updates | NULL | 0.000 |
|
| 12 | root | localhost | test | Query | 0 | starting | SHOW PROCESSLIST | 0.000 |
|
+----+-------------+-----------+------+--------------+------+--------------------------------------------------------+------------------+----------+
|
7 rows in set (0.000 sec)
|
Findings and questions:
- The Delete_rows_log_event::find_row(-1) time cycles over and over from 0 to 8 seconds at the start. Later on the rounds become longer and we see 0 to 24 seconds for example. (what is happening?)
- While adding a PRIMARY KEY indeed helps with the DELETE for t2 (no wait), the issue does not happen when the INSERT INTO t1 VALUES(1); is not present even without adding a key - which in a way disqualifies the error message (i.e. the issue can be resolved without adding a key, or again - another issue is at play here, and it is related to the INSERT).
- Why does the XA rollback need to fail on the master (leaving 200k records in t2, even when it is InnoDB)?
- There is a performance issue in that the slave "work" (and a 20 minute+ slave lockup for a single relatively small transaction) is unnecessary and can be avoided, especially given that the DELETE could not be actioned on the primary and/or can be avoided when the INSERT is not present (and the tables are not related).
Attachments
Activity
Field | Original Value | New Value |
---|---|---|
Description |
{code:sql}
# Use RBR, standard m/s setup CREATE TABLE t1 (c INT) ENGINE=MyISAM; # Or MEMORY,Aria CREATE TABLE t2 (c INT); # Engine choice does not matter INSERT t2 SELECT SEQ FROM seq_1_to_200000; XA START 'x'; DELETE FROM t2; INSERT INTO t1 VALUES(1); XA END 'x'; XA ROLLBACK 'x'; SHOW WARNINGS; # Warning: 1196: Some non-transactional changed tables couldn't be rolled back {code} Leads to: {noformat:title=11.4.0 9b1ea6904965dd345478dedd80e181ad54c767da (Optimized)} Slave_IO_Running: Yes Slave_SQL_Running: Yes ... Seconds_Behind_Master: 1213 ... Slave_SQL_Running_State: Waiting for room in worker thread event queue {noformat} And {noformat:title=11.4.0 9b1ea6904965dd345478dedd80e181ad54c767da (Optimized)} 2024-03-02 8:30:07 8 [Note] The slave is applying a ROW event on behalf of a DELETE statement on table t2 and is currently taking a considerable amount of time (61 seconds). This is due to the fact that it is scanning the table while looking up records to be processed. Consider adding a primary key (or unique key) to the table to improve performance. {noformat} The {{DELETE}} application continues for 20+ minutes: (after which I stopped it) {noformat:title=11.4.0 9b1ea6904965dd345478dedd80e181ad54c767da (Optimized)} 11.4.0-opt>SHOW PROCESSLIST; +----+-------------+-----------+------+--------------+------+--------------------------------------------------------+------------------+----------+ | Id | User | Host | db | Command | Time | State | Info | Progress | +----+-------------+-----------+------+--------------+------+--------------------------------------------------------+------------------+----------+ | 6 | system user | | NULL | Slave_IO | 1219 | Waiting for master to send event | NULL | 0.000 | | 9 | system user | | NULL | Slave_worker | 1215 | Waiting for work from SQL thread | NULL | 0.000 | | 8 | system user | | NULL | Slave_worker | 1219 | Waiting for work from SQL thread | NULL | 0.000 | | 10 | system user | | NULL | Slave_worker | 1215 | Waiting for work from SQL thread | NULL | 0.000 | | 11 | system user | | test | Slave_worker | 21 | Delete_rows_log_event::find_row(-1) on table `t2` | NULL | 0.000 | | 7 | system user | | NULL | Slave_SQL | 1219 | Slave has read all relay log; waiting for more updates | NULL | 0.000 | | 12 | root | localhost | test | Query | 0 | starting | SHOW PROCESSLIST | 0.000 | +----+-------------+-----------+------+--------------+------+--------------------------------------------------------+------------------+----------+ 7 rows in set (0.000 sec) {noformat} Findings: * The {{Delete_rows_log_event::find_row(-1)}} time cycles over and over from 0 to 8 seconds at the start. Later on the rounds become longer and we see 0 to 24 seconds for example. (what is happening?) * While adding a PRIMARY KEY indeed helps with the {{DELETE}} for t2 (no wait), the issue does not happen when the {{INSERT INTO t1 VALUES(1);}} is not present *even without adding a key* - which in a way disqualifies the error message (i.e. the issue can be resolved without adding a key, or again - another issue is at play here, and it is related to the INSERT). * On a sidenote, why does the XA rollback need to fail on the master (leaving 200k records in t2)? * There is a performance issue in that the slave "work" (and a 20 minute+ slave lockup for a single relatively small transaction) is unnecessary and can be avoided, especially given that the {{DELETE}} could not be actioned on the primary and/or can be avoided when the INSERT is not present (and the tables are not related). |
{code:sql}
# Use RBR, standard m/s setup CREATE TABLE t1 (c INT) ENGINE=MyISAM; # Or MEMORY,Aria CREATE TABLE t2 (c INT); # Engine choice does not matter INSERT t2 SELECT SEQ FROM seq_1_to_200000; XA START 'x'; DELETE FROM t2; INSERT INTO t1 VALUES(1); XA END 'x'; XA ROLLBACK 'x'; SHOW WARNINGS; # Warning: 1196: Some non-transactional changed tables couldn't be rolled back {code} Leads to: {noformat:title=11.4.0 9b1ea6904965dd345478dedd80e181ad54c767da (Optimized)} Slave_IO_Running: Yes Slave_SQL_Running: Yes ... Seconds_Behind_Master: 1213 ... Slave_SQL_Running_State: Waiting for room in worker thread event queue {noformat} And {noformat:title=11.4.0 9b1ea6904965dd345478dedd80e181ad54c767da (Optimized)} 2024-03-02 8:30:07 8 [Note] The slave is applying a ROW event on behalf of a DELETE statement on table t2 and is currently taking a considerable amount of time (61 seconds). This is due to the fact that it is scanning the table while looking up records to be processed. Consider adding a primary key (or unique key) to the table to improve performance. {noformat} The {{DELETE}} application continues for 20+ minutes: (after which I stopped it) {noformat:title=11.4.0 9b1ea6904965dd345478dedd80e181ad54c767da (Optimized)} 11.4.0-opt>SHOW PROCESSLIST; +----+-------------+-----------+------+--------------+------+--------------------------------------------------------+------------------+----------+ | Id | User | Host | db | Command | Time | State | Info | Progress | +----+-------------+-----------+------+--------------+------+--------------------------------------------------------+------------------+----------+ | 6 | system user | | NULL | Slave_IO | 1219 | Waiting for master to send event | NULL | 0.000 | | 9 | system user | | NULL | Slave_worker | 1215 | Waiting for work from SQL thread | NULL | 0.000 | | 8 | system user | | NULL | Slave_worker | 1219 | Waiting for work from SQL thread | NULL | 0.000 | | 10 | system user | | NULL | Slave_worker | 1215 | Waiting for work from SQL thread | NULL | 0.000 | | 11 | system user | | test | Slave_worker | 21 | Delete_rows_log_event::find_row(-1) on table `t2` | NULL | 0.000 | | 7 | system user | | NULL | Slave_SQL | 1219 | Slave has read all relay log; waiting for more updates | NULL | 0.000 | | 12 | root | localhost | test | Query | 0 | starting | SHOW PROCESSLIST | 0.000 | +----+-------------+-----------+------+--------------+------+--------------------------------------------------------+------------------+----------+ 7 rows in set (0.000 sec) {noformat} Findings: * The {{Delete_rows_log_event::find_row(-1)}} time cycles over and over from 0 to 8 seconds at the start. Later on the rounds become longer and we see 0 to 24 seconds for example. (what is happening?) * While adding a PRIMARY KEY indeed helps with the {{DELETE}} for t2 (no wait), the issue does not happen when the {{INSERT INTO t1 VALUES(1);}} is not present *even without adding a key* - which in a way disqualifies the error message (i.e. the issue can be resolved without adding a key, or again - another issue is at play here, and it is related to the INSERT). * Why does the XA rollback need to fail on the master (leaving 200k records in t2)? The warning is also not correct as the same warning happens when t2 is InnoDB. * There is a performance issue in that the slave "work" (and a 20 minute+ slave lockup for a single relatively small transaction) is unnecessary and can be avoided, especially given that the {{DELETE}} could not be actioned on the primary and/or can be avoided when the INSERT is not present (and the tables are not related). |
Description |
{code:sql}
# Use RBR, standard m/s setup CREATE TABLE t1 (c INT) ENGINE=MyISAM; # Or MEMORY,Aria CREATE TABLE t2 (c INT); # Engine choice does not matter INSERT t2 SELECT SEQ FROM seq_1_to_200000; XA START 'x'; DELETE FROM t2; INSERT INTO t1 VALUES(1); XA END 'x'; XA ROLLBACK 'x'; SHOW WARNINGS; # Warning: 1196: Some non-transactional changed tables couldn't be rolled back {code} Leads to: {noformat:title=11.4.0 9b1ea6904965dd345478dedd80e181ad54c767da (Optimized)} Slave_IO_Running: Yes Slave_SQL_Running: Yes ... Seconds_Behind_Master: 1213 ... Slave_SQL_Running_State: Waiting for room in worker thread event queue {noformat} And {noformat:title=11.4.0 9b1ea6904965dd345478dedd80e181ad54c767da (Optimized)} 2024-03-02 8:30:07 8 [Note] The slave is applying a ROW event on behalf of a DELETE statement on table t2 and is currently taking a considerable amount of time (61 seconds). This is due to the fact that it is scanning the table while looking up records to be processed. Consider adding a primary key (or unique key) to the table to improve performance. {noformat} The {{DELETE}} application continues for 20+ minutes: (after which I stopped it) {noformat:title=11.4.0 9b1ea6904965dd345478dedd80e181ad54c767da (Optimized)} 11.4.0-opt>SHOW PROCESSLIST; +----+-------------+-----------+------+--------------+------+--------------------------------------------------------+------------------+----------+ | Id | User | Host | db | Command | Time | State | Info | Progress | +----+-------------+-----------+------+--------------+------+--------------------------------------------------------+------------------+----------+ | 6 | system user | | NULL | Slave_IO | 1219 | Waiting for master to send event | NULL | 0.000 | | 9 | system user | | NULL | Slave_worker | 1215 | Waiting for work from SQL thread | NULL | 0.000 | | 8 | system user | | NULL | Slave_worker | 1219 | Waiting for work from SQL thread | NULL | 0.000 | | 10 | system user | | NULL | Slave_worker | 1215 | Waiting for work from SQL thread | NULL | 0.000 | | 11 | system user | | test | Slave_worker | 21 | Delete_rows_log_event::find_row(-1) on table `t2` | NULL | 0.000 | | 7 | system user | | NULL | Slave_SQL | 1219 | Slave has read all relay log; waiting for more updates | NULL | 0.000 | | 12 | root | localhost | test | Query | 0 | starting | SHOW PROCESSLIST | 0.000 | +----+-------------+-----------+------+--------------+------+--------------------------------------------------------+------------------+----------+ 7 rows in set (0.000 sec) {noformat} Findings: * The {{Delete_rows_log_event::find_row(-1)}} time cycles over and over from 0 to 8 seconds at the start. Later on the rounds become longer and we see 0 to 24 seconds for example. (what is happening?) * While adding a PRIMARY KEY indeed helps with the {{DELETE}} for t2 (no wait), the issue does not happen when the {{INSERT INTO t1 VALUES(1);}} is not present *even without adding a key* - which in a way disqualifies the error message (i.e. the issue can be resolved without adding a key, or again - another issue is at play here, and it is related to the INSERT). * Why does the XA rollback need to fail on the master (leaving 200k records in t2)? The warning is also not correct as the same warning happens when t2 is InnoDB. * There is a performance issue in that the slave "work" (and a 20 minute+ slave lockup for a single relatively small transaction) is unnecessary and can be avoided, especially given that the {{DELETE}} could not be actioned on the primary and/or can be avoided when the INSERT is not present (and the tables are not related). |
{code:sql}
# Use RBR, standard m/s setup CREATE TABLE t1 (c INT) ENGINE=MyISAM; # Or MEMORY,Aria CREATE TABLE t2 (c INT); # Engine choice does not matter INSERT t2 SELECT SEQ FROM seq_1_to_200000; XA START 'x'; DELETE FROM t2; INSERT INTO t1 VALUES(1); XA END 'x'; XA ROLLBACK 'x'; SHOW WARNINGS; # Warning: 1196: Some non-transactional changed tables couldn't be rolled back {code} Leads to: {noformat:title=11.4.0 9b1ea6904965dd345478dedd80e181ad54c767da (Optimized)} Slave_IO_Running: Yes Slave_SQL_Running: Yes ... Seconds_Behind_Master: 1213 ... Slave_SQL_Running_State: Waiting for room in worker thread event queue {noformat} And {noformat:title=11.4.0 9b1ea6904965dd345478dedd80e181ad54c767da (Optimized)} 2024-03-02 8:30:07 8 [Note] The slave is applying a ROW event on behalf of a DELETE statement on table t2 and is currently taking a considerable amount of time (61 seconds). This is due to the fact that it is scanning the table while looking up records to be processed. Consider adding a primary key (or unique key) to the table to improve performance. {noformat} The {{DELETE}} application continues for 20+ minutes: (after which I stopped it) {noformat:title=11.4.0 9b1ea6904965dd345478dedd80e181ad54c767da (Optimized)} 11.4.0-opt>SHOW PROCESSLIST; +----+-------------+-----------+------+--------------+------+--------------------------------------------------------+------------------+----------+ | Id | User | Host | db | Command | Time | State | Info | Progress | +----+-------------+-----------+------+--------------+------+--------------------------------------------------------+------------------+----------+ | 6 | system user | | NULL | Slave_IO | 1219 | Waiting for master to send event | NULL | 0.000 | | 9 | system user | | NULL | Slave_worker | 1215 | Waiting for work from SQL thread | NULL | 0.000 | | 8 | system user | | NULL | Slave_worker | 1219 | Waiting for work from SQL thread | NULL | 0.000 | | 10 | system user | | NULL | Slave_worker | 1215 | Waiting for work from SQL thread | NULL | 0.000 | | 11 | system user | | test | Slave_worker | 21 | Delete_rows_log_event::find_row(-1) on table `t2` | NULL | 0.000 | | 7 | system user | | NULL | Slave_SQL | 1219 | Slave has read all relay log; waiting for more updates | NULL | 0.000 | | 12 | root | localhost | test | Query | 0 | starting | SHOW PROCESSLIST | 0.000 | +----+-------------+-----------+------+--------------+------+--------------------------------------------------------+------------------+----------+ 7 rows in set (0.000 sec) {noformat} Findings and questions: * The {{Delete_rows_log_event::find_row(-1)}} time cycles over and over from 0 to 8 seconds at the start. Later on the rounds become longer and we see 0 to 24 seconds for example. (what is happening?) * While adding a PRIMARY KEY indeed helps with the {{DELETE}} for t2 (no wait), the issue does not happen when the {{INSERT INTO t1 VALUES(1);}} is not present *even without adding a key* - which in a way disqualifies the error message (i.e. the issue can be resolved without adding a key, or again - another issue is at play here, and it is related to the INSERT). * Why does the XA rollback need to fail on the master (leaving 200k records in t2)? The warning is also not correct as the same warning happens when t2 is InnoDB. * There is a performance issue in that the slave "work" (and a 20 minute+ slave lockup for a single relatively small transaction) is unnecessary and can be avoided, especially given that the {{DELETE}} could not be actioned on the primary and/or can be avoided when the INSERT is not present (and the tables are not related). |
Labels | performance replication | RBR performance replication |
Fix Version/s | 10.4 [ 22408 ] | |
Fix Version/s | 10.5 [ 23123 ] | |
Fix Version/s | 10.6 [ 24028 ] | |
Fix Version/s | 10.11 [ 27614 ] | |
Fix Version/s | 11.0 [ 28320 ] | |
Fix Version/s | 11.1 [ 28549 ] | |
Fix Version/s | 11.2 [ 28603 ] | |
Fix Version/s | 11.3 [ 28565 ] | |
Affects Version/s | 10.4 [ 22408 ] | |
Affects Version/s | 10.5 [ 23123 ] | |
Affects Version/s | 10.6 [ 24028 ] | |
Affects Version/s | 10.11 [ 27614 ] | |
Affects Version/s | 11.0 [ 28320 ] | |
Affects Version/s | 11.1 [ 28549 ] | |
Affects Version/s | 11.2 [ 28603 ] | |
Affects Version/s | 11.3 [ 28565 ] | |
Affects Version/s | 11.5 [ 29506 ] |
Description |
{code:sql}
# Use RBR, standard m/s setup CREATE TABLE t1 (c INT) ENGINE=MyISAM; # Or MEMORY,Aria CREATE TABLE t2 (c INT); # Engine choice does not matter INSERT t2 SELECT SEQ FROM seq_1_to_200000; XA START 'x'; DELETE FROM t2; INSERT INTO t1 VALUES(1); XA END 'x'; XA ROLLBACK 'x'; SHOW WARNINGS; # Warning: 1196: Some non-transactional changed tables couldn't be rolled back {code} Leads to: {noformat:title=11.4.0 9b1ea6904965dd345478dedd80e181ad54c767da (Optimized)} Slave_IO_Running: Yes Slave_SQL_Running: Yes ... Seconds_Behind_Master: 1213 ... Slave_SQL_Running_State: Waiting for room in worker thread event queue {noformat} And {noformat:title=11.4.0 9b1ea6904965dd345478dedd80e181ad54c767da (Optimized)} 2024-03-02 8:30:07 8 [Note] The slave is applying a ROW event on behalf of a DELETE statement on table t2 and is currently taking a considerable amount of time (61 seconds). This is due to the fact that it is scanning the table while looking up records to be processed. Consider adding a primary key (or unique key) to the table to improve performance. {noformat} The {{DELETE}} application continues for 20+ minutes: (after which I stopped it) {noformat:title=11.4.0 9b1ea6904965dd345478dedd80e181ad54c767da (Optimized)} 11.4.0-opt>SHOW PROCESSLIST; +----+-------------+-----------+------+--------------+------+--------------------------------------------------------+------------------+----------+ | Id | User | Host | db | Command | Time | State | Info | Progress | +----+-------------+-----------+------+--------------+------+--------------------------------------------------------+------------------+----------+ | 6 | system user | | NULL | Slave_IO | 1219 | Waiting for master to send event | NULL | 0.000 | | 9 | system user | | NULL | Slave_worker | 1215 | Waiting for work from SQL thread | NULL | 0.000 | | 8 | system user | | NULL | Slave_worker | 1219 | Waiting for work from SQL thread | NULL | 0.000 | | 10 | system user | | NULL | Slave_worker | 1215 | Waiting for work from SQL thread | NULL | 0.000 | | 11 | system user | | test | Slave_worker | 21 | Delete_rows_log_event::find_row(-1) on table `t2` | NULL | 0.000 | | 7 | system user | | NULL | Slave_SQL | 1219 | Slave has read all relay log; waiting for more updates | NULL | 0.000 | | 12 | root | localhost | test | Query | 0 | starting | SHOW PROCESSLIST | 0.000 | +----+-------------+-----------+------+--------------+------+--------------------------------------------------------+------------------+----------+ 7 rows in set (0.000 sec) {noformat} Findings and questions: * The {{Delete_rows_log_event::find_row(-1)}} time cycles over and over from 0 to 8 seconds at the start. Later on the rounds become longer and we see 0 to 24 seconds for example. (what is happening?) * While adding a PRIMARY KEY indeed helps with the {{DELETE}} for t2 (no wait), the issue does not happen when the {{INSERT INTO t1 VALUES(1);}} is not present *even without adding a key* - which in a way disqualifies the error message (i.e. the issue can be resolved without adding a key, or again - another issue is at play here, and it is related to the INSERT). * Why does the XA rollback need to fail on the master (leaving 200k records in t2)? The warning is also not correct as the same warning happens when t2 is InnoDB. * There is a performance issue in that the slave "work" (and a 20 minute+ slave lockup for a single relatively small transaction) is unnecessary and can be avoided, especially given that the {{DELETE}} could not be actioned on the primary and/or can be avoided when the INSERT is not present (and the tables are not related). |
{code:sql}
# Use RBR, standard m/s setup CREATE TABLE t1 (c INT) ENGINE=MyISAM; # Or MEMORY,Aria CREATE TABLE t2 (c INT); # Engine choice does not matter INSERT t2 SELECT SEQ FROM seq_1_to_200000; XA START 'x'; DELETE FROM t2; INSERT INTO t1 VALUES(1); XA END 'x'; XA ROLLBACK 'x'; SHOW WARNINGS; # Warning: 1196: Some non-transactional changed tables couldn't be rolled back {code} Leads to: {noformat:title=11.4.0 9b1ea6904965dd345478dedd80e181ad54c767da (Optimized)} Slave_IO_Running: Yes Slave_SQL_Running: Yes ... Seconds_Behind_Master: 1213 ... Slave_SQL_Running_State: Waiting for room in worker thread event queue {noformat} And {noformat:title=11.4.0 9b1ea6904965dd345478dedd80e181ad54c767da (Optimized)} 2024-03-02 8:30:07 8 [Note] The slave is applying a ROW event on behalf of a DELETE statement on table t2 and is currently taking a considerable amount of time (61 seconds). This is due to the fact that it is scanning the table while looking up records to be processed. Consider adding a primary key (or unique key) to the table to improve performance. {noformat} The {{DELETE}} application continues for 20+ minutes: (after which I stopped it) {noformat:title=11.4.0 9b1ea6904965dd345478dedd80e181ad54c767da (Optimized)} 11.4.0-opt>SHOW PROCESSLIST; +----+-------------+-----------+------+--------------+------+--------------------------------------------------------+------------------+----------+ | Id | User | Host | db | Command | Time | State | Info | Progress | +----+-------------+-----------+------+--------------+------+--------------------------------------------------------+------------------+----------+ | 6 | system user | | NULL | Slave_IO | 1219 | Waiting for master to send event | NULL | 0.000 | | 9 | system user | | NULL | Slave_worker | 1215 | Waiting for work from SQL thread | NULL | 0.000 | | 8 | system user | | NULL | Slave_worker | 1219 | Waiting for work from SQL thread | NULL | 0.000 | | 10 | system user | | NULL | Slave_worker | 1215 | Waiting for work from SQL thread | NULL | 0.000 | | 11 | system user | | test | Slave_worker | 21 | Delete_rows_log_event::find_row(-1) on table `t2` | NULL | 0.000 | | 7 | system user | | NULL | Slave_SQL | 1219 | Slave has read all relay log; waiting for more updates | NULL | 0.000 | | 12 | root | localhost | test | Query | 0 | starting | SHOW PROCESSLIST | 0.000 | +----+-------------+-----------+------+--------------+------+--------------------------------------------------------+------------------+----------+ 7 rows in set (0.000 sec) {noformat} Findings and questions: * The {{Delete_rows_log_event::find_row(-1)}} time cycles over and over from 0 to 8 seconds at the start. Later on the rounds become longer and we see 0 to 24 seconds for example. (what is happening?) * While adding a PRIMARY KEY indeed helps with the {{DELETE}} for t2 (no wait), the issue does not happen when the {{INSERT INTO t1 VALUES(1);}} is not present *even without adding a key* - which in a way disqualifies the error message (i.e. the issue can be resolved without adding a key, or again - another issue is at play here, and it is related to the INSERT). * Why does the XA rollback need to fail on the master (leaving 200k records in t2, even when it is InnoDB)? * There is a performance issue in that the slave "work" (and a 20 minute+ slave lockup for a single relatively small transaction) is unnecessary and can be avoided, especially given that the {{DELETE}} could not be actioned on the primary and/or can be avoided when the INSERT is not present (and the tables are not related). |
Summary | Replica severely lags on XA rollback when deleting rows from a non-transactional table while another table receives an INSERT inside the same XA transaction | Replica severely lags on XA ROLLBACK when deleting rows from a non-transactional table while another table receives an INSERT inside the same XA transaction |
Link | This issue relates to TODO-4062 [ TODO-4062 ] |
Fix Version/s | 11.0 [ 28320 ] | |
Fix Version/s | 11.3 [ 28565 ] |
Fix Version/s | 10.4 [ 22408 ] |
Fix Version/s | 11.1 [ 28549 ] |
Fix Version/s | 11.2(EOL) [ 28603 ] |
MTR Testcase
-- source include/master-slave.inc
-- source include/have_innodb.inc
-- source include/have_sequence.inc
--sync_slave_with_master
It will show [pass] but then semi-hang, and terminate on the --sync_slave_with_master after 300 seconds:
11.4.0 9b1ea6904965dd345478dedd80e181ad54c767da (Optimized)
CURRENT_TEST: main.test
analyze: sync_with_master
mysqltest: At line 13: sync_slave_with_master failed: 'select master_pos_wait('master-bin.000001', 2009383, 300, '')' returned -1 indicating timeout after 300 seconds
CLI to MTR replication slave before MTR terminates:
11.4.0 9b1ea6904965dd345478dedd80e181ad54c767da (Optimized)
$ /test/MD060224-mariadb-11.4.0-linux-x86_64-opt/bin/mariadb -uroot -P16001
WARNING: option --ssl-verify-server-cert is disabled, because of an insecure passwordless login.
Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MariaDB connection id is 11
Server version: 11.4.0-MariaDB-log MariaDB Server
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
MariaDB [(none)]> SHOW PROCESSLIST;
+----+-------------+-----------------+------+-----------+------+------------------------------------------------------+---------------------------------------------------------------+----------+
| Id | User | Host | db | Command | Time | State | Info | Progress |
+----+-------------+-----------------+------+-----------+------+------------------------------------------------------+---------------------------------------------------------------+----------+
| 5 | root | localhost:46154 | test | Sleep | 260 | | NULL | 0.000 |
| 6 | system user | | NULL | Slave_IO | 260 | Waiting for master to send event | NULL | 0.000 |
| 7 | system user | | test | Slave_SQL | 17 | Delete_rows_log_event::find_row(-1) on table `t2` | DELETE FROM t2 | 0.000 |
| 8 | root | localhost:46166 | test | Query | 256 | Waiting for the slave SQL thread to advance position | select master_pos_wait('master-bin.000001', 2009383, 300, '') | 0.000 |
| 9 | root | localhost:46174 | test | Sleep | 260 | | NULL | 0.000 |
| 11 | root | localhost:21434 | NULL | Query | 0 | starting | SHOW PROCESSLIST | 0.000 |
+----+-------------+-----------------+------+-----------+------+------------------------------------------------------+---------------------------------------------------------------+----------+
6 rows in set (0.000 sec)