Details
-
Bug
-
Status: Open (View Workflow)
-
Major
-
Resolution: Unresolved
-
10.4(EOL), 10.5, 10.6, 10.11, 11.0(EOL), 11.1(EOL), 11.2, 11.3(EOL), 11.4, 11.5(EOL)
Description
# Use RBR, standard m/s setup |
CREATE TABLE t1 (c INT) ENGINE=MyISAM; # Or MEMORY,Aria |
CREATE TABLE t2 (c INT); # Engine choice does not matter |
INSERT t2 SELECT SEQ FROM seq_1_to_200000; |
XA START 'x'; |
DELETE FROM t2; |
INSERT INTO t1 VALUES(1); |
XA END 'x'; |
XA ROLLBACK 'x'; |
SHOW WARNINGS; # Warning: 1196: Some non-transactional changed tables couldn't be rolled back |
Leads to:
11.4.0 9b1ea6904965dd345478dedd80e181ad54c767da (Optimized) |
Slave_IO_Running: Yes
|
Slave_SQL_Running: Yes
|
...
|
Seconds_Behind_Master: 1213
|
...
|
Slave_SQL_Running_State: Waiting for room in worker thread event queue
|
And
11.4.0 9b1ea6904965dd345478dedd80e181ad54c767da (Optimized) |
2024-03-02 8:30:07 8 [Note] The slave is applying a ROW event on behalf of a DELETE statement on table t2 and is currently taking a considerable amount of time (61 seconds). This is due to the fact that it is scanning the table while looking up records to be processed. Consider adding a primary key (or unique key) to the table to improve performance.
|
The DELETE application continues for 20+ minutes: (after which I stopped it)
11.4.0 9b1ea6904965dd345478dedd80e181ad54c767da (Optimized) |
11.4.0-opt>SHOW PROCESSLIST;
|
+----+-------------+-----------+------+--------------+------+--------------------------------------------------------+------------------+----------+
|
| Id | User | Host | db | Command | Time | State | Info | Progress |
|
+----+-------------+-----------+------+--------------+------+--------------------------------------------------------+------------------+----------+
|
| 6 | system user | | NULL | Slave_IO | 1219 | Waiting for master to send event | NULL | 0.000 |
|
| 9 | system user | | NULL | Slave_worker | 1215 | Waiting for work from SQL thread | NULL | 0.000 |
|
| 8 | system user | | NULL | Slave_worker | 1219 | Waiting for work from SQL thread | NULL | 0.000 |
|
| 10 | system user | | NULL | Slave_worker | 1215 | Waiting for work from SQL thread | NULL | 0.000 |
|
| 11 | system user | | test | Slave_worker | 21 | Delete_rows_log_event::find_row(-1) on table `t2` | NULL | 0.000 |
|
| 7 | system user | | NULL | Slave_SQL | 1219 | Slave has read all relay log; waiting for more updates | NULL | 0.000 |
|
| 12 | root | localhost | test | Query | 0 | starting | SHOW PROCESSLIST | 0.000 |
|
+----+-------------+-----------+------+--------------+------+--------------------------------------------------------+------------------+----------+
|
7 rows in set (0.000 sec)
|
Findings and questions:
- The Delete_rows_log_event::find_row(-1) time cycles over and over from 0 to 8 seconds at the start. Later on the rounds become longer and we see 0 to 24 seconds for example. (what is happening?)
- While adding a PRIMARY KEY indeed helps with the DELETE for t2 (no wait), the issue does not happen when the INSERT INTO t1 VALUES(1); is not present even without adding a key - which in a way disqualifies the error message (i.e. the issue can be resolved without adding a key, or again - another issue is at play here, and it is related to the INSERT).
- Why does the XA rollback need to fail on the master (leaving 200k records in t2, even when it is InnoDB)?
- There is a performance issue in that the slave "work" (and a 20 minute+ slave lockup for a single relatively small transaction) is unnecessary and can be avoided, especially given that the DELETE could not be actioned on the primary and/or can be avoided when the INSERT is not present (and the tables are not related).