Details
-
Bug
-
Status: Open (View Workflow)
-
Major
-
Resolution: Unresolved
-
10.4(EOL), 10.5, 10.6, 10.11, 11.0(EOL), 11.1(EOL), 11.2(EOL), 11.3(EOL), 11.4, 11.5(EOL)
Description
# Use RBR, standard m/s setup |
CREATE TABLE t1 (c INT) ENGINE=MyISAM; # Or MEMORY,Aria |
CREATE TABLE t2 (c INT); # Engine choice does not matter |
INSERT t2 SELECT SEQ FROM seq_1_to_200000; |
XA START 'x'; |
DELETE FROM t2; |
INSERT INTO t1 VALUES(1); |
XA END 'x'; |
XA ROLLBACK 'x'; |
SHOW WARNINGS; # Warning: 1196: Some non-transactional changed tables couldn't be rolled back |
Leads to:
11.4.0 9b1ea6904965dd345478dedd80e181ad54c767da (Optimized) |
Slave_IO_Running: Yes
|
Slave_SQL_Running: Yes
|
...
|
Seconds_Behind_Master: 1213
|
...
|
Slave_SQL_Running_State: Waiting for room in worker thread event queue
|
And
11.4.0 9b1ea6904965dd345478dedd80e181ad54c767da (Optimized) |
2024-03-02 8:30:07 8 [Note] The slave is applying a ROW event on behalf of a DELETE statement on table t2 and is currently taking a considerable amount of time (61 seconds). This is due to the fact that it is scanning the table while looking up records to be processed. Consider adding a primary key (or unique key) to the table to improve performance.
|
The DELETE application continues for 20+ minutes: (after which I stopped it)
11.4.0 9b1ea6904965dd345478dedd80e181ad54c767da (Optimized) |
11.4.0-opt>SHOW PROCESSLIST;
|
+----+-------------+-----------+------+--------------+------+--------------------------------------------------------+------------------+----------+
|
| Id | User | Host | db | Command | Time | State | Info | Progress |
|
+----+-------------+-----------+------+--------------+------+--------------------------------------------------------+------------------+----------+
|
| 6 | system user | | NULL | Slave_IO | 1219 | Waiting for master to send event | NULL | 0.000 |
|
| 9 | system user | | NULL | Slave_worker | 1215 | Waiting for work from SQL thread | NULL | 0.000 |
|
| 8 | system user | | NULL | Slave_worker | 1219 | Waiting for work from SQL thread | NULL | 0.000 |
|
| 10 | system user | | NULL | Slave_worker | 1215 | Waiting for work from SQL thread | NULL | 0.000 |
|
| 11 | system user | | test | Slave_worker | 21 | Delete_rows_log_event::find_row(-1) on table `t2` | NULL | 0.000 |
|
| 7 | system user | | NULL | Slave_SQL | 1219 | Slave has read all relay log; waiting for more updates | NULL | 0.000 |
|
| 12 | root | localhost | test | Query | 0 | starting | SHOW PROCESSLIST | 0.000 |
|
+----+-------------+-----------+------+--------------+------+--------------------------------------------------------+------------------+----------+
|
7 rows in set (0.000 sec)
|
Findings and questions:
- The Delete_rows_log_event::find_row(-1) time cycles over and over from 0 to 8 seconds at the start. Later on the rounds become longer and we see 0 to 24 seconds for example. (what is happening?)
- While adding a PRIMARY KEY indeed helps with the DELETE for t2 (no wait), the issue does not happen when the INSERT INTO t1 VALUES(1); is not present even without adding a key - which in a way disqualifies the error message (i.e. the issue can be resolved without adding a key, or again - another issue is at play here, and it is related to the INSERT).
- Why does the XA rollback need to fail on the master (leaving 200k records in t2, even when it is InnoDB)?
- There is a performance issue in that the slave "work" (and a 20 minute+ slave lockup for a single relatively small transaction) is unnecessary and can be avoided, especially given that the DELETE could not be actioned on the primary and/or can be avoided when the INSERT is not present (and the tables are not related).
MTR Testcase
-- source include/master-slave.inc
-- source include/have_innodb.inc
-- source include/have_sequence.inc
--sync_slave_with_master
It will show [pass] but then semi-hang, and terminate on the --sync_slave_with_master after 300 seconds:
11.4.0 9b1ea6904965dd345478dedd80e181ad54c767da (Optimized)
CURRENT_TEST: main.test
analyze: sync_with_master
mysqltest: At line 13: sync_slave_with_master failed: 'select master_pos_wait('master-bin.000001', 2009383, 300, '')' returned -1 indicating timeout after 300 seconds
CLI to MTR replication slave before MTR terminates:
11.4.0 9b1ea6904965dd345478dedd80e181ad54c767da (Optimized)
$ /test/MD060224-mariadb-11.4.0-linux-x86_64-opt/bin/mariadb -uroot -P16001
WARNING: option --ssl-verify-server-cert is disabled, because of an insecure passwordless login.
Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MariaDB connection id is 11
Server version: 11.4.0-MariaDB-log MariaDB Server
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
MariaDB [(none)]> SHOW PROCESSLIST;
+----+-------------+-----------------+------+-----------+------+------------------------------------------------------+---------------------------------------------------------------+----------+
| Id | User | Host | db | Command | Time | State | Info | Progress |
+----+-------------+-----------------+------+-----------+------+------------------------------------------------------+---------------------------------------------------------------+----------+
| 5 | root | localhost:46154 | test | Sleep | 260 | | NULL | 0.000 |
| 6 | system user | | NULL | Slave_IO | 260 | Waiting for master to send event | NULL | 0.000 |
| 7 | system user | | test | Slave_SQL | 17 | Delete_rows_log_event::find_row(-1) on table `t2` | DELETE FROM t2 | 0.000 |
| 8 | root | localhost:46166 | test | Query | 256 | Waiting for the slave SQL thread to advance position | select master_pos_wait('master-bin.000001', 2009383, 300, '') | 0.000 |
| 9 | root | localhost:46174 | test | Sleep | 260 | | NULL | 0.000 |
| 11 | root | localhost:21434 | NULL | Query | 0 | starting | SHOW PROCESSLIST | 0.000 |
+----+-------------+-----------------+------+-----------+------+------------------------------------------------------+---------------------------------------------------------------+----------+
6 rows in set (0.000 sec)