[MDEV-27654] Galera cluster master-master async replication goes into hang mode. Created: 2022-01-28  Updated: 2023-11-28

Status: Open
Project: MariaDB Server
Component/s: Replication
Affects Version/s: 10.7, 10.8
Fix Version/s: 10.11

Type: Bug Priority: Major
Reporter: Ramesh Sivaraman Assignee: Ramesh Sivaraman
Resolution: Unresolved Votes: 0
Labels: None


 Description   

A Galera cluster with master-master async replication hangs when the following test case is executed.

-- galera master 1
CREATE TABLE t1 (f1 INTEGER PRIMARY KEY AUTO_INCREMENT, f2 INTEGER);
insert into t1(f1,f2) select seq,seq from seq_1_to_10000000;

-- galera master 2
ALTER TABLE t1 DROP COLUMN f2;

-- galera master 1 (run this before the ALTER on master 2 finishes)
delete from t1 limit 10;

Master 2 then hangs: DDL and DML statements make no further progress on it.

MariaDB [(none)]> show processlist;
+----+-------------+-----------------+------+-------------+------+---------------------------------------------------------------+-------------------------------+----------+
| Id | User        | Host            | db   | Command     | Time | State                                                         | Info                          | Progress |
+----+-------------+-----------------+------+-------------+------+---------------------------------------------------------------+-------------------------------+----------+
|  1 | system user |                 | NULL | Sleep       |  302 | wsrep aborter idle                                            | NULL                          |    0.000 |
|  2 | system user |                 | NULL | Sleep       |  302 | closing tables                                                | NULL                          |    0.000 |
|  7 | system user |                 | NULL | Sleep       |  302 | wsrep applier idle                                            | NULL                          |    0.000 |
| 10 | root        | localhost       | test | Query       |  125 | Waiting for TOI DDL                                           | ALTER TABLE t1 DROP COLUMN f2 |    0.000 |
| 11 | system user |                 | NULL | Slave_IO    |  264 | Waiting for master to send event                              | NULL                          |    0.000 |
| 13 | repl        | localhost:56246 | NULL | Binlog Dump |  255 | Master has sent all binlog to slave; waiting for more updates | NULL                          |    0.000 |
| 14 | root        | localhost       | test | Field List  |    9 | Waiting for table metadata lock                               | NULL                          |    0.000 |
| 15 | root        | localhost       | NULL | Query       |    0 | starting                                                      | show processlist              |    0.000 |
+----+-------------+-----------------+------+-------------+------+---------------------------------------------------------------+-------------------------------+----------+
8 rows in set (0.000 sec)
 
MariaDB [(none)]> 
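For anyone re-running the test case, the blocked threads can be picked out of the pasted client output with a short script. The following is a minimal sketch only, not part of the report: the helper names `parse_processlist` and `blocked_threads` are hypothetical, the two states it looks for are simply the ones seen in the output above, and the sample rows are copied from that output with the column padding trimmed.

```python
# Hypothetical helper: find stuck threads in pasted SHOW PROCESSLIST output.
# Assumption: no cell contains a literal "|" (true for the output in this report).

# States taken from the report; other hangs may show different states.
BLOCKED_STATES = ("Waiting for TOI DDL", "Waiting for table metadata lock")

SAMPLE = """\
+----+-------------+-----------+------+------------+------+----------------------------------+-------------------------------+----------+
| Id | User        | Host      | db   | Command    | Time | State                            | Info                          | Progress |
+----+-------------+-----------+------+------------+------+----------------------------------+-------------------------------+----------+
| 10 | root        | localhost | test | Query      |  125 | Waiting for TOI DDL              | ALTER TABLE t1 DROP COLUMN f2 |    0.000 |
| 11 | system user |           | NULL | Slave_IO   |  264 | Waiting for master to send event | NULL                          |    0.000 |
| 14 | root        | localhost | test | Field List |    9 | Waiting for table metadata lock  | NULL                          |    0.000 |
| 15 | root        | localhost | NULL | Query      |    0 | starting                         | show processlist              |    0.000 |
+----+-------------+-----------+------+------------+------+----------------------------------+-------------------------------+----------+
"""


def parse_processlist(text):
    """Return one dict per data row of the client's ASCII table."""
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in text.splitlines()
        if line.strip().startswith("|")  # skip +----+ borders and other lines
    ]
    if not rows:
        return []
    header, data = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in data]


def blocked_threads(text):
    """Return the rows whose State is one of BLOCKED_STATES."""
    return [row for row in parse_processlist(text) if row["State"] in BLOCKED_STATES]


if __name__ == "__main__":
    for row in blocked_threads(SAMPLE):
        print(row["Id"], row["Time"], row["State"], row["Info"])
```

Applied to the rows above, this flags thread 10 (the ALTER in "Waiting for TOI DDL") and thread 14 (waiting for a table metadata lock); the Slave_IO thread is left out because its waiting state is the normal idle state.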



 Comments   
Comment by Andrei Elkin [ 2022-03-23 ]

ramesh must have directed this one to me because of MDEV-11675's ALTER bug, but it can't be related to that because it affects 10.7. The state of the stuck DDL thread is from the wsrep domain, so re-assigning.
