Details
-
Bug
-
Status: Closed
-
Major
-
Resolution: Not a Bug
-
None
-
None
-
MariaDB 10.1.18 / Galera 25.3.17
Description
With the following sequence of commands, node2 always gets stuck and all writes to this node hang.
Node1:
mariadb[node1]> CREATE TABLE t1 (id INT PRIMARY KEY);
Query OK, 0 rows affected (0.02 sec)

mariadb[node1]> LOCK TABLES t1 WRITE;
Query OK, 0 rows affected (0.00 sec)
Node2:
mariadb[node2]> LOCK TABLE t1 WRITE;
Query OK, 0 rows affected (0.00 sec)

mariadb[node2]> SELECT * FROM t1;
Empty set (0.00 sec)
Node1:
mariadb[node1]> INSERT INTO t1 VALUES (1);
Query OK, 1 row affected (0.01 sec)
Node2:
mariadb[node2]> INSERT INTO t1 VALUES (2);
-- ^ never returns
Node1:
mariadb[node1]> UNLOCK TABLES;
Query OK, 0 rows affected (0.00 sec)

mariadb[node1]> SELECT * FROM t1;
+----+
| id |
+----+
|  1 |
|  2 |
+----+
2 rows in set (0.00 sec)
At this point, every write on node2 hangs. Even after UNLOCK TABLES on node1, the INSERT on node2 stays hung, and the connection on node2 that holds the table lock cannot be terminated with KILL. I've attached gdb "thread apply all bt" output in case it is useful.
I do see in my error log on node2 that an abort was attempted:
[Note] WSREP: MDL conflict db=foo table=t1 ticket=7 solved by abort
I expect that behavior, but the abort did not unstick this particular case. Also worth mentioning: if I continue to write to the other cluster nodes, wsrep_local_recv_queue keeps rising on node2. I expected flow control to kick in at some point, given that this node reports its state as "Synced" and uses the defaults (i.e. gcs.fc_limit=16).
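For anyone trying to reproduce this, the counters mentioned above can be watched from a separate connection on node2 with statements along these lines. This is only a sketch of read-only checks, not output from the affected environment; it assumes a second session on node2 still accepts reads while the INSERT is hung.

```sql
-- Run on node2 from a separate connection; all statements are read-only.
SHOW GLOBAL STATUS LIKE 'wsrep_local_state_comment';   -- node state as reported, e.g. Synced
SHOW GLOBAL STATUS LIKE 'wsrep_local_recv_queue';      -- current length of the receive queue
SHOW GLOBAL STATUS LIKE 'wsrep_flow_control_paused';   -- fraction of time replication was paused by flow control
SHOW GLOBAL STATUS LIKE 'wsrep_flow_control_sent';     -- flow control pause events sent by this node
SHOW GLOBAL VARIABLES LIKE 'wsrep_provider_options';   -- provider option string, which includes gcs.fc_limit
```

If wsrep_local_recv_queue keeps growing past gcs.fc_limit while wsrep_flow_control_sent stays flat, that would match the missing flow control described above.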
I also reproduced this under MariaDB 10.1.22 (tested with the latest Galera, 25.3.20), but the attached logs are from an older MariaDB 10.1.18 (Galera 25.3.17) environment.