Details
-
Bug
-
Status: Closed (View Workflow)
-
Major
-
Resolution: Won't Fix
-
5.5.42-galera, 10.0.17-galera
Description
It looks like if wsrep_OSU_method set to TOI, DDL is transmitted to other nodes even when the provider is paused, and the DDL on the originating node shouldn't run the DDL because it is supposedly waiting on a lock due to FLUSH TABLES WITH READ LOCK. The DDL runs on the originating node anyway. The DDL runs on all nodes except the originating node, if there's some DML running on that node to block it.
According to the logs, it also looks like DDL may be getting sent to the other nodes more than once.
DDL + no DML
I have 3 Galera nodes in a cluster.
On all 3 nodes, let's execute:
SET GLOBAL wsrep_debug=ON;
|
First, let's do some setup on one of the nodes:
CREATE TABLE `tmp` (
|
`a` int(11) NOT NULL,
|
`b` varchar(20) DEFAULT NULL,
|
PRIMARY KEY (`a`)
|
);
|
Now let's open up two MySQL shells on the same node.
Execute this on the first shell:
MariaDB [tmp]> FLUSH TABLES WITH READ LOCK;
|
Query OK, 0 rows affected (0.14 sec)
|
The log on this node shows that the provider has been paused:
150320 11:43:04 [Note] WSREP: Provider paused at 5edc131e-ae0b-11e4-85d1-c2c0512e9e7e:2896 (495)
|
In the second shell, let's try to drop the table:
MariaDB [tmp]> DROP TABLE tmp;
|
This hangs:
MariaDB [tmp]> SHOW PROCESSLIST;
|
+----+-------------+-----------+------+---------+------+----------------------+------------------+----------+
|
| Id | User | Host | db | Command | Time | State | Info | Progress |
|
+----+-------------+-----------+------+---------+------+----------------------+------------------+----------+
|
| 1 | system user | | NULL | Sleep | 1744 | NULL | NULL | 0.000 |
|
| 2 | system user | | NULL | Sleep | 1744 | wsrep aborter idle | NULL | 0.000 |
|
| 8 | root | localhost | tmp | Query | 0 | sleeping | SHOW PROCESSLIST | 0.000 |
|
| 18 | root | localhost | tmp | Query | 14 | checking permissions | DROP TABLE tmp | 0.000 |
|
+----+-------------+-----------+------+---------+------+----------------------+------------------+----------+
|
4 rows in set (0.00 sec)
|
Let's attempt to kill the query once:
MariaDB [tmp]> DROP TABLE tmp;
|
^CCtrl-C -- query killed. Continuing normally.
|
The query is now stuck in the "killed" state:
MariaDB [tmp]> SHOW PROCESSLIST;
|
+----+-------------+-----------+------+---------+------+----------------------+------------------+----------+
|
| Id | User | Host | db | Command | Time | State | Info | Progress |
|
+----+-------------+-----------+------+---------+------+----------------------+------------------+----------+
|
| 1 | system user | | NULL | Sleep | 1815 | NULL | NULL | 0.000 |
|
| 2 | system user | | NULL | Sleep | 1815 | wsrep aborter idle | NULL | 0.000 |
|
| 8 | root | localhost | tmp | Query | 0 | sleeping | SHOW PROCESSLIST | 0.000 |
|
| 18 | root | localhost | tmp | Killed | 85 | checking permissions | DROP TABLE tmp | 0.000 |
|
+----+-------------+-----------+------+---------+------+----------------------+------------------+----------+
|
4 rows in set (0.00 sec)
|
The table is already missing from all nodes:
MariaDB [tmp]> SHOW TABLES;
|
Empty set (0.00 sec)
|
Logs on the other nodes suggest that the DROP TABLE statement may have been sent to other nodes twice, since we get an error saying the table doesn't exist:
150320 13:51:39 [ERROR] Slave SQL: Error 'Unknown table 'tmp'' on query. Default database: 'tmp'. Query: 'DROP TABLE tmp', Error_code: 1051
|
150320 13:51:39 [Warning] WSREP: RBR event 1 Query apply warning: 1, 5068
|
150320 13:51:39 [Warning] WSREP: Ignoring error for TO isolated action: source: bafbdc15-cf25-11e4-9ace-2fc1846ad321 version: 3 local: 0 state: APPLYING flags: 65 conn_id: 18 trx_id: -1 seqnos (l: 5111, g: 5068, s: 5067, d: 5067, ts: 184510656454187)
|
If we try to cancel the query again, we lose connection to the server:
MariaDB [tmp]> DROP TABLE tmp;
|
^CCtrl-C -- query killed. Continuing normally.
|
^CCtrl-C -- query killed. Continuing normally.
|
ERROR 2013 (HY000): Lost connection to MySQL server during query
|
However, the node is still up, and the query is stuck in the "killed" state:
MariaDB [tmp]> SHOW PROCESSLIST;
|
+----+-------------+-----------+------+---------+------+----------------------+------------------+----------+
|
| Id | User | Host | db | Command | Time | State | Info | Progress |
|
+----+-------------+-----------+------+---------+------+----------------------+------------------+----------+
|
| 1 | system user | | NULL | Sleep | 620 | NULL | NULL | 0.000 |
|
| 2 | system user | | NULL | Sleep | 620 | wsrep aborter idle | NULL | 0.000 |
|
| 4 | root | localhost | tmp | Killed | 174 | checking permissions | DROP TABLE tmp | 0.000 |
|
| 5 | root | localhost | tmp | Query | 0 | sleeping | SHOW PROCESSLIST | 0.000 |
|
| 6 | root | localhost | tmp | Sleep | 249 | | NULL | 0.000 |
|
+----+-------------+-----------+------+---------+------+----------------------+------------------+----------+
|
5 rows in set (0.00 sec)
|
The node is still in the cluster:
MariaDB [tmp]> SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size';
|
+--------------------+-------+
|
| Variable_name | Value |
|
+--------------------+-------+
|
| wsrep_cluster_size | 3 |
|
+--------------------+-------+
|
1 row in set (0.00 sec)
|
As soon as we UNLOCK TABLES on the original node, the query is killed.
MariaDB [tmp]> UNLOCK TABLES;
|
Query OK, 0 rows affected (0.00 sec)
|
|
MariaDB [tmp]> SHOW PROCESSLIST;
|
+----+-------------+-----------+------+---------+------+--------------------+------------------+----------+
|
| Id | User | Host | db | Command | Time | State | Info | Progress |
|
+----+-------------+-----------+------+---------+------+--------------------+------------------+----------+
|
| 1 | system user | | NULL | Sleep | 2305 | NULL | NULL | 0.000 |
|
| 2 | system user | | NULL | Sleep | 2305 | wsrep aborter idle | NULL | 0.000 |
|
| 8 | root | localhost | tmp | Query | 0 | sleeping | SHOW PROCESSLIST | 0.000 |
|
| 18 | root | localhost | tmp | Sleep | 1 | | NULL | 0.000 |
|
+----+-------------+-----------+------+---------+------+--------------------+------------------+----------+
|
4 rows in set (0.00 sec)
|
DDL + DML
Let's add some DML to the mix.
Some setup:
CREATE TABLE `tmp` (
|
`a` int(11) NOT NULL,
|
`b` varchar(20) DEFAULT NULL,
|
PRIMARY KEY (`a`)
|
);
|
DELIMITER $$
|
CREATE PROCEDURE insert_test_data()
|
BEGIN
|
DECLARE i INT DEFAULT 1;
|
|
WHILE i < 1000000 DO
|
INSERT INTO `tmp` (`a`, `b`)
|
VALUES (i, i);
|
SET i = i + 1;
|
END WHILE;
|
END$$
|
DELIMITER ;
|
Now let's have three shells on the originating node:
On the first shell, execute:
CALL insert_test_data();
|
On the second shell, execute:
MariaDB [tmp]> FLUSH TABLES WITH READ LOCK;
|
Query OK, 0 rows affected (0.00 sec)
|
On the third shell, try to drop the table:
MariaDB [tmp]> DROP TABLE tmp;
|
Process list looks as expected:
MariaDB [tmp]> SHOW PROCESSLIST;
|
+----+-------------+-----------+------+---------+------+------------------------------+----------------------------------------------------------------------------------------+----------+
|
| Id | User | Host | db | Command | Time | State | Info | Progress |
|
+----+-------------+-----------+------+---------+------+------------------------------+----------------------------------------------------------------------------------------+----------+
|
| 1 | system user | | NULL | Sleep | 2648 | NULL | NULL | 0.000 |
|
| 2 | system user | | NULL | Sleep | 2648 | wsrep aborter idle | NULL | 0.000 |
|
| 8 | root | localhost | tmp | Query | 0 | sleeping | SHOW PROCESSLIST | 0.000 |
|
| 18 | root | localhost | tmp | Query | 59 | Waiting for global read lock | INSERT INTO `tmp` (`a`, `b`)
|
VALUES ( NAME_CONST('i',6601), NAME_CONST('i',6601)) | 0.000 |
|
| 20 | root | localhost | tmp | Query | 20 | checking permissions | DROP TABLE tmp | 0.000 |
|
+----+-------------+-----------+------+---------+------+------------------------------+----------------------------------------------------------------------------------------+----------+
|
5 rows in set (0.00 sec)
|
On other nodes, the table is gone:
MariaDB [tmp]> SHOW TABLES;
|
Empty set (0.00 sec)
|
On the originating node, the table still exists:
MariaDB [tmp]> SHOW TABLES;
|
+---------------+
|
| Tables_in_tmp |
|
+---------------+
|
| tmp |
|
+---------------+
|
1 row in set (0.00 sec)
|
Let's cancel the DML:
MariaDB [tmp]> CALL insert_test_data();
|
^CCtrl-C -- query killed. Continuing normally.
|
ERROR 1317 (70100): Query execution was interrupted
|
The table still exists on the originating node and the query is dead:
MariaDB [tmp]> SHOW TABLES;
|
+---------------+
|
| Tables_in_tmp |
|
+---------------+
|
| tmp |
|
+---------------+
|
1 row in set (0.00 sec)
|
|
MariaDB [tmp]> SHOW PROCESSLIST;
|
+----+-------------+-----------+------+---------+------+----------------------+------------------+----------+
|
| Id | User | Host | db | Command | Time | State | Info | Progress |
|
+----+-------------+-----------+------+---------+------+----------------------+------------------+----------+
|
| 1 | system user | | NULL | Sleep | 2768 | NULL | NULL | 0.000 |
|
| 2 | system user | | NULL | Sleep | 2768 | wsrep aborter idle | NULL | 0.000 |
|
| 8 | root | localhost | tmp | Query | 0 | sleeping | SHOW PROCESSLIST | 0.000 |
|
| 18 | root | localhost | tmp | Sleep | 13 | | NULL | 0.000 |
|
| 20 | root | localhost | tmp | Query | 140 | checking permissions | DROP TABLE tmp | 0.000 |
|
+----+-------------+-----------+------+---------+------+----------------------+------------------+----------+
|
5 rows in set (0.00 sec)
|
It goes away when we UNLOCK TABLES:
MariaDB [tmp]> UNLOCK TABLES;
|
Query OK, 0 rows affected (0.00 sec)
|
|
MariaDB [tmp]> SHOW TABLES;
|
Empty set (0.00 sec)
|
Again, the other nodes have an error about the missing table, suggesting it may have been executed twice:
150320 14:04:56 [Warning] WSREP: RBR event 1 Query apply warning: 1, 5070
|
150320 14:04:56 [Warning] WSREP: Ignoring error for TO isolated action: source: bafbdc15-cf25-11e4-9ace-2fc1846ad321 version: 3 local: 0 state: APPLYING flags: 65 conn_id: 18 trx_id: -1 seqnos (l: 5113, g: 5070, s: 5069, d: 5069, ts: 185308382405439)
|
Attachments
Activity
Field | Original Value | New Value |
---|---|---|
Assignee | Nirbhay Choubey [ nirbhay_c ] |
Description |
I have 3 Galera nodes in a cluster. On all 3 nodes, let's execute: {code} SET GLOBAL wsrep_debug=ON; {code} First, let's do some setup on one of the nodes: {code} CREATE TABLE `tmp` ( `a` int(11) NOT NULL, `b` varchar(20) DEFAULT NULL, PRIMARY KEY (`a`) ); DELIMITER $$ CREATE PROCEDURE insert_test_data() BEGIN DECLARE i INT DEFAULT 1; WHILE i < 1000000 DO INSERT INTO `tmp` (`a`, `b`) VALUES (i, i); SET i = i + 1; END WHILE; END$$ DELIMITER ; {code} Now let's open up two MySQL shells on the same node. On the first shell: {code} MariaDB [tmp]> CALL insert_test_data(); {code} Immediately after, do this on the second shell: {code} MariaDB [tmp]> FLUSH TABLES WITH READ LOCK; Query OK, 0 rows affected (0.14 sec) MariaDB [tmp]> SHOW PROCESSLIST; +----+-------------+-----------+------+---------+------+------------------------------+--------------------------------------------------------------------------------------+----------+ | Id | User | Host | db | Command | Time | State | Info | Progress | +----+-------------+-----------+------+---------+------+------------------------------+--------------------------------------------------------------------------------------+----------+ | 1 | system user | | NULL | Sleep | 161 | NULL | NULL | 0.000 | | 2 | system user | | NULL | Sleep | 161 | wsrep aborter idle | NULL | 0.000 | | 4 | root | localhost | tmp | Query | 44 | Waiting for global read lock | INSERT INTO `tmp` (`a`, `b`) VALUES ( NAME_CONST('i',486), NAME_CONST('i',486)) | 0.000 | | 5 | root | localhost | tmp | Query | 0 | sleeping | SHOW PROCESSLIST | 0.000 | +----+-------------+-----------+------+---------+------+------------------------------+--------------------------------------------------------------------------------------+----------+ 4 rows in set (0.00 sec) {code} The global read lock appears to still be held and is preventing new queries from running. The log on this node shows that the provider has been paused: {code} 150320 11:41:42 [Note] WSREP: TO END: 2410, 2 : CREATE TABLE `tmp` ( `a` int(11) NOT NULL, `b` varchar(20) DEFAULT NULL, PRIMARY KEY (`a`) ) 150320 11:41:42 [Note] WSREP: Set WSREPXid for InnoDB: 5edc131e-ae0b-11e4-85d1-c2c0512e9e7e:2410 150320 11:41:42 [Note] WSREP: TO END: 2410, update seqno 150320 11:41:42 [Note] WSREP: TO END: 2410 150320 11:41:42 [Note] WSREP: TO BEGIN: -1, 0 : CREATE PROCEDURE insert_test_data() BEGIN DECLARE i INT DEFAULT 1; WHILE i < 1000000 DO INSERT INTO `tmp` (`a`, `b`) VALUES (i, i); SET i = i + 1; END WHILE; END 150320 11:41:42 [Note] WSREP: TO BEGIN: 2411, 2 150320 11:41:42 [Note] WSREP: TO END: 2411, 2 : CREATE PROCEDURE insert_test_data() BEGIN DECLARE i INT DEFAULT 1; WHILE i < 1000000 DO INSERT INTO `tmp` (`a`, `b`) VALUES (i, i); SET i = i + 1; END WHILE; END 150320 11:41:42 [Note] WSREP: Set WSREPXid for InnoDB: 5edc131e-ae0b-11e4-85d1-c2c0512e9e7e:2411 150320 11:41:42 [Note] WSREP: TO END: 2411, update seqno 150320 11:41:42 [Note] WSREP: TO END: 2411 150320 11:43:04 [Note] WSREP: Provider paused at 5edc131e-ae0b-11e4-85d1-c2c0512e9e7e:2896 (495) {code} Our original query is hanging, so let's kill it and try dropping the table to try again: {code} MariaDB [tmp]> CALL insert_test_data(); ^CCtrl-C -- query killed. Continuing normally. ERROR 1317 (70100): Query execution was interrupted MariaDB [tmp]> DROP TABLE tmp; {code} This also hangs: {code} MariaDB [tmp]> SHOW PROCESSLIST; +----+-------------+-----------+------+---------+------+----------------------+------------------+----------+ | Id | User | Host | db | Command | Time | State | Info | Progress | +----+-------------+-----------+------+---------+------+----------------------+------------------+----------+ | 1 | system user | | NULL | Sleep | 468 | NULL | NULL | 0.000 | | 2 | system user | | NULL | Sleep | 468 | wsrep aborter idle | NULL | 0.000 | | 4 | root | localhost | tmp | Query | 22 | checking permissions | DROP TABLE tmp | 0.000 | | 5 | root | localhost | tmp | Query | 0 | sleeping | SHOW PROCESSLIST | 0.000 | | 6 | root | localhost | tmp | Sleep | 97 | | NULL | 0.000 | +----+-------------+-----------+------+---------+------+----------------------+------------------+----------+ 5 rows in set (0.00 sec) {code} When attempting to kill the query, the connection is lost: {code} MariaDB [tmp]> DROP TABLE tmp; ^CCtrl-C -- query killed. Continuing normally. ^CCtrl-C -- query killed. Continuing normally. ERROR 2013 (HY000): Lost connection to MySQL server during query {code} However, the node is still up, and the query is stuck in the "killed" state: {code} MariaDB [tmp]> SHOW PROCESSLIST; +----+-------------+-----------+------+---------+------+----------------------+------------------+----------+ | Id | User | Host | db | Command | Time | State | Info | Progress | +----+-------------+-----------+------+---------+------+----------------------+------------------+----------+ | 1 | system user | | NULL | Sleep | 620 | NULL | NULL | 0.000 | | 2 | system user | | NULL | Sleep | 620 | wsrep aborter idle | NULL | 0.000 | | 4 | root | localhost | tmp | Killed | 174 | checking permissions | DROP TABLE tmp | 0.000 | | 5 | root | localhost | tmp | Query | 0 | sleeping | SHOW PROCESSLIST | 0.000 | | 6 | root | localhost | tmp | Sleep | 249 | | NULL | 0.000 | +----+-------------+-----------+------+---------+------+----------------------+------------------+----------+ 5 rows in set (0.00 sec) {code} The table appears to be already dropped on other nodes, but still exists on the current node. Original node: {code} MariaDB [tmp]> show tables; +---------------+ | Tables_in_tmp | +---------------+ | tmp | +---------------+ 1 row in set (0.00 sec) {code} Other node: {code} MariaDB [tmp]> SHOW TABLES; Empty set (0.00 sec) {code} The node is still in the cluster: {code} MariaDB [tmp]> SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size'; +--------------------+-------+ | Variable_name | Value | +--------------------+-------+ | wsrep_cluster_size | 3 | +--------------------+-------+ 1 row in set (0.00 sec) {code} |
I have 3 Galera nodes in a cluster. On all 3 nodes, let's execute: {code} SET GLOBAL wsrep_debug=ON; {code} First, let's do some setup on one of the nodes: {code} CREATE TABLE `tmp` ( `a` int(11) NOT NULL, `b` varchar(20) DEFAULT NULL, PRIMARY KEY (`a`) ); DELIMITER $$ CREATE PROCEDURE insert_test_data() BEGIN DECLARE i INT DEFAULT 1; WHILE i < 1000000 DO INSERT INTO `tmp` (`a`, `b`) VALUES (i, i); SET i = i + 1; END WHILE; END$$ DELIMITER ; {code} Now let's open up two MySQL shells on the same node. On the first shell: {code} MariaDB [tmp]> CALL insert_test_data(); {code} Immediately after, do this on the second shell: {code} MariaDB [tmp]> FLUSH TABLES WITH READ LOCK; Query OK, 0 rows affected (0.14 sec) MariaDB [tmp]> SHOW PROCESSLIST; +----+-------------+-----------+------+---------+------+------------------------------+--------------------------------------------------------------------------------------+----------+ | Id | User | Host | db | Command | Time | State | Info | Progress | +----+-------------+-----------+------+---------+------+------------------------------+--------------------------------------------------------------------------------------+----------+ | 1 | system user | | NULL | Sleep | 161 | NULL | NULL | 0.000 | | 2 | system user | | NULL | Sleep | 161 | wsrep aborter idle | NULL | 0.000 | | 4 | root | localhost | tmp | Query | 44 | Waiting for global read lock | INSERT INTO `tmp` (`a`, `b`) VALUES ( NAME_CONST('i',486), NAME_CONST('i',486)) | 0.000 | | 5 | root | localhost | tmp | Query | 0 | sleeping | SHOW PROCESSLIST | 0.000 | +----+-------------+-----------+------+---------+------+------------------------------+--------------------------------------------------------------------------------------+----------+ 4 rows in set (0.00 sec) {code} The global read lock appears to still be held and is preventing new queries from running. The log on this node shows that the provider has been paused: {code} 150320 11:41:42 [Note] WSREP: TO END: 2410, 2 : CREATE TABLE `tmp` ( `a` int(11) NOT NULL, `b` varchar(20) DEFAULT NULL, PRIMARY KEY (`a`) ) 150320 11:41:42 [Note] WSREP: Set WSREPXid for InnoDB: 5edc131e-ae0b-11e4-85d1-c2c0512e9e7e:2410 150320 11:41:42 [Note] WSREP: TO END: 2410, update seqno 150320 11:41:42 [Note] WSREP: TO END: 2410 150320 11:41:42 [Note] WSREP: TO BEGIN: -1, 0 : CREATE PROCEDURE insert_test_data() BEGIN DECLARE i INT DEFAULT 1; WHILE i < 1000000 DO INSERT INTO `tmp` (`a`, `b`) VALUES (i, i); SET i = i + 1; END WHILE; END 150320 11:41:42 [Note] WSREP: TO BEGIN: 2411, 2 150320 11:41:42 [Note] WSREP: TO END: 2411, 2 : CREATE PROCEDURE insert_test_data() BEGIN DECLARE i INT DEFAULT 1; WHILE i < 1000000 DO INSERT INTO `tmp` (`a`, `b`) VALUES (i, i); SET i = i + 1; END WHILE; END 150320 11:41:42 [Note] WSREP: Set WSREPXid for InnoDB: 5edc131e-ae0b-11e4-85d1-c2c0512e9e7e:2411 150320 11:41:42 [Note] WSREP: TO END: 2411, update seqno 150320 11:41:42 [Note] WSREP: TO END: 2411 150320 11:43:04 [Note] WSREP: Provider paused at 5edc131e-ae0b-11e4-85d1-c2c0512e9e7e:2896 (495) {code} Our original query is hanging, so let's kill it and try dropping the table to try again: {code} MariaDB [tmp]> CALL insert_test_data(); ^CCtrl-C -- query killed. Continuing normally. ERROR 1317 (70100): Query execution was interrupted MariaDB [tmp]> DROP TABLE tmp; {code} This also hangs: {code} MariaDB [tmp]> SHOW PROCESSLIST; +----+-------------+-----------+------+---------+------+----------------------+------------------+----------+ | Id | User | Host | db | Command | Time | State | Info | Progress | +----+-------------+-----------+------+---------+------+----------------------+------------------+----------+ | 1 | system user | | NULL | Sleep | 468 | NULL | NULL | 0.000 | | 2 | system user | | NULL | Sleep | 468 | wsrep aborter idle | NULL | 0.000 | | 4 | root | localhost | tmp | Query | 22 | checking permissions | DROP TABLE tmp | 0.000 | | 5 | root | localhost | tmp | Query | 0 | sleeping | SHOW PROCESSLIST | 0.000 | | 6 | root | localhost | tmp | Sleep | 97 | | NULL | 0.000 | +----+-------------+-----------+------+---------+------+----------------------+------------------+----------+ 5 rows in set (0.00 sec) {code} When attempting to kill the query, the connection is lost: {code} MariaDB [tmp]> DROP TABLE tmp; ^CCtrl-C -- query killed. Continuing normally. ^CCtrl-C -- query killed. Continuing normally. ERROR 2013 (HY000): Lost connection to MySQL server during query {code} However, the node is still up, and the query is stuck in the "killed" state: {code} MariaDB [tmp]> SHOW PROCESSLIST; +----+-------------+-----------+------+---------+------+----------------------+------------------+----------+ | Id | User | Host | db | Command | Time | State | Info | Progress | +----+-------------+-----------+------+---------+------+----------------------+------------------+----------+ | 1 | system user | | NULL | Sleep | 620 | NULL | NULL | 0.000 | | 2 | system user | | NULL | Sleep | 620 | wsrep aborter idle | NULL | 0.000 | | 4 | root | localhost | tmp | Killed | 174 | checking permissions | DROP TABLE tmp | 0.000 | | 5 | root | localhost | tmp | Query | 0 | sleeping | SHOW PROCESSLIST | 0.000 | | 6 | root | localhost | tmp | Sleep | 249 | | NULL | 0.000 | +----+-------------+-----------+------+---------+------+----------------------+------------------+----------+ 5 rows in set (0.00 sec) {code} The table appears to be already dropped on other nodes, but still exists on the current node. Original node: {code} MariaDB [tmp]> show tables; +---------------+ | Tables_in_tmp | +---------------+ | tmp | +---------------+ 1 row in set (0.00 sec) {code} Other node: {code} MariaDB [tmp]> SHOW TABLES; Empty set (0.00 sec) {code} The node is still in the cluster: {code} MariaDB [tmp]> SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size'; +--------------------+-------+ | Variable_name | Value | +--------------------+-------+ | wsrep_cluster_size | 3 | +--------------------+-------+ 1 row in set (0.00 sec) {code} As soon as we UNLOCK TABLES on the original node, the table disappears. {code} MariaDB [tmp]> UNLOCK TABLES; Query OK, 0 rows affected (0.00 sec) MariaDB [tmp]> SHOW TABLES; Empty set (0.00 sec) {code} It looks like if wsrep_OSU_method set to TOI, DDL is transmitted to other nodes even when the provider is paused, and the DDL on the originating node can't run the DDL due to locks. |
Description |
I have 3 Galera nodes in a cluster. On all 3 nodes, let's execute: {code} SET GLOBAL wsrep_debug=ON; {code} First, let's do some setup on one of the nodes: {code} CREATE TABLE `tmp` ( `a` int(11) NOT NULL, `b` varchar(20) DEFAULT NULL, PRIMARY KEY (`a`) ); DELIMITER $$ CREATE PROCEDURE insert_test_data() BEGIN DECLARE i INT DEFAULT 1; WHILE i < 1000000 DO INSERT INTO `tmp` (`a`, `b`) VALUES (i, i); SET i = i + 1; END WHILE; END$$ DELIMITER ; {code} Now let's open up two MySQL shells on the same node. On the first shell: {code} MariaDB [tmp]> CALL insert_test_data(); {code} Immediately after, do this on the second shell: {code} MariaDB [tmp]> FLUSH TABLES WITH READ LOCK; Query OK, 0 rows affected (0.14 sec) MariaDB [tmp]> SHOW PROCESSLIST; +----+-------------+-----------+------+---------+------+------------------------------+--------------------------------------------------------------------------------------+----------+ | Id | User | Host | db | Command | Time | State | Info | Progress | +----+-------------+-----------+------+---------+------+------------------------------+--------------------------------------------------------------------------------------+----------+ | 1 | system user | | NULL | Sleep | 161 | NULL | NULL | 0.000 | | 2 | system user | | NULL | Sleep | 161 | wsrep aborter idle | NULL | 0.000 | | 4 | root | localhost | tmp | Query | 44 | Waiting for global read lock | INSERT INTO `tmp` (`a`, `b`) VALUES ( NAME_CONST('i',486), NAME_CONST('i',486)) | 0.000 | | 5 | root | localhost | tmp | Query | 0 | sleeping | SHOW PROCESSLIST | 0.000 | +----+-------------+-----------+------+---------+------+------------------------------+--------------------------------------------------------------------------------------+----------+ 4 rows in set (0.00 sec) {code} The global read lock appears to still be held and is preventing new queries from running. The log on this node shows that the provider has been paused: {code} 150320 11:41:42 [Note] WSREP: TO END: 2410, 2 : CREATE TABLE `tmp` ( `a` int(11) NOT NULL, `b` varchar(20) DEFAULT NULL, PRIMARY KEY (`a`) ) 150320 11:41:42 [Note] WSREP: Set WSREPXid for InnoDB: 5edc131e-ae0b-11e4-85d1-c2c0512e9e7e:2410 150320 11:41:42 [Note] WSREP: TO END: 2410, update seqno 150320 11:41:42 [Note] WSREP: TO END: 2410 150320 11:41:42 [Note] WSREP: TO BEGIN: -1, 0 : CREATE PROCEDURE insert_test_data() BEGIN DECLARE i INT DEFAULT 1; WHILE i < 1000000 DO INSERT INTO `tmp` (`a`, `b`) VALUES (i, i); SET i = i + 1; END WHILE; END 150320 11:41:42 [Note] WSREP: TO BEGIN: 2411, 2 150320 11:41:42 [Note] WSREP: TO END: 2411, 2 : CREATE PROCEDURE insert_test_data() BEGIN DECLARE i INT DEFAULT 1; WHILE i < 1000000 DO INSERT INTO `tmp` (`a`, `b`) VALUES (i, i); SET i = i + 1; END WHILE; END 150320 11:41:42 [Note] WSREP: Set WSREPXid for InnoDB: 5edc131e-ae0b-11e4-85d1-c2c0512e9e7e:2411 150320 11:41:42 [Note] WSREP: TO END: 2411, update seqno 150320 11:41:42 [Note] WSREP: TO END: 2411 150320 11:43:04 [Note] WSREP: Provider paused at 5edc131e-ae0b-11e4-85d1-c2c0512e9e7e:2896 (495) {code} Our original query is hanging, so let's kill it and try dropping the table to try again: {code} MariaDB [tmp]> CALL insert_test_data(); ^CCtrl-C -- query killed. Continuing normally. ERROR 1317 (70100): Query execution was interrupted MariaDB [tmp]> DROP TABLE tmp; {code} This also hangs: {code} MariaDB [tmp]> SHOW PROCESSLIST; +----+-------------+-----------+------+---------+------+----------------------+------------------+----------+ | Id | User | Host | db | Command | Time | State | Info | Progress | +----+-------------+-----------+------+---------+------+----------------------+------------------+----------+ | 1 | system user | | NULL | Sleep | 468 | NULL | NULL | 0.000 | | 2 | system user | | NULL | Sleep | 468 | wsrep aborter idle | NULL | 0.000 | | 4 | root | localhost | tmp | Query | 22 | checking permissions | DROP TABLE tmp | 0.000 | | 5 | root | localhost | tmp | Query | 0 | sleeping | SHOW PROCESSLIST | 0.000 | | 6 | root | localhost | tmp | Sleep | 97 | | NULL | 0.000 | +----+-------------+-----------+------+---------+------+----------------------+------------------+----------+ 5 rows in set (0.00 sec) {code} When attempting to kill the query, the connection is lost: {code} MariaDB [tmp]> DROP TABLE tmp; ^CCtrl-C -- query killed. Continuing normally. ^CCtrl-C -- query killed. Continuing normally. ERROR 2013 (HY000): Lost connection to MySQL server during query {code} However, the node is still up, and the query is stuck in the "killed" state: {code} MariaDB [tmp]> SHOW PROCESSLIST; +----+-------------+-----------+------+---------+------+----------------------+------------------+----------+ | Id | User | Host | db | Command | Time | State | Info | Progress | +----+-------------+-----------+------+---------+------+----------------------+------------------+----------+ | 1 | system user | | NULL | Sleep | 620 | NULL | NULL | 0.000 | | 2 | system user | | NULL | Sleep | 620 | wsrep aborter idle | NULL | 0.000 | | 4 | root | localhost | tmp | Killed | 174 | checking permissions | DROP TABLE tmp | 0.000 | | 5 | root | localhost | tmp | Query | 0 | sleeping | SHOW PROCESSLIST | 0.000 | | 6 | root | localhost | tmp | Sleep | 249 | | NULL | 0.000 | +----+-------------+-----------+------+---------+------+----------------------+------------------+----------+ 5 rows in set (0.00 sec) {code} The table appears to be already dropped on other nodes, but still exists on the current node. Original node: {code} MariaDB [tmp]> show tables; +---------------+ | Tables_in_tmp | +---------------+ | tmp | +---------------+ 1 row in set (0.00 sec) {code} Other node: {code} MariaDB [tmp]> SHOW TABLES; Empty set (0.00 sec) {code} The node is still in the cluster: {code} MariaDB [tmp]> SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size'; +--------------------+-------+ | Variable_name | Value | +--------------------+-------+ | wsrep_cluster_size | 3 | +--------------------+-------+ 1 row in set (0.00 sec) {code} As soon as we UNLOCK TABLES on the original node, the table disappears. {code} MariaDB [tmp]> UNLOCK TABLES; Query OK, 0 rows affected (0.00 sec) MariaDB [tmp]> SHOW TABLES; Empty set (0.00 sec) {code} It looks like if wsrep_OSU_method set to TOI, DDL is transmitted to other nodes even when the provider is paused, and the DDL on the originating node can't run the DDL due to locks. |
I have 3 Galera nodes in a cluster. On all 3 nodes, let's execute: {code} SET GLOBAL wsrep_debug=ON; {code} First, let's do some setup on one of the nodes: {code} CREATE TABLE `tmp` ( `a` int(11) NOT NULL, `b` varchar(20) DEFAULT NULL, PRIMARY KEY (`a`) ); {code} Now let's open up two MySQL shells on the same node. Execute this on the first shell: {code} MariaDB [tmp]> FLUSH TABLES WITH READ LOCK; Query OK, 0 rows affected (0.14 sec) {code} The log on this node shows that the provider has been paused: {code} 150320 11:43:04 [Note] WSREP: Provider paused at 5edc131e-ae0b-11e4-85d1-c2c0512e9e7e:2896 (495) {code} In the second shell, let's try to drop the table: {code} MariaDB [tmp]> DROP TABLE tmp; {code} This hangs: {code} MariaDB [tmp]> SHOW PROCESSLIST; +----+-------------+-----------+------+---------+------+----------------------+------------------+----------+ | Id | User | Host | db | Command | Time | State | Info | Progress | +----+-------------+-----------+------+---------+------+----------------------+------------------+----------+ | 1 | system user | | NULL | Sleep | 1744 | NULL | NULL | 0.000 | | 2 | system user | | NULL | Sleep | 1744 | wsrep aborter idle | NULL | 0.000 | | 8 | root | localhost | tmp | Query | 0 | sleeping | SHOW PROCESSLIST | 0.000 | | 18 | root | localhost | tmp | Query | 14 | checking permissions | DROP TABLE tmp | 0.000 | +----+-------------+-----------+------+---------+------+----------------------+------------------+----------+ 4 rows in set (0.00 sec) {code} Let's attempt to kill the query once: {code} MariaDB [tmp]> DROP TABLE tmp; ^CCtrl-C -- query killed. Continuing normally. {code} The query is now stuck in the "killed" state: {code} MariaDB [tmp]> SHOW PROCESSLIST; +----+-------------+-----------+------+---------+------+----------------------+------------------+----------+ | Id | User | Host | db | Command | Time | State | Info | Progress | +----+-------------+-----------+------+---------+------+----------------------+------------------+----------+ | 1 | system user | | NULL | Sleep | 1815 | NULL | NULL | 0.000 | | 2 | system user | | NULL | Sleep | 1815 | wsrep aborter idle | NULL | 0.000 | | 8 | root | localhost | tmp | Query | 0 | sleeping | SHOW PROCESSLIST | 0.000 | | 18 | root | localhost | tmp | Killed | 85 | checking permissions | DROP TABLE tmp | 0.000 | +----+-------------+-----------+------+---------+------+----------------------+------------------+----------+ 4 rows in set (0.00 sec) {code} The table is already missing from all nodes: {code} MariaDB [tmp]> SHOW TABLES; Empty set (0.00 sec) {code} Logs on the other nodes suggest that the DROP TABLE statement may have been sent to other nodes twice, since we get an error saying the table doesn't exist: {code} 150320 13:51:39 [ERROR] Slave SQL: Error 'Unknown table 'tmp'' on query. Default database: 'tmp'. Query: 'DROP TABLE tmp', Error_code: 1051 150320 13:51:39 [Warning] WSREP: RBR event 1 Query apply warning: 1, 5068 150320 13:51:39 [Warning] WSREP: Ignoring error for TO isolated action: source: bafbdc15-cf25-11e4-9ace-2fc1846ad321 version: 3 local: 0 state: APPLYING flags: 65 conn_id: 18 trx_id: -1 seqnos (l: 5111, g: 5068, s: 5067, d: 5067, ts: 184510656454187) {code} If we try to cancel the query again, we lose connection to the server: {code} MariaDB [tmp]> DROP TABLE tmp; ^CCtrl-C -- query killed. Continuing normally. ^CCtrl-C -- query killed. Continuing normally. ERROR 2013 (HY000): Lost connection to MySQL server during query {code} However, the node is still up, and the query is stuck in the "killed" state: {code} MariaDB [tmp]> SHOW PROCESSLIST; +----+-------------+-----------+------+---------+------+----------------------+------------------+----------+ | Id | User | Host | db | Command | Time | State | Info | Progress | +----+-------------+-----------+------+---------+------+----------------------+------------------+----------+ | 1 | system user | | NULL | Sleep | 620 | NULL | NULL | 0.000 | | 2 | system user | | NULL | Sleep | 620 | wsrep aborter idle | NULL | 0.000 | | 4 | root | localhost | tmp | Killed | 174 | checking permissions | DROP TABLE tmp | 0.000 | | 5 | root | localhost | tmp | Query | 0 | sleeping | SHOW PROCESSLIST | 0.000 | | 6 | root | localhost | tmp | Sleep | 249 | | NULL | 0.000 | +----+-------------+-----------+------+---------+------+----------------------+------------------+----------+ 5 rows in set (0.00 sec) {code} The table appears to be already dropped on other nodes, but still exists on the current node. Original node: {code} MariaDB [tmp]> show tables; +---------------+ | Tables_in_tmp | +---------------+ | tmp | +---------------+ 1 row in set (0.00 sec) {code} Other node: {code} MariaDB [tmp]> SHOW TABLES; Empty set (0.00 sec) {code} The node is still in the cluster: {code} MariaDB [tmp]> SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size'; +--------------------+-------+ | Variable_name | Value | +--------------------+-------+ | wsrep_cluster_size | 3 | +--------------------+-------+ 1 row in set (0.00 sec) {code} As soon as we UNLOCK TABLES on the original node, the table disappears. {code} MariaDB [tmp]> UNLOCK TABLES; Query OK, 0 rows affected (0.00 sec) MariaDB [tmp]> SHOW TABLES; Empty set (0.00 sec) {code} It looks like if wsrep_OSU_method set to TOI, DDL is transmitted to other nodes even when the provider is paused, and the DDL on the originating node can't run the DDL due to locks. |
Description |
I have 3 Galera nodes in a cluster. On all 3 nodes, let's execute: {code} SET GLOBAL wsrep_debug=ON; {code} First, let's do some setup on one of the nodes: {code} CREATE TABLE `tmp` ( `a` int(11) NOT NULL, `b` varchar(20) DEFAULT NULL, PRIMARY KEY (`a`) ); {code} Now let's open up two MySQL shells on the same node. Execute this on the first shell: {code} MariaDB [tmp]> FLUSH TABLES WITH READ LOCK; Query OK, 0 rows affected (0.14 sec) {code} The log on this node shows that the provider has been paused: {code} 150320 11:43:04 [Note] WSREP: Provider paused at 5edc131e-ae0b-11e4-85d1-c2c0512e9e7e:2896 (495) {code} In the second shell, let's try to drop the table: {code} MariaDB [tmp]> DROP TABLE tmp; {code} This hangs: {code} MariaDB [tmp]> SHOW PROCESSLIST; +----+-------------+-----------+------+---------+------+----------------------+------------------+----------+ | Id | User | Host | db | Command | Time | State | Info | Progress | +----+-------------+-----------+------+---------+------+----------------------+------------------+----------+ | 1 | system user | | NULL | Sleep | 1744 | NULL | NULL | 0.000 | | 2 | system user | | NULL | Sleep | 1744 | wsrep aborter idle | NULL | 0.000 | | 8 | root | localhost | tmp | Query | 0 | sleeping | SHOW PROCESSLIST | 0.000 | | 18 | root | localhost | tmp | Query | 14 | checking permissions | DROP TABLE tmp | 0.000 | +----+-------------+-----------+------+---------+------+----------------------+------------------+----------+ 4 rows in set (0.00 sec) {code} Let's attempt to kill the query once: {code} MariaDB [tmp]> DROP TABLE tmp; ^CCtrl-C -- query killed. Continuing normally. {code} The query is now stuck in the "killed" state: {code} MariaDB [tmp]> SHOW PROCESSLIST; +----+-------------+-----------+------+---------+------+----------------------+------------------+----------+ | Id | User | Host | db | Command | Time | State | Info | Progress | +----+-------------+-----------+------+---------+------+----------------------+------------------+----------+ | 1 | system user | | NULL | Sleep | 1815 | NULL | NULL | 0.000 | | 2 | system user | | NULL | Sleep | 1815 | wsrep aborter idle | NULL | 0.000 | | 8 | root | localhost | tmp | Query | 0 | sleeping | SHOW PROCESSLIST | 0.000 | | 18 | root | localhost | tmp | Killed | 85 | checking permissions | DROP TABLE tmp | 0.000 | +----+-------------+-----------+------+---------+------+----------------------+------------------+----------+ 4 rows in set (0.00 sec) {code} The table is already missing from all nodes: {code} MariaDB [tmp]> SHOW TABLES; Empty set (0.00 sec) {code} Logs on the other nodes suggest that the DROP TABLE statement may have been sent to other nodes twice, since we get an error saying the table doesn't exist: {code} 150320 13:51:39 [ERROR] Slave SQL: Error 'Unknown table 'tmp'' on query. Default database: 'tmp'. Query: 'DROP TABLE tmp', Error_code: 1051 150320 13:51:39 [Warning] WSREP: RBR event 1 Query apply warning: 1, 5068 150320 13:51:39 [Warning] WSREP: Ignoring error for TO isolated action: source: bafbdc15-cf25-11e4-9ace-2fc1846ad321 version: 3 local: 0 state: APPLYING flags: 65 conn_id: 18 trx_id: -1 seqnos (l: 5111, g: 5068, s: 5067, d: 5067, ts: 184510656454187) {code} If we try to cancel the query again, we lose connection to the server: {code} MariaDB [tmp]> DROP TABLE tmp; ^CCtrl-C -- query killed. Continuing normally. ^CCtrl-C -- query killed. Continuing normally. ERROR 2013 (HY000): Lost connection to MySQL server during query {code} However, the node is still up, and the query is stuck in the "killed" state: {code} MariaDB [tmp]> SHOW PROCESSLIST; +----+-------------+-----------+------+---------+------+----------------------+------------------+----------+ | Id | User | Host | db | Command | Time | State | Info | Progress | +----+-------------+-----------+------+---------+------+----------------------+------------------+----------+ | 1 | system user | | NULL | Sleep | 620 | NULL | NULL | 0.000 | | 2 | system user | | NULL | Sleep | 620 | wsrep aborter idle | NULL | 0.000 | | 4 | root | localhost | tmp | Killed | 174 | checking permissions | DROP TABLE tmp | 0.000 | | 5 | root | localhost | tmp | Query | 0 | sleeping | SHOW PROCESSLIST | 0.000 | | 6 | root | localhost | tmp | Sleep | 249 | | NULL | 0.000 | +----+-------------+-----------+------+---------+------+----------------------+------------------+----------+ 5 rows in set (0.00 sec) {code} The table appears to be already dropped on other nodes, but still exists on the current node. Original node: {code} MariaDB [tmp]> show tables; +---------------+ | Tables_in_tmp | +---------------+ | tmp | +---------------+ 1 row in set (0.00 sec) {code} Other node: {code} MariaDB [tmp]> SHOW TABLES; Empty set (0.00 sec) {code} The node is still in the cluster: {code} MariaDB [tmp]> SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size'; +--------------------+-------+ | Variable_name | Value | +--------------------+-------+ | wsrep_cluster_size | 3 | +--------------------+-------+ 1 row in set (0.00 sec) {code} As soon as we UNLOCK TABLES on the original node, the table disappears. {code} MariaDB [tmp]> UNLOCK TABLES; Query OK, 0 rows affected (0.00 sec) MariaDB [tmp]> SHOW TABLES; Empty set (0.00 sec) {code} It looks like if wsrep_OSU_method set to TOI, DDL is transmitted to other nodes even when the provider is paused, and the DDL on the originating node can't run the DDL due to locks. |
I have 3 Galera nodes in a cluster. On all 3 nodes, let's execute: {code} SET GLOBAL wsrep_debug=ON; {code} First, let's do some setup on one of the nodes: {code} CREATE TABLE `tmp` ( `a` int(11) NOT NULL, `b` varchar(20) DEFAULT NULL, PRIMARY KEY (`a`) ); {code} Now let's open up two MySQL shells on the same node. Execute this on the first shell: {code} MariaDB [tmp]> FLUSH TABLES WITH READ LOCK; Query OK, 0 rows affected (0.14 sec) {code} The log on this node shows that the provider has been paused: {code} 150320 11:43:04 [Note] WSREP: Provider paused at 5edc131e-ae0b-11e4-85d1-c2c0512e9e7e:2896 (495) {code} In the second shell, let's try to drop the table: {code} MariaDB [tmp]> DROP TABLE tmp; {code} This hangs: {code} MariaDB [tmp]> SHOW PROCESSLIST; +----+-------------+-----------+------+---------+------+----------------------+------------------+----------+ | Id | User | Host | db | Command | Time | State | Info | Progress | +----+-------------+-----------+------+---------+------+----------------------+------------------+----------+ | 1 | system user | | NULL | Sleep | 1744 | NULL | NULL | 0.000 | | 2 | system user | | NULL | Sleep | 1744 | wsrep aborter idle | NULL | 0.000 | | 8 | root | localhost | tmp | Query | 0 | sleeping | SHOW PROCESSLIST | 0.000 | | 18 | root | localhost | tmp | Query | 14 | checking permissions | DROP TABLE tmp | 0.000 | +----+-------------+-----------+------+---------+------+----------------------+------------------+----------+ 4 rows in set (0.00 sec) {code} Let's attempt to kill the query once: {code} MariaDB [tmp]> DROP TABLE tmp; ^CCtrl-C -- query killed. Continuing normally. {code} The query is now stuck in the "killed" state: {code} MariaDB [tmp]> SHOW PROCESSLIST; +----+-------------+-----------+------+---------+------+----------------------+------------------+----------+ | Id | User | Host | db | Command | Time | State | Info | Progress | +----+-------------+-----------+------+---------+------+----------------------+------------------+----------+ | 1 | system user | | NULL | Sleep | 1815 | NULL | NULL | 0.000 | | 2 | system user | | NULL | Sleep | 1815 | wsrep aborter idle | NULL | 0.000 | | 8 | root | localhost | tmp | Query | 0 | sleeping | SHOW PROCESSLIST | 0.000 | | 18 | root | localhost | tmp | Killed | 85 | checking permissions | DROP TABLE tmp | 0.000 | +----+-------------+-----------+------+---------+------+----------------------+------------------+----------+ 4 rows in set (0.00 sec) {code} The table is already missing from all nodes: {code} MariaDB [tmp]> SHOW TABLES; Empty set (0.00 sec) {code} Logs on the other nodes suggest that the DROP TABLE statement may have been sent to other nodes twice, since we get an error saying the table doesn't exist: {code} 150320 13:51:39 [ERROR] Slave SQL: Error 'Unknown table 'tmp'' on query. Default database: 'tmp'. Query: 'DROP TABLE tmp', Error_code: 1051 150320 13:51:39 [Warning] WSREP: RBR event 1 Query apply warning: 1, 5068 150320 13:51:39 [Warning] WSREP: Ignoring error for TO isolated action: source: bafbdc15-cf25-11e4-9ace-2fc1846ad321 version: 3 local: 0 state: APPLYING flags: 65 conn_id: 18 trx_id: -1 seqnos (l: 5111, g: 5068, s: 5067, d: 5067, ts: 184510656454187) {code} If we try to cancel the query again, we lose connection to the server: {code} MariaDB [tmp]> DROP TABLE tmp; ^CCtrl-C -- query killed. Continuing normally. ^CCtrl-C -- query killed. Continuing normally. ERROR 2013 (HY000): Lost connection to MySQL server during query {code} However, the node is still up, and the query is stuck in the "killed" state: {code} MariaDB [tmp]> SHOW PROCESSLIST; +----+-------------+-----------+------+---------+------+----------------------+------------------+----------+ | Id | User | Host | db | Command | Time | State | Info | Progress | +----+-------------+-----------+------+---------+------+----------------------+------------------+----------+ | 1 | system user | | NULL | Sleep | 620 | NULL | NULL | 0.000 | | 2 | system user | | NULL | Sleep | 620 | wsrep aborter idle | NULL | 0.000 | | 4 | root | localhost | tmp | Killed | 174 | checking permissions | DROP TABLE tmp | 0.000 | | 5 | root | localhost | tmp | Query | 0 | sleeping | SHOW PROCESSLIST | 0.000 | | 6 | root | localhost | tmp | Sleep | 249 | | NULL | 0.000 | +----+-------------+-----------+------+---------+------+----------------------+------------------+----------+ 5 rows in set (0.00 sec) {code} The node is still in the cluster: {code} MariaDB [tmp]> SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size'; +--------------------+-------+ | Variable_name | Value | +--------------------+-------+ | wsrep_cluster_size | 3 | +--------------------+-------+ 1 row in set (0.00 sec) {code} As soon as we UNLOCK TABLES on the original node, the query is killed. {code} MariaDB [tmp]> UNLOCK TABLES; Query OK, 0 rows affected (0.00 sec) MariaDB [tmp]> SHOW PROCESSLIST; +----+-------------+-----------+------+---------+------+--------------------+------------------+----------+ | Id | User | Host | db | Command | Time | State | Info | Progress | +----+-------------+-----------+------+---------+------+--------------------+------------------+----------+ | 1 | system user | | NULL | Sleep | 2305 | NULL | NULL | 0.000 | | 2 | system user | | NULL | Sleep | 2305 | wsrep aborter idle | NULL | 0.000 | | 8 | root | localhost | tmp | Query | 0 | sleeping | SHOW PROCESSLIST | 0.000 | | 18 | root | localhost | tmp | Sleep | 1 | | NULL | 0.000 | +----+-------------+-----------+------+---------+------+--------------------+------------------+----------+ 4 rows in set (0.00 sec) {code} It looks like if wsrep_OSU_method set to TOI, DDL is transmitted to other nodes even when the provider is paused, and the DDL on the originating node shouldn't run the DDL due to locks. The DDL runs on the originating node anyway, unless there's some DML running on that node to block it. |
Description |
I have 3 Galera nodes in a cluster. On all 3 nodes, let's execute: {code} SET GLOBAL wsrep_debug=ON; {code} First, let's do some setup on one of the nodes: {code} CREATE TABLE `tmp` ( `a` int(11) NOT NULL, `b` varchar(20) DEFAULT NULL, PRIMARY KEY (`a`) ); {code} Now let's open up two MySQL shells on the same node. Execute this on the first shell: {code} MariaDB [tmp]> FLUSH TABLES WITH READ LOCK; Query OK, 0 rows affected (0.14 sec) {code} The log on this node shows that the provider has been paused: {code} 150320 11:43:04 [Note] WSREP: Provider paused at 5edc131e-ae0b-11e4-85d1-c2c0512e9e7e:2896 (495) {code} In the second shell, let's try to drop the table: {code} MariaDB [tmp]> DROP TABLE tmp; {code} This hangs: {code} MariaDB [tmp]> SHOW PROCESSLIST; +----+-------------+-----------+------+---------+------+----------------------+------------------+----------+ | Id | User | Host | db | Command | Time | State | Info | Progress | +----+-------------+-----------+------+---------+------+----------------------+------------------+----------+ | 1 | system user | | NULL | Sleep | 1744 | NULL | NULL | 0.000 | | 2 | system user | | NULL | Sleep | 1744 | wsrep aborter idle | NULL | 0.000 | | 8 | root | localhost | tmp | Query | 0 | sleeping | SHOW PROCESSLIST | 0.000 | | 18 | root | localhost | tmp | Query | 14 | checking permissions | DROP TABLE tmp | 0.000 | +----+-------------+-----------+------+---------+------+----------------------+------------------+----------+ 4 rows in set (0.00 sec) {code} Let's attempt to kill the query once: {code} MariaDB [tmp]> DROP TABLE tmp; ^CCtrl-C -- query killed. Continuing normally. {code} The query is now stuck in the "killed" state: {code} MariaDB [tmp]> SHOW PROCESSLIST; +----+-------------+-----------+------+---------+------+----------------------+------------------+----------+ | Id | User | Host | db | Command | Time | State | Info | Progress | +----+-------------+-----------+------+---------+------+----------------------+------------------+----------+ | 1 | system user | | NULL | Sleep | 1815 | NULL | NULL | 0.000 | | 2 | system user | | NULL | Sleep | 1815 | wsrep aborter idle | NULL | 0.000 | | 8 | root | localhost | tmp | Query | 0 | sleeping | SHOW PROCESSLIST | 0.000 | | 18 | root | localhost | tmp | Killed | 85 | checking permissions | DROP TABLE tmp | 0.000 | +----+-------------+-----------+------+---------+------+----------------------+------------------+----------+ 4 rows in set (0.00 sec) {code} The table is already missing from all nodes: {code} MariaDB [tmp]> SHOW TABLES; Empty set (0.00 sec) {code} Logs on the other nodes suggest that the DROP TABLE statement may have been sent to other nodes twice, since we get an error saying the table doesn't exist: {code} 150320 13:51:39 [ERROR] Slave SQL: Error 'Unknown table 'tmp'' on query. Default database: 'tmp'. Query: 'DROP TABLE tmp', Error_code: 1051 150320 13:51:39 [Warning] WSREP: RBR event 1 Query apply warning: 1, 5068 150320 13:51:39 [Warning] WSREP: Ignoring error for TO isolated action: source: bafbdc15-cf25-11e4-9ace-2fc1846ad321 version: 3 local: 0 state: APPLYING flags: 65 conn_id: 18 trx_id: -1 seqnos (l: 5111, g: 5068, s: 5067, d: 5067, ts: 184510656454187) {code} If we try to cancel the query again, we lose connection to the server: {code} MariaDB [tmp]> DROP TABLE tmp; ^CCtrl-C -- query killed. Continuing normally. ^CCtrl-C -- query killed. Continuing normally. ERROR 2013 (HY000): Lost connection to MySQL server during query {code} However, the node is still up, and the query is stuck in the "killed" state: {code} MariaDB [tmp]> SHOW PROCESSLIST; +----+-------------+-----------+------+---------+------+----------------------+------------------+----------+ | Id | User | Host | db | Command | Time | State | Info | Progress | +----+-------------+-----------+------+---------+------+----------------------+------------------+----------+ | 1 | system user | | NULL | Sleep | 620 | NULL | NULL | 0.000 | | 2 | system user | | NULL | Sleep | 620 | wsrep aborter idle | NULL | 0.000 | | 4 | root | localhost | tmp | Killed | 174 | checking permissions | DROP TABLE tmp | 0.000 | | 5 | root | localhost | tmp | Query | 0 | sleeping | SHOW PROCESSLIST | 0.000 | | 6 | root | localhost | tmp | Sleep | 249 | | NULL | 0.000 | +----+-------------+-----------+------+---------+------+----------------------+------------------+----------+ 5 rows in set (0.00 sec) {code} The node is still in the cluster: {code} MariaDB [tmp]> SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size'; +--------------------+-------+ | Variable_name | Value | +--------------------+-------+ | wsrep_cluster_size | 3 | +--------------------+-------+ 1 row in set (0.00 sec) {code} As soon as we UNLOCK TABLES on the original node, the query is killed. {code} MariaDB [tmp]> UNLOCK TABLES; Query OK, 0 rows affected (0.00 sec) MariaDB [tmp]> SHOW PROCESSLIST; +----+-------------+-----------+------+---------+------+--------------------+------------------+----------+ | Id | User | Host | db | Command | Time | State | Info | Progress | +----+-------------+-----------+------+---------+------+--------------------+------------------+----------+ | 1 | system user | | NULL | Sleep | 2305 | NULL | NULL | 0.000 | | 2 | system user | | NULL | Sleep | 2305 | wsrep aborter idle | NULL | 0.000 | | 8 | root | localhost | tmp | Query | 0 | sleeping | SHOW PROCESSLIST | 0.000 | | 18 | root | localhost | tmp | Sleep | 1 | | NULL | 0.000 | +----+-------------+-----------+------+---------+------+--------------------+------------------+----------+ 4 rows in set (0.00 sec) {code} It looks like if wsrep_OSU_method set to TOI, DDL is transmitted to other nodes even when the provider is paused, and the DDL on the originating node shouldn't run the DDL due to locks. The DDL runs on the originating node anyway, unless there's some DML running on that node to block it. |
I have 3 Galera nodes in a cluster. On all 3 nodes, let's execute: {code} SET GLOBAL wsrep_debug=ON; {code} First, let's do some setup on one of the nodes: {code} CREATE TABLE `tmp` ( `a` int(11) NOT NULL, `b` varchar(20) DEFAULT NULL, PRIMARY KEY (`a`) ); {code} Now let's open up two MySQL shells on the same node. Execute this on the first shell: {code} MariaDB [tmp]> FLUSH TABLES WITH READ LOCK; Query OK, 0 rows affected (0.14 sec) {code} The log on this node shows that the provider has been paused: {code} 150320 11:43:04 [Note] WSREP: Provider paused at 5edc131e-ae0b-11e4-85d1-c2c0512e9e7e:2896 (495) {code} In the second shell, let's try to drop the table: {code} MariaDB [tmp]> DROP TABLE tmp; {code} This hangs: {code} MariaDB [tmp]> SHOW PROCESSLIST; +----+-------------+-----------+------+---------+------+----------------------+------------------+----------+ | Id | User | Host | db | Command | Time | State | Info | Progress | +----+-------------+-----------+------+---------+------+----------------------+------------------+----------+ | 1 | system user | | NULL | Sleep | 1744 | NULL | NULL | 0.000 | | 2 | system user | | NULL | Sleep | 1744 | wsrep aborter idle | NULL | 0.000 | | 8 | root | localhost | tmp | Query | 0 | sleeping | SHOW PROCESSLIST | 0.000 | | 18 | root | localhost | tmp | Query | 14 | checking permissions | DROP TABLE tmp | 0.000 | +----+-------------+-----------+------+---------+------+----------------------+------------------+----------+ 4 rows in set (0.00 sec) {code} Let's attempt to kill the query once: {code} MariaDB [tmp]> DROP TABLE tmp; ^CCtrl-C -- query killed. Continuing normally. {code} The query is now stuck in the "killed" state: {code} MariaDB [tmp]> SHOW PROCESSLIST; +----+-------------+-----------+------+---------+------+----------------------+------------------+----------+ | Id | User | Host | db | Command | Time | State | Info | Progress | +----+-------------+-----------+------+---------+------+----------------------+------------------+----------+ | 1 | system user | | NULL | Sleep | 1815 | NULL | NULL | 0.000 | | 2 | system user | | NULL | Sleep | 1815 | wsrep aborter idle | NULL | 0.000 | | 8 | root | localhost | tmp | Query | 0 | sleeping | SHOW PROCESSLIST | 0.000 | | 18 | root | localhost | tmp | Killed | 85 | checking permissions | DROP TABLE tmp | 0.000 | +----+-------------+-----------+------+---------+------+----------------------+------------------+----------+ 4 rows in set (0.00 sec) {code} The table is already missing from all nodes: {code} MariaDB [tmp]> SHOW TABLES; Empty set (0.00 sec) {code} Logs on the other nodes suggest that the DROP TABLE statement may have been sent to other nodes twice, since we get an error saying the table doesn't exist: {code} 150320 13:51:39 [ERROR] Slave SQL: Error 'Unknown table 'tmp'' on query. Default database: 'tmp'. Query: 'DROP TABLE tmp', Error_code: 1051 150320 13:51:39 [Warning] WSREP: RBR event 1 Query apply warning: 1, 5068 150320 13:51:39 [Warning] WSREP: Ignoring error for TO isolated action: source: bafbdc15-cf25-11e4-9ace-2fc1846ad321 version: 3 local: 0 state: APPLYING flags: 65 conn_id: 18 trx_id: -1 seqnos (l: 5111, g: 5068, s: 5067, d: 5067, ts: 184510656454187) {code} If we try to cancel the query again, we lose connection to the server: {code} MariaDB [tmp]> DROP TABLE tmp; ^CCtrl-C -- query killed. Continuing normally. ^CCtrl-C -- query killed. Continuing normally. ERROR 2013 (HY000): Lost connection to MySQL server during query {code} However, the node is still up, and the query is stuck in the "killed" state: {code} MariaDB [tmp]> SHOW PROCESSLIST; +----+-------------+-----------+------+---------+------+----------------------+------------------+----------+ | Id | User | Host | db | Command | Time | State | Info | Progress | +----+-------------+-----------+------+---------+------+----------------------+------------------+----------+ | 1 | system user | | NULL | Sleep | 620 | NULL | NULL | 0.000 | | 2 | system user | | NULL | Sleep | 620 | wsrep aborter idle | NULL | 0.000 | | 4 | root | localhost | tmp | Killed | 174 | checking permissions | DROP TABLE tmp | 0.000 | | 5 | root | localhost | tmp | Query | 0 | sleeping | SHOW PROCESSLIST | 0.000 | | 6 | root | localhost | tmp | Sleep | 249 | | NULL | 0.000 | +----+-------------+-----------+------+---------+------+----------------------+------------------+----------+ 5 rows in set (0.00 sec) {code} The node is still in the cluster: {code} MariaDB [tmp]> SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size'; +--------------------+-------+ | Variable_name | Value | +--------------------+-------+ | wsrep_cluster_size | 3 | +--------------------+-------+ 1 row in set (0.00 sec) {code} As soon as we UNLOCK TABLES on the original node, the query is killed. {code} MariaDB [tmp]> UNLOCK TABLES; Query OK, 0 rows affected (0.00 sec) MariaDB [tmp]> SHOW PROCESSLIST; +----+-------------+-----------+------+---------+------+--------------------+------------------+----------+ | Id | User | Host | db | Command | Time | State | Info | Progress | +----+-------------+-----------+------+---------+------+--------------------+------------------+----------+ | 1 | system user | | NULL | Sleep | 2305 | NULL | NULL | 0.000 | | 2 | system user | | NULL | Sleep | 2305 | wsrep aborter idle | NULL | 0.000 | | 8 | root | localhost | tmp | Query | 0 | sleeping | SHOW PROCESSLIST | 0.000 | | 18 | root | localhost | tmp | Sleep | 1 | | NULL | 0.000 | +----+-------------+-----------+------+---------+------+--------------------+------------------+----------+ 4 rows in set (0.00 sec) {code} It looks like if wsrep_OSU_method set to TOI, DDL is transmitted to other nodes even when the provider is paused, and the DDL on the originating node shouldn't run the DDL due to locks. The DDL runs on the originating node anyway, unless there's some DML running on that node to block it. Let's add some DML to the mix. Some setup: {code} CREATE TABLE `tmp` ( `a` int(11) NOT NULL, `b` varchar(20) DEFAULT NULL, PRIMARY KEY (`a`) ); DELIMITER $$ CREATE PROCEDURE insert_test_data() BEGIN DECLARE i INT DEFAULT 1; WHILE i < 1000000 DO INSERT INTO `tmp` (`a`, `b`) VALUES (i, i); SET i = i + 1; END WHILE; END$$ DELIMITER ; {code} Now let's have three shells on the originating node: On the first shell, execute: {code} CALL insert_test_data(); {code} On the second shell, execute: {code} MariaDB [tmp]> FLUSH TABLES WITH READ LOCK; Query OK, 0 rows affected (0.00 sec) {code} On the third shell, try to drop the table: {code} MariaDB [tmp]> DROP TABLE tmp; {code} Process list looks as expected: {code} MariaDB [tmp]> SHOW PROCESSLIST; +----+-------------+-----------+------+---------+------+------------------------------+----------------------------------------------------------------------------------------+----------+ | Id | User | Host | db | Command | Time | State | Info | Progress | +----+-------------+-----------+------+---------+------+------------------------------+----------------------------------------------------------------------------------------+----------+ | 1 | system user | | NULL | Sleep | 2648 | NULL | NULL | 0.000 | | 2 | system user | | NULL | Sleep | 2648 | wsrep aborter idle | NULL | 0.000 | | 8 | root | localhost | tmp | Query | 0 | sleeping | SHOW PROCESSLIST | 0.000 | | 18 | root | localhost | tmp | Query | 59 | Waiting for global read lock | INSERT INTO `tmp` (`a`, `b`) VALUES ( NAME_CONST('i',6601), NAME_CONST('i',6601)) | 0.000 | | 20 | root | localhost | tmp | Query | 20 | checking permissions | DROP TABLE tmp | 0.000 | +----+-------------+-----------+------+---------+------+------------------------------+----------------------------------------------------------------------------------------+----------+ 5 rows in set (0.00 sec) {code} On other nodes, the table is gone: {code} MariaDB [tmp]> SHOW TABLES; Empty set (0.00 sec) {code} On the originating node, the table still exists: {code} MariaDB [tmp]> SHOW TABLES; +---------------+ | Tables_in_tmp | +---------------+ | tmp | +---------------+ 1 row in set (0.00 sec) {code} Let's cancel the DML: {code} MariaDB [tmp]> CALL insert_test_data(); ^CCtrl-C -- query killed. Continuing normally. ERROR 1317 (70100): Query execution was interrupted {code} The table still exists on the originating node and the query is dead: {code} MariaDB [tmp]> SHOW TABLES; +---------------+ | Tables_in_tmp | +---------------+ | tmp | +---------------+ 1 row in set (0.00 sec) MariaDB [tmp]> SHOW PROCESSLIST; +----+-------------+-----------+------+---------+------+----------------------+------------------+----------+ | Id | User | Host | db | Command | Time | State | Info | Progress | +----+-------------+-----------+------+---------+------+----------------------+------------------+----------+ | 1 | system user | | NULL | Sleep | 2768 | NULL | NULL | 0.000 | | 2 | system user | | NULL | Sleep | 2768 | wsrep aborter idle | NULL | 0.000 | | 8 | root | localhost | tmp | Query | 0 | sleeping | SHOW PROCESSLIST | 0.000 | | 18 | root | localhost | tmp | Sleep | 13 | | NULL | 0.000 | | 20 | root | localhost | tmp | Query | 140 | checking permissions | DROP TABLE tmp | 0.000 | +----+-------------+-----------+------+---------+------+----------------------+------------------+----------+ 5 rows in set (0.00 sec) {code} It goes away when we UNLOCK TABLES: {code} MariaDB [tmp]> UNLOCK TABLES; Query OK, 0 rows affected (0.00 sec) MariaDB [tmp]> SHOW TABLES; Empty set (0.00 sec) {code} |
Summary | FLUSH TABLES WITH READ LOCK doesn't release global lock | FLUSH TABLES WITH READ LOCK has strange locking behavior |
Description |
I have 3 Galera nodes in a cluster. On all 3 nodes, let's execute: {code} SET GLOBAL wsrep_debug=ON; {code} First, let's do some setup on one of the nodes: {code} CREATE TABLE `tmp` ( `a` int(11) NOT NULL, `b` varchar(20) DEFAULT NULL, PRIMARY KEY (`a`) ); {code} Now let's open up two MySQL shells on the same node. Execute this on the first shell: {code} MariaDB [tmp]> FLUSH TABLES WITH READ LOCK; Query OK, 0 rows affected (0.14 sec) {code} The log on this node shows that the provider has been paused: {code} 150320 11:43:04 [Note] WSREP: Provider paused at 5edc131e-ae0b-11e4-85d1-c2c0512e9e7e:2896 (495) {code} In the second shell, let's try to drop the table: {code} MariaDB [tmp]> DROP TABLE tmp; {code} This hangs: {code} MariaDB [tmp]> SHOW PROCESSLIST; +----+-------------+-----------+------+---------+------+----------------------+------------------+----------+ | Id | User | Host | db | Command | Time | State | Info | Progress | +----+-------------+-----------+------+---------+------+----------------------+------------------+----------+ | 1 | system user | | NULL | Sleep | 1744 | NULL | NULL | 0.000 | | 2 | system user | | NULL | Sleep | 1744 | wsrep aborter idle | NULL | 0.000 | | 8 | root | localhost | tmp | Query | 0 | sleeping | SHOW PROCESSLIST | 0.000 | | 18 | root | localhost | tmp | Query | 14 | checking permissions | DROP TABLE tmp | 0.000 | +----+-------------+-----------+------+---------+------+----------------------+------------------+----------+ 4 rows in set (0.00 sec) {code} Let's attempt to kill the query once: {code} MariaDB [tmp]> DROP TABLE tmp; ^CCtrl-C -- query killed. Continuing normally. {code} The query is now stuck in the "killed" state: {code} MariaDB [tmp]> SHOW PROCESSLIST; +----+-------------+-----------+------+---------+------+----------------------+------------------+----------+ | Id | User | Host | db | Command | Time | State | Info | Progress | +----+-------------+-----------+------+---------+------+----------------------+------------------+----------+ | 1 | system user | | NULL | Sleep | 1815 | NULL | NULL | 0.000 | | 2 | system user | | NULL | Sleep | 1815 | wsrep aborter idle | NULL | 0.000 | | 8 | root | localhost | tmp | Query | 0 | sleeping | SHOW PROCESSLIST | 0.000 | | 18 | root | localhost | tmp | Killed | 85 | checking permissions | DROP TABLE tmp | 0.000 | +----+-------------+-----------+------+---------+------+----------------------+------------------+----------+ 4 rows in set (0.00 sec) {code} The table is already missing from all nodes: {code} MariaDB [tmp]> SHOW TABLES; Empty set (0.00 sec) {code} Logs on the other nodes suggest that the DROP TABLE statement may have been sent to other nodes twice, since we get an error saying the table doesn't exist: {code} 150320 13:51:39 [ERROR] Slave SQL: Error 'Unknown table 'tmp'' on query. Default database: 'tmp'. Query: 'DROP TABLE tmp', Error_code: 1051 150320 13:51:39 [Warning] WSREP: RBR event 1 Query apply warning: 1, 5068 150320 13:51:39 [Warning] WSREP: Ignoring error for TO isolated action: source: bafbdc15-cf25-11e4-9ace-2fc1846ad321 version: 3 local: 0 state: APPLYING flags: 65 conn_id: 18 trx_id: -1 seqnos (l: 5111, g: 5068, s: 5067, d: 5067, ts: 184510656454187) {code} If we try to cancel the query again, we lose connection to the server: {code} MariaDB [tmp]> DROP TABLE tmp; ^CCtrl-C -- query killed. Continuing normally. ^CCtrl-C -- query killed. Continuing normally. ERROR 2013 (HY000): Lost connection to MySQL server during query {code} However, the node is still up, and the query is stuck in the "killed" state: {code} MariaDB [tmp]> SHOW PROCESSLIST; +----+-------------+-----------+------+---------+------+----------------------+------------------+----------+ | Id | User | Host | db | Command | Time | State | Info | Progress | +----+-------------+-----------+------+---------+------+----------------------+------------------+----------+ | 1 | system user | | NULL | Sleep | 620 | NULL | NULL | 0.000 | | 2 | system user | | NULL | Sleep | 620 | wsrep aborter idle | NULL | 0.000 | | 4 | root | localhost | tmp | Killed | 174 | checking permissions | DROP TABLE tmp | 0.000 | | 5 | root | localhost | tmp | Query | 0 | sleeping | SHOW PROCESSLIST | 0.000 | | 6 | root | localhost | tmp | Sleep | 249 | | NULL | 0.000 | +----+-------------+-----------+------+---------+------+----------------------+------------------+----------+ 5 rows in set (0.00 sec) {code} The node is still in the cluster: {code} MariaDB [tmp]> SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size'; +--------------------+-------+ | Variable_name | Value | +--------------------+-------+ | wsrep_cluster_size | 3 | +--------------------+-------+ 1 row in set (0.00 sec) {code} As soon as we UNLOCK TABLES on the original node, the query is killed. {code} MariaDB [tmp]> UNLOCK TABLES; Query OK, 0 rows affected (0.00 sec) MariaDB [tmp]> SHOW PROCESSLIST; +----+-------------+-----------+------+---------+------+--------------------+------------------+----------+ | Id | User | Host | db | Command | Time | State | Info | Progress | +----+-------------+-----------+------+---------+------+--------------------+------------------+----------+ | 1 | system user | | NULL | Sleep | 2305 | NULL | NULL | 0.000 | | 2 | system user | | NULL | Sleep | 2305 | wsrep aborter idle | NULL | 0.000 | | 8 | root | localhost | tmp | Query | 0 | sleeping | SHOW PROCESSLIST | 0.000 | | 18 | root | localhost | tmp | Sleep | 1 | | NULL | 0.000 | +----+-------------+-----------+------+---------+------+--------------------+------------------+----------+ 4 rows in set (0.00 sec) {code} It looks like if wsrep_OSU_method set to TOI, DDL is transmitted to other nodes even when the provider is paused, and the DDL on the originating node shouldn't run the DDL due to locks. The DDL runs on the originating node anyway, unless there's some DML running on that node to block it. Let's add some DML to the mix. Some setup: {code} CREATE TABLE `tmp` ( `a` int(11) NOT NULL, `b` varchar(20) DEFAULT NULL, PRIMARY KEY (`a`) ); DELIMITER $$ CREATE PROCEDURE insert_test_data() BEGIN DECLARE i INT DEFAULT 1; WHILE i < 1000000 DO INSERT INTO `tmp` (`a`, `b`) VALUES (i, i); SET i = i + 1; END WHILE; END$$ DELIMITER ; {code} Now let's have three shells on the originating node: On the first shell, execute: {code} CALL insert_test_data(); {code} On the second shell, execute: {code} MariaDB [tmp]> FLUSH TABLES WITH READ LOCK; Query OK, 0 rows affected (0.00 sec) {code} On the third shell, try to drop the table: {code} MariaDB [tmp]> DROP TABLE tmp; {code} Process list looks as expected: {code} MariaDB [tmp]> SHOW PROCESSLIST; +----+-------------+-----------+------+---------+------+------------------------------+----------------------------------------------------------------------------------------+----------+ | Id | User | Host | db | Command | Time | State | Info | Progress | +----+-------------+-----------+------+---------+------+------------------------------+----------------------------------------------------------------------------------------+----------+ | 1 | system user | | NULL | Sleep | 2648 | NULL | NULL | 0.000 | | 2 | system user | | NULL | Sleep | 2648 | wsrep aborter idle | NULL | 0.000 | | 8 | root | localhost | tmp | Query | 0 | sleeping | SHOW PROCESSLIST | 0.000 | | 18 | root | localhost | tmp | Query | 59 | Waiting for global read lock | INSERT INTO `tmp` (`a`, `b`) VALUES ( NAME_CONST('i',6601), NAME_CONST('i',6601)) | 0.000 | | 20 | root | localhost | tmp | Query | 20 | checking permissions | DROP TABLE tmp | 0.000 | +----+-------------+-----------+------+---------+------+------------------------------+----------------------------------------------------------------------------------------+----------+ 5 rows in set (0.00 sec) {code} On other nodes, the table is gone: {code} MariaDB [tmp]> SHOW TABLES; Empty set (0.00 sec) {code} On the originating node, the table still exists: {code} MariaDB [tmp]> SHOW TABLES; +---------------+ | Tables_in_tmp | +---------------+ | tmp | +---------------+ 1 row in set (0.00 sec) {code} Let's cancel the DML: {code} MariaDB [tmp]> CALL insert_test_data(); ^CCtrl-C -- query killed. Continuing normally. ERROR 1317 (70100): Query execution was interrupted {code} The table still exists on the originating node and the query is dead: {code} MariaDB [tmp]> SHOW TABLES; +---------------+ | Tables_in_tmp | +---------------+ | tmp | +---------------+ 1 row in set (0.00 sec) MariaDB [tmp]> SHOW PROCESSLIST; +----+-------------+-----------+------+---------+------+----------------------+------------------+----------+ | Id | User | Host | db | Command | Time | State | Info | Progress | +----+-------------+-----------+------+---------+------+----------------------+------------------+----------+ | 1 | system user | | NULL | Sleep | 2768 | NULL | NULL | 0.000 | | 2 | system user | | NULL | Sleep | 2768 | wsrep aborter idle | NULL | 0.000 | | 8 | root | localhost | tmp | Query | 0 | sleeping | SHOW PROCESSLIST | 0.000 | | 18 | root | localhost | tmp | Sleep | 13 | | NULL | 0.000 | | 20 | root | localhost | tmp | Query | 140 | checking permissions | DROP TABLE tmp | 0.000 | +----+-------------+-----------+------+---------+------+----------------------+------------------+----------+ 5 rows in set (0.00 sec) {code} It goes away when we UNLOCK TABLES: {code} MariaDB [tmp]> UNLOCK TABLES; Query OK, 0 rows affected (0.00 sec) MariaDB [tmp]> SHOW TABLES; Empty set (0.00 sec) {code} |
I have 3 Galera nodes in a cluster. On all 3 nodes, let's execute: {code} SET GLOBAL wsrep_debug=ON; {code} First, let's do some setup on one of the nodes: {code} CREATE TABLE `tmp` ( `a` int(11) NOT NULL, `b` varchar(20) DEFAULT NULL, PRIMARY KEY (`a`) ); {code} Now let's open up two MySQL shells on the same node. Execute this on the first shell: {code} MariaDB [tmp]> FLUSH TABLES WITH READ LOCK; Query OK, 0 rows affected (0.14 sec) {code} The log on this node shows that the provider has been paused: {code} 150320 11:43:04 [Note] WSREP: Provider paused at 5edc131e-ae0b-11e4-85d1-c2c0512e9e7e:2896 (495) {code} In the second shell, let's try to drop the table: {code} MariaDB [tmp]> DROP TABLE tmp; {code} This hangs: {code} MariaDB [tmp]> SHOW PROCESSLIST; +----+-------------+-----------+------+---------+------+----------------------+------------------+----------+ | Id | User | Host | db | Command | Time | State | Info | Progress | +----+-------------+-----------+------+---------+------+----------------------+------------------+----------+ | 1 | system user | | NULL | Sleep | 1744 | NULL | NULL | 0.000 | | 2 | system user | | NULL | Sleep | 1744 | wsrep aborter idle | NULL | 0.000 | | 8 | root | localhost | tmp | Query | 0 | sleeping | SHOW PROCESSLIST | 0.000 | | 18 | root | localhost | tmp | Query | 14 | checking permissions | DROP TABLE tmp | 0.000 | +----+-------------+-----------+------+---------+------+----------------------+------------------+----------+ 4 rows in set (0.00 sec) {code} Let's attempt to kill the query once: {code} MariaDB [tmp]> DROP TABLE tmp; ^CCtrl-C -- query killed. Continuing normally. {code} The query is now stuck in the "killed" state: {code} MariaDB [tmp]> SHOW PROCESSLIST; +----+-------------+-----------+------+---------+------+----------------------+------------------+----------+ | Id | User | Host | db | Command | Time | State | Info | Progress | +----+-------------+-----------+------+---------+------+----------------------+------------------+----------+ | 1 | system user | | NULL | Sleep | 1815 | NULL | NULL | 0.000 | | 2 | system user | | NULL | Sleep | 1815 | wsrep aborter idle | NULL | 0.000 | | 8 | root | localhost | tmp | Query | 0 | sleeping | SHOW PROCESSLIST | 0.000 | | 18 | root | localhost | tmp | Killed | 85 | checking permissions | DROP TABLE tmp | 0.000 | +----+-------------+-----------+------+---------+------+----------------------+------------------+----------+ 4 rows in set (0.00 sec) {code} The table is already missing from all nodes: {code} MariaDB [tmp]> SHOW TABLES; Empty set (0.00 sec) {code} Logs on the other nodes suggest that the DROP TABLE statement may have been sent to other nodes twice, since we get an error saying the table doesn't exist: {code} 150320 13:51:39 [ERROR] Slave SQL: Error 'Unknown table 'tmp'' on query. Default database: 'tmp'. Query: 'DROP TABLE tmp', Error_code: 1051 150320 13:51:39 [Warning] WSREP: RBR event 1 Query apply warning: 1, 5068 150320 13:51:39 [Warning] WSREP: Ignoring error for TO isolated action: source: bafbdc15-cf25-11e4-9ace-2fc1846ad321 version: 3 local: 0 state: APPLYING flags: 65 conn_id: 18 trx_id: -1 seqnos (l: 5111, g: 5068, s: 5067, d: 5067, ts: 184510656454187) {code} If we try to cancel the query again, we lose connection to the server: {code} MariaDB [tmp]> DROP TABLE tmp; ^CCtrl-C -- query killed. Continuing normally. ^CCtrl-C -- query killed. Continuing normally. ERROR 2013 (HY000): Lost connection to MySQL server during query {code} However, the node is still up, and the query is stuck in the "killed" state: {code} MariaDB [tmp]> SHOW PROCESSLIST; +----+-------------+-----------+------+---------+------+----------------------+------------------+----------+ | Id | User | Host | db | Command | Time | State | Info | Progress | +----+-------------+-----------+------+---------+------+----------------------+------------------+----------+ | 1 | system user | | NULL | Sleep | 620 | NULL | NULL | 0.000 | | 2 | system user | | NULL | Sleep | 620 | wsrep aborter idle | NULL | 0.000 | | 4 | root | localhost | tmp | Killed | 174 | checking permissions | DROP TABLE tmp | 0.000 | | 5 | root | localhost | tmp | Query | 0 | sleeping | SHOW PROCESSLIST | 0.000 | | 6 | root | localhost | tmp | Sleep | 249 | | NULL | 0.000 | +----+-------------+-----------+------+---------+------+----------------------+------------------+----------+ 5 rows in set (0.00 sec) {code} The node is still in the cluster: {code} MariaDB [tmp]> SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size'; +--------------------+-------+ | Variable_name | Value | +--------------------+-------+ | wsrep_cluster_size | 3 | +--------------------+-------+ 1 row in set (0.00 sec) {code} As soon as we UNLOCK TABLES on the original node, the query is killed. {code} MariaDB [tmp]> UNLOCK TABLES; Query OK, 0 rows affected (0.00 sec) MariaDB [tmp]> SHOW PROCESSLIST; +----+-------------+-----------+------+---------+------+--------------------+------------------+----------+ | Id | User | Host | db | Command | Time | State | Info | Progress | +----+-------------+-----------+------+---------+------+--------------------+------------------+----------+ | 1 | system user | | NULL | Sleep | 2305 | NULL | NULL | 0.000 | | 2 | system user | | NULL | Sleep | 2305 | wsrep aborter idle | NULL | 0.000 | | 8 | root | localhost | tmp | Query | 0 | sleeping | SHOW PROCESSLIST | 0.000 | | 18 | root | localhost | tmp | Sleep | 1 | | NULL | 0.000 | +----+-------------+-----------+------+---------+------+--------------------+------------------+----------+ 4 rows in set (0.00 sec) {code} It looks like if wsrep_OSU_method set to TOI, DDL is transmitted to other nodes even when the provider is paused, and the DDL on the originating node shouldn't run the DDL due to locks. The DDL runs on the originating node anyway, unless there's some DML running on that node to block it. Let's add some DML to the mix. Some setup: {code} CREATE TABLE `tmp` ( `a` int(11) NOT NULL, `b` varchar(20) DEFAULT NULL, PRIMARY KEY (`a`) ); DELIMITER $$ CREATE PROCEDURE insert_test_data() BEGIN DECLARE i INT DEFAULT 1; WHILE i < 1000000 DO INSERT INTO `tmp` (`a`, `b`) VALUES (i, i); SET i = i + 1; END WHILE; END$$ DELIMITER ; {code} Now let's have three shells on the originating node: On the first shell, execute: {code} CALL insert_test_data(); {code} On the second shell, execute: {code} MariaDB [tmp]> FLUSH TABLES WITH READ LOCK; Query OK, 0 rows affected (0.00 sec) {code} On the third shell, try to drop the table: {code} MariaDB [tmp]> DROP TABLE tmp; {code} Process list looks as expected: {code} MariaDB [tmp]> SHOW PROCESSLIST; +----+-------------+-----------+------+---------+------+------------------------------+----------------------------------------------------------------------------------------+----------+ | Id | User | Host | db | Command | Time | State | Info | Progress | +----+-------------+-----------+------+---------+------+------------------------------+----------------------------------------------------------------------------------------+----------+ | 1 | system user | | NULL | Sleep | 2648 | NULL | NULL | 0.000 | | 2 | system user | | NULL | Sleep | 2648 | wsrep aborter idle | NULL | 0.000 | | 8 | root | localhost | tmp | Query | 0 | sleeping | SHOW PROCESSLIST | 0.000 | | 18 | root | localhost | tmp | Query | 59 | Waiting for global read lock | INSERT INTO `tmp` (`a`, `b`) VALUES ( NAME_CONST('i',6601), NAME_CONST('i',6601)) | 0.000 | | 20 | root | localhost | tmp | Query | 20 | checking permissions | DROP TABLE tmp | 0.000 | +----+-------------+-----------+------+---------+------+------------------------------+----------------------------------------------------------------------------------------+----------+ 5 rows in set (0.00 sec) {code} On other nodes, the table is gone: {code} MariaDB [tmp]> SHOW TABLES; Empty set (0.00 sec) {code} On the originating node, the table still exists: {code} MariaDB [tmp]> SHOW TABLES; +---------------+ | Tables_in_tmp | +---------------+ | tmp | +---------------+ 1 row in set (0.00 sec) {code} Let's cancel the DML: {code} MariaDB [tmp]> CALL insert_test_data(); ^CCtrl-C -- query killed. Continuing normally. ERROR 1317 (70100): Query execution was interrupted {code} The table still exists on the originating node and the query is dead: {code} MariaDB [tmp]> SHOW TABLES; +---------------+ | Tables_in_tmp | +---------------+ | tmp | +---------------+ 1 row in set (0.00 sec) MariaDB [tmp]> SHOW PROCESSLIST; +----+-------------+-----------+------+---------+------+----------------------+------------------+----------+ | Id | User | Host | db | Command | Time | State | Info | Progress | +----+-------------+-----------+------+---------+------+----------------------+------------------+----------+ | 1 | system user | | NULL | Sleep | 2768 | NULL | NULL | 0.000 | | 2 | system user | | NULL | Sleep | 2768 | wsrep aborter idle | NULL | 0.000 | | 8 | root | localhost | tmp | Query | 0 | sleeping | SHOW PROCESSLIST | 0.000 | | 18 | root | localhost | tmp | Sleep | 13 | | NULL | 0.000 | | 20 | root | localhost | tmp | Query | 140 | checking permissions | DROP TABLE tmp | 0.000 | +----+-------------+-----------+------+---------+------+----------------------+------------------+----------+ 5 rows in set (0.00 sec) {code} It goes away when we UNLOCK TABLES: {code} MariaDB [tmp]> UNLOCK TABLES; Query OK, 0 rows affected (0.00 sec) MariaDB [tmp]> SHOW TABLES; Empty set (0.00 sec) {code} Again, the other nodes have an error about the missing table, suggesting it may have been executed twice: {code} 150320 14:04:56 [Warning] WSREP: RBR event 1 Query apply warning: 1, 5070 150320 14:04:56 [Warning] WSREP: Ignoring error for TO isolated action: source: bafbdc15-cf25-11e4-9ace-2fc1846ad321 version: 3 local: 0 state: APPLYING flags: 65 conn_id: 18 trx_id: -1 seqnos (l: 5113, g: 5070, s: 5069, d: 5069, ts: 185308382405439) {code} |
Description |
I have 3 Galera nodes in a cluster. On all 3 nodes, let's execute: {code} SET GLOBAL wsrep_debug=ON; {code} First, let's do some setup on one of the nodes: {code} CREATE TABLE `tmp` ( `a` int(11) NOT NULL, `b` varchar(20) DEFAULT NULL, PRIMARY KEY (`a`) ); {code} Now let's open up two MySQL shells on the same node. Execute this on the first shell: {code} MariaDB [tmp]> FLUSH TABLES WITH READ LOCK; Query OK, 0 rows affected (0.14 sec) {code} The log on this node shows that the provider has been paused: {code} 150320 11:43:04 [Note] WSREP: Provider paused at 5edc131e-ae0b-11e4-85d1-c2c0512e9e7e:2896 (495) {code} In the second shell, let's try to drop the table: {code} MariaDB [tmp]> DROP TABLE tmp; {code} This hangs: {code} MariaDB [tmp]> SHOW PROCESSLIST; +----+-------------+-----------+------+---------+------+----------------------+------------------+----------+ | Id | User | Host | db | Command | Time | State | Info | Progress | +----+-------------+-----------+------+---------+------+----------------------+------------------+----------+ | 1 | system user | | NULL | Sleep | 1744 | NULL | NULL | 0.000 | | 2 | system user | | NULL | Sleep | 1744 | wsrep aborter idle | NULL | 0.000 | | 8 | root | localhost | tmp | Query | 0 | sleeping | SHOW PROCESSLIST | 0.000 | | 18 | root | localhost | tmp | Query | 14 | checking permissions | DROP TABLE tmp | 0.000 | +----+-------------+-----------+------+---------+------+----------------------+------------------+----------+ 4 rows in set (0.00 sec) {code} Let's attempt to kill the query once: {code} MariaDB [tmp]> DROP TABLE tmp; ^CCtrl-C -- query killed. Continuing normally. {code} The query is now stuck in the "killed" state: {code} MariaDB [tmp]> SHOW PROCESSLIST; +----+-------------+-----------+------+---------+------+----------------------+------------------+----------+ | Id | User | Host | db | Command | Time | State | Info | Progress | +----+-------------+-----------+------+---------+------+----------------------+------------------+----------+ | 1 | system user | | NULL | Sleep | 1815 | NULL | NULL | 0.000 | | 2 | system user | | NULL | Sleep | 1815 | wsrep aborter idle | NULL | 0.000 | | 8 | root | localhost | tmp | Query | 0 | sleeping | SHOW PROCESSLIST | 0.000 | | 18 | root | localhost | tmp | Killed | 85 | checking permissions | DROP TABLE tmp | 0.000 | +----+-------------+-----------+------+---------+------+----------------------+------------------+----------+ 4 rows in set (0.00 sec) {code} The table is already missing from all nodes: {code} MariaDB [tmp]> SHOW TABLES; Empty set (0.00 sec) {code} Logs on the other nodes suggest that the DROP TABLE statement may have been sent to other nodes twice, since we get an error saying the table doesn't exist: {code} 150320 13:51:39 [ERROR] Slave SQL: Error 'Unknown table 'tmp'' on query. Default database: 'tmp'. Query: 'DROP TABLE tmp', Error_code: 1051 150320 13:51:39 [Warning] WSREP: RBR event 1 Query apply warning: 1, 5068 150320 13:51:39 [Warning] WSREP: Ignoring error for TO isolated action: source: bafbdc15-cf25-11e4-9ace-2fc1846ad321 version: 3 local: 0 state: APPLYING flags: 65 conn_id: 18 trx_id: -1 seqnos (l: 5111, g: 5068, s: 5067, d: 5067, ts: 184510656454187) {code} If we try to cancel the query again, we lose connection to the server: {code} MariaDB [tmp]> DROP TABLE tmp; ^CCtrl-C -- query killed. Continuing normally. ^CCtrl-C -- query killed. Continuing normally. ERROR 2013 (HY000): Lost connection to MySQL server during query {code} However, the node is still up, and the query is stuck in the "killed" state: {code} MariaDB [tmp]> SHOW PROCESSLIST; +----+-------------+-----------+------+---------+------+----------------------+------------------+----------+ | Id | User | Host | db | Command | Time | State | Info | Progress | +----+-------------+-----------+------+---------+------+----------------------+------------------+----------+ | 1 | system user | | NULL | Sleep | 620 | NULL | NULL | 0.000 | | 2 | system user | | NULL | Sleep | 620 | wsrep aborter idle | NULL | 0.000 | | 4 | root | localhost | tmp | Killed | 174 | checking permissions | DROP TABLE tmp | 0.000 | | 5 | root | localhost | tmp | Query | 0 | sleeping | SHOW PROCESSLIST | 0.000 | | 6 | root | localhost | tmp | Sleep | 249 | | NULL | 0.000 | +----+-------------+-----------+------+---------+------+----------------------+------------------+----------+ 5 rows in set (0.00 sec) {code} The node is still in the cluster: {code} MariaDB [tmp]> SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size'; +--------------------+-------+ | Variable_name | Value | +--------------------+-------+ | wsrep_cluster_size | 3 | +--------------------+-------+ 1 row in set (0.00 sec) {code} As soon as we UNLOCK TABLES on the original node, the query is killed. {code} MariaDB [tmp]> UNLOCK TABLES; Query OK, 0 rows affected (0.00 sec) MariaDB [tmp]> SHOW PROCESSLIST; +----+-------------+-----------+------+---------+------+--------------------+------------------+----------+ | Id | User | Host | db | Command | Time | State | Info | Progress | +----+-------------+-----------+------+---------+------+--------------------+------------------+----------+ | 1 | system user | | NULL | Sleep | 2305 | NULL | NULL | 0.000 | | 2 | system user | | NULL | Sleep | 2305 | wsrep aborter idle | NULL | 0.000 | | 8 | root | localhost | tmp | Query | 0 | sleeping | SHOW PROCESSLIST | 0.000 | | 18 | root | localhost | tmp | Sleep | 1 | | NULL | 0.000 | +----+-------------+-----------+------+---------+------+--------------------+------------------+----------+ 4 rows in set (0.00 sec) {code} It looks like if wsrep_OSU_method set to TOI, DDL is transmitted to other nodes even when the provider is paused, and the DDL on the originating node shouldn't run the DDL due to locks. The DDL runs on the originating node anyway, unless there's some DML running on that node to block it. Let's add some DML to the mix. Some setup: {code} CREATE TABLE `tmp` ( `a` int(11) NOT NULL, `b` varchar(20) DEFAULT NULL, PRIMARY KEY (`a`) ); DELIMITER $$ CREATE PROCEDURE insert_test_data() BEGIN DECLARE i INT DEFAULT 1; WHILE i < 1000000 DO INSERT INTO `tmp` (`a`, `b`) VALUES (i, i); SET i = i + 1; END WHILE; END$$ DELIMITER ; {code} Now let's have three shells on the originating node: On the first shell, execute: {code} CALL insert_test_data(); {code} On the second shell, execute: {code} MariaDB [tmp]> FLUSH TABLES WITH READ LOCK; Query OK, 0 rows affected (0.00 sec) {code} On the third shell, try to drop the table: {code} MariaDB [tmp]> DROP TABLE tmp; {code} Process list looks as expected: {code} MariaDB [tmp]> SHOW PROCESSLIST; +----+-------------+-----------+------+---------+------+------------------------------+----------------------------------------------------------------------------------------+----------+ | Id | User | Host | db | Command | Time | State | Info | Progress | +----+-------------+-----------+------+---------+------+------------------------------+----------------------------------------------------------------------------------------+----------+ | 1 | system user | | NULL | Sleep | 2648 | NULL | NULL | 0.000 | | 2 | system user | | NULL | Sleep | 2648 | wsrep aborter idle | NULL | 0.000 | | 8 | root | localhost | tmp | Query | 0 | sleeping | SHOW PROCESSLIST | 0.000 | | 18 | root | localhost | tmp | Query | 59 | Waiting for global read lock | INSERT INTO `tmp` (`a`, `b`) VALUES ( NAME_CONST('i',6601), NAME_CONST('i',6601)) | 0.000 | | 20 | root | localhost | tmp | Query | 20 | checking permissions | DROP TABLE tmp | 0.000 | +----+-------------+-----------+------+---------+------+------------------------------+----------------------------------------------------------------------------------------+----------+ 5 rows in set (0.00 sec) {code} On other nodes, the table is gone: {code} MariaDB [tmp]> SHOW TABLES; Empty set (0.00 sec) {code} On the originating node, the table still exists: {code} MariaDB [tmp]> SHOW TABLES; +---------------+ | Tables_in_tmp | +---------------+ | tmp | +---------------+ 1 row in set (0.00 sec) {code} Let's cancel the DML: {code} MariaDB [tmp]> CALL insert_test_data(); ^CCtrl-C -- query killed. Continuing normally. ERROR 1317 (70100): Query execution was interrupted {code} The table still exists on the originating node and the query is dead: {code} MariaDB [tmp]> SHOW TABLES; +---------------+ | Tables_in_tmp | +---------------+ | tmp | +---------------+ 1 row in set (0.00 sec) MariaDB [tmp]> SHOW PROCESSLIST; +----+-------------+-----------+------+---------+------+----------------------+------------------+----------+ | Id | User | Host | db | Command | Time | State | Info | Progress | +----+-------------+-----------+------+---------+------+----------------------+------------------+----------+ | 1 | system user | | NULL | Sleep | 2768 | NULL | NULL | 0.000 | | 2 | system user | | NULL | Sleep | 2768 | wsrep aborter idle | NULL | 0.000 | | 8 | root | localhost | tmp | Query | 0 | sleeping | SHOW PROCESSLIST | 0.000 | | 18 | root | localhost | tmp | Sleep | 13 | | NULL | 0.000 | | 20 | root | localhost | tmp | Query | 140 | checking permissions | DROP TABLE tmp | 0.000 | +----+-------------+-----------+------+---------+------+----------------------+------------------+----------+ 5 rows in set (0.00 sec) {code} It goes away when we UNLOCK TABLES: {code} MariaDB [tmp]> UNLOCK TABLES; Query OK, 0 rows affected (0.00 sec) MariaDB [tmp]> SHOW TABLES; Empty set (0.00 sec) {code} Again, the other nodes have an error about the missing table, suggesting it may have been executed twice: {code} 150320 14:04:56 [Warning] WSREP: RBR event 1 Query apply warning: 1, 5070 150320 14:04:56 [Warning] WSREP: Ignoring error for TO isolated action: source: bafbdc15-cf25-11e4-9ace-2fc1846ad321 version: 3 local: 0 state: APPLYING flags: 65 conn_id: 18 trx_id: -1 seqnos (l: 5113, g: 5070, s: 5069, d: 5069, ts: 185308382405439) {code} |
It looks like if wsrep_OSU_method set to TOI, DDL is transmitted to other nodes even when the provider is paused, and the DDL on the originating node shouldn't run the DDL because it is supposedly waiting on a lock due to FLUSH TABLES WITH READ LOCK. The DDL runs on the originating node anyway. The DDL runs on all nodes except the originating node, if there's some DML running on that node to block it. According to the logs, it also looks like DDL may be getting sent to the other nodes more than once. h4. DDL + no DML I have 3 Galera nodes in a cluster. On all 3 nodes, let's execute: {code} SET GLOBAL wsrep_debug=ON; {code} First, let's do some setup on one of the nodes: {code} CREATE TABLE `tmp` ( `a` int(11) NOT NULL, `b` varchar(20) DEFAULT NULL, PRIMARY KEY (`a`) ); {code} Now let's open up two MySQL shells on the same node. Execute this on the first shell: {code} MariaDB [tmp]> FLUSH TABLES WITH READ LOCK; Query OK, 0 rows affected (0.14 sec) {code} The log on this node shows that the provider has been paused: {code} 150320 11:43:04 [Note] WSREP: Provider paused at 5edc131e-ae0b-11e4-85d1-c2c0512e9e7e:2896 (495) {code} In the second shell, let's try to drop the table: {code} MariaDB [tmp]> DROP TABLE tmp; {code} This hangs: {code} MariaDB [tmp]> SHOW PROCESSLIST; +----+-------------+-----------+------+---------+------+----------------------+------------------+----------+ | Id | User | Host | db | Command | Time | State | Info | Progress | +----+-------------+-----------+------+---------+------+----------------------+------------------+----------+ | 1 | system user | | NULL | Sleep | 1744 | NULL | NULL | 0.000 | | 2 | system user | | NULL | Sleep | 1744 | wsrep aborter idle | NULL | 0.000 | | 8 | root | localhost | tmp | Query | 0 | sleeping | SHOW PROCESSLIST | 0.000 | | 18 | root | localhost | tmp | Query | 14 | checking permissions | DROP TABLE tmp | 0.000 | +----+-------------+-----------+------+---------+------+----------------------+------------------+----------+ 4 rows in set (0.00 sec) {code} Let's attempt to kill the query once: {code} MariaDB [tmp]> DROP TABLE tmp; ^CCtrl-C -- query killed. Continuing normally. {code} The query is now stuck in the "killed" state: {code} MariaDB [tmp]> SHOW PROCESSLIST; +----+-------------+-----------+------+---------+------+----------------------+------------------+----------+ | Id | User | Host | db | Command | Time | State | Info | Progress | +----+-------------+-----------+------+---------+------+----------------------+------------------+----------+ | 1 | system user | | NULL | Sleep | 1815 | NULL | NULL | 0.000 | | 2 | system user | | NULL | Sleep | 1815 | wsrep aborter idle | NULL | 0.000 | | 8 | root | localhost | tmp | Query | 0 | sleeping | SHOW PROCESSLIST | 0.000 | | 18 | root | localhost | tmp | Killed | 85 | checking permissions | DROP TABLE tmp | 0.000 | +----+-------------+-----------+------+---------+------+----------------------+------------------+----------+ 4 rows in set (0.00 sec) {code} The table is already missing from all nodes: {code} MariaDB [tmp]> SHOW TABLES; Empty set (0.00 sec) {code} Logs on the other nodes suggest that the DROP TABLE statement may have been sent to other nodes twice, since we get an error saying the table doesn't exist: {code} 150320 13:51:39 [ERROR] Slave SQL: Error 'Unknown table 'tmp'' on query. Default database: 'tmp'. Query: 'DROP TABLE tmp', Error_code: 1051 150320 13:51:39 [Warning] WSREP: RBR event 1 Query apply warning: 1, 5068 150320 13:51:39 [Warning] WSREP: Ignoring error for TO isolated action: source: bafbdc15-cf25-11e4-9ace-2fc1846ad321 version: 3 local: 0 state: APPLYING flags: 65 conn_id: 18 trx_id: -1 seqnos (l: 5111, g: 5068, s: 5067, d: 5067, ts: 184510656454187) {code} If we try to cancel the query again, we lose connection to the server: {code} MariaDB [tmp]> DROP TABLE tmp; ^CCtrl-C -- query killed. Continuing normally. ^CCtrl-C -- query killed. Continuing normally. ERROR 2013 (HY000): Lost connection to MySQL server during query {code} However, the node is still up, and the query is stuck in the "killed" state: {code} MariaDB [tmp]> SHOW PROCESSLIST; +----+-------------+-----------+------+---------+------+----------------------+------------------+----------+ | Id | User | Host | db | Command | Time | State | Info | Progress | +----+-------------+-----------+------+---------+------+----------------------+------------------+----------+ | 1 | system user | | NULL | Sleep | 620 | NULL | NULL | 0.000 | | 2 | system user | | NULL | Sleep | 620 | wsrep aborter idle | NULL | 0.000 | | 4 | root | localhost | tmp | Killed | 174 | checking permissions | DROP TABLE tmp | 0.000 | | 5 | root | localhost | tmp | Query | 0 | sleeping | SHOW PROCESSLIST | 0.000 | | 6 | root | localhost | tmp | Sleep | 249 | | NULL | 0.000 | +----+-------------+-----------+------+---------+------+----------------------+------------------+----------+ 5 rows in set (0.00 sec) {code} The node is still in the cluster: {code} MariaDB [tmp]> SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size'; +--------------------+-------+ | Variable_name | Value | +--------------------+-------+ | wsrep_cluster_size | 3 | +--------------------+-------+ 1 row in set (0.00 sec) {code} As soon as we UNLOCK TABLES on the original node, the query is killed. {code} MariaDB [tmp]> UNLOCK TABLES; Query OK, 0 rows affected (0.00 sec) MariaDB [tmp]> SHOW PROCESSLIST; +----+-------------+-----------+------+---------+------+--------------------+------------------+----------+ | Id | User | Host | db | Command | Time | State | Info | Progress | +----+-------------+-----------+------+---------+------+--------------------+------------------+----------+ | 1 | system user | | NULL | Sleep | 2305 | NULL | NULL | 0.000 | | 2 | system user | | NULL | Sleep | 2305 | wsrep aborter idle | NULL | 0.000 | | 8 | root | localhost | tmp | Query | 0 | sleeping | SHOW PROCESSLIST | 0.000 | | 18 | root | localhost | tmp | Sleep | 1 | | NULL | 0.000 | +----+-------------+-----------+------+---------+------+--------------------+------------------+----------+ 4 rows in set (0.00 sec) {code} h4. DDL + DML Let's add some DML to the mix. Some setup: {code} CREATE TABLE `tmp` ( `a` int(11) NOT NULL, `b` varchar(20) DEFAULT NULL, PRIMARY KEY (`a`) ); DELIMITER $$ CREATE PROCEDURE insert_test_data() BEGIN DECLARE i INT DEFAULT 1; WHILE i < 1000000 DO INSERT INTO `tmp` (`a`, `b`) VALUES (i, i); SET i = i + 1; END WHILE; END$$ DELIMITER ; {code} Now let's have three shells on the originating node: On the first shell, execute: {code} CALL insert_test_data(); {code} On the second shell, execute: {code} MariaDB [tmp]> FLUSH TABLES WITH READ LOCK; Query OK, 0 rows affected (0.00 sec) {code} On the third shell, try to drop the table: {code} MariaDB [tmp]> DROP TABLE tmp; {code} Process list looks as expected: {code} MariaDB [tmp]> SHOW PROCESSLIST; +----+-------------+-----------+------+---------+------+------------------------------+----------------------------------------------------------------------------------------+----------+ | Id | User | Host | db | Command | Time | State | Info | Progress | +----+-------------+-----------+------+---------+------+------------------------------+----------------------------------------------------------------------------------------+----------+ | 1 | system user | | NULL | Sleep | 2648 | NULL | NULL | 0.000 | | 2 | system user | | NULL | Sleep | 2648 | wsrep aborter idle | NULL | 0.000 | | 8 | root | localhost | tmp | Query | 0 | sleeping | SHOW PROCESSLIST | 0.000 | | 18 | root | localhost | tmp | Query | 59 | Waiting for global read lock | INSERT INTO `tmp` (`a`, `b`) VALUES ( NAME_CONST('i',6601), NAME_CONST('i',6601)) | 0.000 | | 20 | root | localhost | tmp | Query | 20 | checking permissions | DROP TABLE tmp | 0.000 | +----+-------------+-----------+------+---------+------+------------------------------+----------------------------------------------------------------------------------------+----------+ 5 rows in set (0.00 sec) {code} On other nodes, the table is gone: {code} MariaDB [tmp]> SHOW TABLES; Empty set (0.00 sec) {code} On the originating node, the table still exists: {code} MariaDB [tmp]> SHOW TABLES; +---------------+ | Tables_in_tmp | +---------------+ | tmp | +---------------+ 1 row in set (0.00 sec) {code} Let's cancel the DML: {code} MariaDB [tmp]> CALL insert_test_data(); ^CCtrl-C -- query killed. Continuing normally. ERROR 1317 (70100): Query execution was interrupted {code} The table still exists on the originating node and the query is dead: {code} MariaDB [tmp]> SHOW TABLES; +---------------+ | Tables_in_tmp | +---------------+ | tmp | +---------------+ 1 row in set (0.00 sec) MariaDB [tmp]> SHOW PROCESSLIST; +----+-------------+-----------+------+---------+------+----------------------+------------------+----------+ | Id | User | Host | db | Command | Time | State | Info | Progress | +----+-------------+-----------+------+---------+------+----------------------+------------------+----------+ | 1 | system user | | NULL | Sleep | 2768 | NULL | NULL | 0.000 | | 2 | system user | | NULL | Sleep | 2768 | wsrep aborter idle | NULL | 0.000 | | 8 | root | localhost | tmp | Query | 0 | sleeping | SHOW PROCESSLIST | 0.000 | | 18 | root | localhost | tmp | Sleep | 13 | | NULL | 0.000 | | 20 | root | localhost | tmp | Query | 140 | checking permissions | DROP TABLE tmp | 0.000 | +----+-------------+-----------+------+---------+------+----------------------+------------------+----------+ 5 rows in set (0.00 sec) {code} It goes away when we UNLOCK TABLES: {code} MariaDB [tmp]> UNLOCK TABLES; Query OK, 0 rows affected (0.00 sec) MariaDB [tmp]> SHOW TABLES; Empty set (0.00 sec) {code} Again, the other nodes have an error about the missing table, suggesting it may have been executed twice: {code} 150320 14:04:56 [Warning] WSREP: RBR event 1 Query apply warning: 1, 5070 150320 14:04:56 [Warning] WSREP: Ignoring error for TO isolated action: source: bafbdc15-cf25-11e4-9ace-2fc1846ad321 version: 3 local: 0 state: APPLYING flags: 65 conn_id: 18 trx_id: -1 seqnos (l: 5113, g: 5070, s: 5069, d: 5069, ts: 185308382405439) {code} |
Workflow | MariaDB v2 [ 60168 ] | MariaDB v3 [ 62530 ] |
Assignee | Nirbhay Choubey [ nirbhay_c ] |
Assignee | Sachin Setiya [ sachin.setiya.007 ] |
Fix Version/s | 5.5-galera [ 21900 ] | |
Fix Version/s | 10.0-galera [ 21901 ] |
Assignee | Sachin Setiya [ sachin.setiya.007 ] | Jan Lindström [ jplindst ] |
Fix Version/s | 5.5-galera [ 21900 ] | |
Fix Version/s | 10.0-galera [ 21901 ] | |
Assignee | Jan Lindström [ jplindst ] | Stepan Patryshev [ stepan.patryshev ] |
Labels | galera | galera need_verification |
Fix Version/s | N/A [ 14700 ] |
Assignee | Stepan Patryshev [ stepan.patryshev ] | Ramesh Sivaraman [ JIRAUSER48189 ] |
Workflow | MariaDB v3 [ 62530 ] | MariaDB v4 [ 139790 ] |
Resolution | Won't Fix [ 2 ] | |
Status | Open [ 1 ] | Closed [ 6 ] |
I remember that galera used to not work with explicit table locks, not sure if it's still the case. Assigning to nirbhay_c.