[MDEV-7808] FLUSH TABLES WITH READ LOCK has strange locking behavior - Jira

Geoff Montee (Inactive) created issue - 2015-03-20 17:56

Elena Stepanova added a comment - 2015-03-20 18:07

I remember that galera used to not work with explicit table locks, not sure if it's still the case. Assigning to nirbhay_c.

Elena Stepanova added a comment - 2015-03-20 18:07 I remember that galera used to not work with explicit table locks, not sure if it's still the case. Assigning to nirbhay_c .

Elena Stepanova made changes - 2015-03-20 18:07

Field	Original Value	New Value
Assignee		Nirbhay Choubey [ nirbhay_c ]

Geoff Montee (Inactive) made changes - 2015-03-20 19:46

Description

I have 3 Galera nodes in a cluster.

On all 3 nodes, let's execute:

{code}
SET GLOBAL wsrep_debug=ON;
{code}

First, let's do some setup on one of the nodes:

{code}
CREATE TABLE `tmp` (
  `a` int(11) NOT NULL,
  `b` varchar(20) DEFAULT NULL,
  PRIMARY KEY (`a`)
);
DELIMITER $$
CREATE PROCEDURE insert_test_data()
BEGIN
  DECLARE i INT DEFAULT 1;

  WHILE i < 1000000 DO
    INSERT INTO `tmp` (`a`, `b`)
    VALUES (i, i);
    SET i = i + 1;
  END WHILE;
END$$
DELIMITER ;
{code}

Now let's open up two MySQL shells on the same node.

On the first shell:

{code}
MariaDB [tmp]> CALL insert_test_data();
{code}

Immediately after, do this on the second shell:

{code}
MariaDB [tmp]> FLUSH TABLES WITH READ LOCK;
Query OK, 0 rows affected (0.14 sec)

MariaDB [tmp]> SHOW PROCESSLIST;
+----+-------------+-----------+------+---------+------+------------------------------+--------------------------------------------------------------------------------------+----------+
| Id | User | Host | db | Command | Time | State | Info | Progress |
+----+-------------+-----------+------+---------+------+------------------------------+--------------------------------------------------------------------------------------+----------+
| 1 | system user | | NULL | Sleep | 161 | NULL | NULL | 0.000 |
| 2 | system user | | NULL | Sleep | 161 | wsrep aborter idle | NULL | 0.000 |
| 4 | root | localhost | tmp | Query | 44 | Waiting for global read lock | INSERT INTO `tmp` (`a`, `b`)
    VALUES ( NAME_CONST('i',486), NAME_CONST('i',486)) | 0.000 |
| 5 | root | localhost | tmp | Query | 0 | sleeping | SHOW PROCESSLIST | 0.000 |
+----+-------------+-----------+------+---------+------+------------------------------+--------------------------------------------------------------------------------------+----------+
4 rows in set (0.00 sec)
{code}

The global read lock appears to still be held and is preventing new queries from running.

The log on this node shows that the provider has been paused:

{code}
150320 11:41:42 [Note] WSREP: TO END: 2410, 2 : CREATE TABLE `tmp` (
  `a` int(11) NOT NULL,
  `b` varchar(20) DEFAULT NULL,
  PRIMARY KEY (`a`)
)
150320 11:41:42 [Note] WSREP: Set WSREPXid for InnoDB: 5edc131e-ae0b-11e4-85d1-c2c0512e9e7e:2410
150320 11:41:42 [Note] WSREP: TO END: 2410, update seqno
150320 11:41:42 [Note] WSREP: TO END: 2410
150320 11:41:42 [Note] WSREP: TO BEGIN: -1, 0 : CREATE PROCEDURE insert_test_data()
BEGIN
  DECLARE i INT DEFAULT 1;

  WHILE i < 1000000 DO
    INSERT INTO `tmp` (`a`, `b`)
    VALUES (i, i);
    SET i = i + 1;
  END WHILE;
END
150320 11:41:42 [Note] WSREP: TO BEGIN: 2411, 2
150320 11:41:42 [Note] WSREP: TO END: 2411, 2 : CREATE PROCEDURE insert_test_data()
BEGIN
  DECLARE i INT DEFAULT 1;

  WHILE i < 1000000 DO
    INSERT INTO `tmp` (`a`, `b`)
    VALUES (i, i);
    SET i = i + 1;
  END WHILE;
END
150320 11:41:42 [Note] WSREP: Set WSREPXid for InnoDB: 5edc131e-ae0b-11e4-85d1-c2c0512e9e7e:2411
150320 11:41:42 [Note] WSREP: TO END: 2411, update seqno
150320 11:41:42 [Note] WSREP: TO END: 2411
150320 11:43:04 [Note] WSREP: Provider paused at 5edc131e-ae0b-11e4-85d1-c2c0512e9e7e:2896 (495)
{code}

Our original query is hanging, so let's kill it and try dropping the table to try again:

{code}
MariaDB [tmp]> CALL insert_test_data();
^CCtrl-C -- query killed. Continuing normally.
ERROR 1317 (70100): Query execution was interrupted
MariaDB [tmp]> DROP TABLE tmp;
{code}

This also hangs:

{code}
MariaDB [tmp]> SHOW PROCESSLIST;
+----+-------------+-----------+------+---------+------+----------------------+------------------+----------+
| Id | User | Host | db | Command | Time | State | Info | Progress |
+----+-------------+-----------+------+---------+------+----------------------+------------------+----------+
| 1 | system user | | NULL | Sleep | 468 | NULL | NULL | 0.000 |
| 2 | system user | | NULL | Sleep | 468 | wsrep aborter idle | NULL | 0.000 |
| 4 | root | localhost | tmp | Query | 22 | checking permissions | DROP TABLE tmp | 0.000 |
| 5 | root | localhost | tmp | Query | 0 | sleeping | SHOW PROCESSLIST | 0.000 |
| 6 | root | localhost | tmp | Sleep | 97 | | NULL | 0.000 |
+----+-------------+-----------+------+---------+------+----------------------+------------------+----------+
5 rows in set (0.00 sec)
{code}

When attempting to kill the query, the connection is lost:

{code}
MariaDB [tmp]> DROP TABLE tmp;
^CCtrl-C -- query killed. Continuing normally.
^CCtrl-C -- query killed. Continuing normally.
ERROR 2013 (HY000): Lost connection to MySQL server during query
{code}

However, the node is still up, and the query is stuck in the "killed" state:

{code}
MariaDB [tmp]> SHOW PROCESSLIST;
+----+-------------+-----------+------+---------+------+----------------------+------------------+----------+
| Id | User | Host | db | Command | Time | State | Info | Progress |
+----+-------------+-----------+------+---------+------+----------------------+------------------+----------+
| 1 | system user | | NULL | Sleep | 620 | NULL | NULL | 0.000 |
| 2 | system user | | NULL | Sleep | 620 | wsrep aborter idle | NULL | 0.000 |
| 4 | root | localhost | tmp | Killed | 174 | checking permissions | DROP TABLE tmp | 0.000 |
| 5 | root | localhost | tmp | Query | 0 | sleeping | SHOW PROCESSLIST | 0.000 |
| 6 | root | localhost | tmp | Sleep | 249 | | NULL | 0.000 |
+----+-------------+-----------+------+---------+------+----------------------+------------------+----------+
5 rows in set (0.00 sec)
{code}

The table appears to be already dropped on other nodes, but still exists on the current node.

Original node:

{code}
MariaDB [tmp]> show tables;
+---------------+
| Tables_in_tmp |
+---------------+
| tmp |
+---------------+
1 row in set (0.00 sec)
{code}

Other node:

{code}
MariaDB [tmp]> SHOW TABLES;
Empty set (0.00 sec)
{code}

The node is still in the cluster:

{code}
MariaDB [tmp]> SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size';
+--------------------+-------+
| Variable_name | Value |
+--------------------+-------+
| wsrep_cluster_size | 3 |
+--------------------+-------+
1 row in set (0.00 sec)
{code}

I have 3 Galera nodes in a cluster.

On all 3 nodes, let's execute:

{code}
SET GLOBAL wsrep_debug=ON;
{code}

First, let's do some setup on one of the nodes:

{code}
CREATE TABLE `tmp` (
  `a` int(11) NOT NULL,
  `b` varchar(20) DEFAULT NULL,
  PRIMARY KEY (`a`)
);
DELIMITER $$
CREATE PROCEDURE insert_test_data()
BEGIN
  DECLARE i INT DEFAULT 1;

  WHILE i < 1000000 DO
    INSERT INTO `tmp` (`a`, `b`)
    VALUES (i, i);
    SET i = i + 1;
  END WHILE;
END$$
DELIMITER ;
{code}

Now let's open up two MySQL shells on the same node.

On the first shell:

{code}
MariaDB [tmp]> CALL insert_test_data();
{code}

Immediately after, do this on the second shell:

{code}
MariaDB [tmp]> FLUSH TABLES WITH READ LOCK;
Query OK, 0 rows affected (0.14 sec)

MariaDB [tmp]> SHOW PROCESSLIST;
+----+-------------+-----------+------+---------+------+------------------------------+--------------------------------------------------------------------------------------+----------+
| Id | User | Host | db | Command | Time | State | Info | Progress |
+----+-------------+-----------+------+---------+------+------------------------------+--------------------------------------------------------------------------------------+----------+
| 1 | system user | | NULL | Sleep | 161 | NULL | NULL | 0.000 |
| 2 | system user | | NULL | Sleep | 161 | wsrep aborter idle | NULL | 0.000 |
| 4 | root | localhost | tmp | Query | 44 | Waiting for global read lock | INSERT INTO `tmp` (`a`, `b`)
    VALUES ( NAME_CONST('i',486), NAME_CONST('i',486)) | 0.000 |
| 5 | root | localhost | tmp | Query | 0 | sleeping | SHOW PROCESSLIST | 0.000 |
+----+-------------+-----------+------+---------+------+------------------------------+--------------------------------------------------------------------------------------+----------+
4 rows in set (0.00 sec)
{code}

The global read lock appears to still be held and is preventing new queries from running.

The log on this node shows that the provider has been paused:

{code}
150320 11:41:42 [Note] WSREP: TO END: 2410, 2 : CREATE TABLE `tmp` (
  `a` int(11) NOT NULL,
  `b` varchar(20) DEFAULT NULL,
  PRIMARY KEY (`a`)
)
150320 11:41:42 [Note] WSREP: Set WSREPXid for InnoDB: 5edc131e-ae0b-11e4-85d1-c2c0512e9e7e:2410
150320 11:41:42 [Note] WSREP: TO END: 2410, update seqno
150320 11:41:42 [Note] WSREP: TO END: 2410
150320 11:41:42 [Note] WSREP: TO BEGIN: -1, 0 : CREATE PROCEDURE insert_test_data()
BEGIN
  DECLARE i INT DEFAULT 1;

  WHILE i < 1000000 DO
    INSERT INTO `tmp` (`a`, `b`)
    VALUES (i, i);
    SET i = i + 1;
  END WHILE;
END
150320 11:41:42 [Note] WSREP: TO BEGIN: 2411, 2
150320 11:41:42 [Note] WSREP: TO END: 2411, 2 : CREATE PROCEDURE insert_test_data()
BEGIN
  DECLARE i INT DEFAULT 1;

  WHILE i < 1000000 DO
    INSERT INTO `tmp` (`a`, `b`)
    VALUES (i, i);
    SET i = i + 1;
  END WHILE;
END
150320 11:41:42 [Note] WSREP: Set WSREPXid for InnoDB: 5edc131e-ae0b-11e4-85d1-c2c0512e9e7e:2411
150320 11:41:42 [Note] WSREP: TO END: 2411, update seqno
150320 11:41:42 [Note] WSREP: TO END: 2411
150320 11:43:04 [Note] WSREP: Provider paused at 5edc131e-ae0b-11e4-85d1-c2c0512e9e7e:2896 (495)
{code}

Our original query is hanging, so let's kill it and try dropping the table to try again:

{code}
MariaDB [tmp]> CALL insert_test_data();
^CCtrl-C -- query killed. Continuing normally.
ERROR 1317 (70100): Query execution was interrupted
MariaDB [tmp]> DROP TABLE tmp;
{code}

This also hangs:

{code}
MariaDB [tmp]> SHOW PROCESSLIST;
+----+-------------+-----------+------+---------+------+----------------------+------------------+----------+
| Id | User | Host | db | Command | Time | State | Info | Progress |
+----+-------------+-----------+------+---------+------+----------------------+------------------+----------+
| 1 | system user | | NULL | Sleep | 468 | NULL | NULL | 0.000 |
| 2 | system user | | NULL | Sleep | 468 | wsrep aborter idle | NULL | 0.000 |
| 4 | root | localhost | tmp | Query | 22 | checking permissions | DROP TABLE tmp | 0.000 |
| 5 | root | localhost | tmp | Query | 0 | sleeping | SHOW PROCESSLIST | 0.000 |
| 6 | root | localhost | tmp | Sleep | 97 | | NULL | 0.000 |
+----+-------------+-----------+------+---------+------+----------------------+------------------+----------+
5 rows in set (0.00 sec)
{code}

When attempting to kill the query, the connection is lost:

{code}
MariaDB [tmp]> DROP TABLE tmp;
^CCtrl-C -- query killed. Continuing normally.
^CCtrl-C -- query killed. Continuing normally.
ERROR 2013 (HY000): Lost connection to MySQL server during query
{code}

However, the node is still up, and the query is stuck in the "killed" state:

{code}
MariaDB [tmp]> SHOW PROCESSLIST;
+----+-------------+-----------+------+---------+------+----------------------+------------------+----------+
| Id | User | Host | db | Command | Time | State | Info | Progress |
+----+-------------+-----------+------+---------+------+----------------------+------------------+----------+
| 1 | system user | | NULL | Sleep | 620 | NULL | NULL | 0.000 |
| 2 | system user | | NULL | Sleep | 620 | wsrep aborter idle | NULL | 0.000 |
| 4 | root | localhost | tmp | Killed | 174 | checking permissions | DROP TABLE tmp | 0.000 |
| 5 | root | localhost | tmp | Query | 0 | sleeping | SHOW PROCESSLIST | 0.000 |
| 6 | root | localhost | tmp | Sleep | 249 | | NULL | 0.000 |
+----+-------------+-----------+------+---------+------+----------------------+------------------+----------+
5 rows in set (0.00 sec)
{code}

The table appears to be already dropped on other nodes, but still exists on the current node.

Original node:

{code}
MariaDB [tmp]> show tables;
+---------------+
| Tables_in_tmp |
+---------------+
| tmp |
+---------------+
1 row in set (0.00 sec)
{code}

Other node:

{code}
MariaDB [tmp]> SHOW TABLES;
Empty set (0.00 sec)
{code}

The node is still in the cluster:

{code}
MariaDB [tmp]> SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size';
+--------------------+-------+
| Variable_name | Value |
+--------------------+-------+
| wsrep_cluster_size | 3 |
+--------------------+-------+
1 row in set (0.00 sec)
{code}

As soon as we UNLOCK TABLES on the original node, the table disappears.

{code}
MariaDB [tmp]> UNLOCK TABLES;
Query OK, 0 rows affected (0.00 sec)

MariaDB [tmp]> SHOW TABLES;
Empty set (0.00 sec)
{code}

It looks like if wsrep_OSU_method set to TOI, DDL is transmitted to other nodes even when the provider is paused, and the DDL on the originating node can't run the DDL due to locks.

Geoff Montee (Inactive) made changes - 2015-03-20 19:59

Description

I have 3 Galera nodes in a cluster.

On all 3 nodes, let's execute:

{code}
SET GLOBAL wsrep_debug=ON;
{code}

First, let's do some setup on one of the nodes:

{code}
CREATE TABLE `tmp` (
  `a` int(11) NOT NULL,
  `b` varchar(20) DEFAULT NULL,
  PRIMARY KEY (`a`)
);
DELIMITER $$
CREATE PROCEDURE insert_test_data()
BEGIN
  DECLARE i INT DEFAULT 1;

  WHILE i < 1000000 DO
    INSERT INTO `tmp` (`a`, `b`)
    VALUES (i, i);
    SET i = i + 1;
  END WHILE;
END$$
DELIMITER ;
{code}

Now let's open up two MySQL shells on the same node.

On the first shell:

{code}
MariaDB [tmp]> CALL insert_test_data();
{code}

Immediately after, do this on the second shell:

{code}
MariaDB [tmp]> FLUSH TABLES WITH READ LOCK;
Query OK, 0 rows affected (0.14 sec)

MariaDB [tmp]> SHOW PROCESSLIST;
+----+-------------+-----------+------+---------+------+------------------------------+--------------------------------------------------------------------------------------+----------+
| Id | User | Host | db | Command | Time | State | Info | Progress |
+----+-------------+-----------+------+---------+------+------------------------------+--------------------------------------------------------------------------------------+----------+
| 1 | system user | | NULL | Sleep | 161 | NULL | NULL | 0.000 |
| 2 | system user | | NULL | Sleep | 161 | wsrep aborter idle | NULL | 0.000 |
| 4 | root | localhost | tmp | Query | 44 | Waiting for global read lock | INSERT INTO `tmp` (`a`, `b`)
    VALUES ( NAME_CONST('i',486), NAME_CONST('i',486)) | 0.000 |
| 5 | root | localhost | tmp | Query | 0 | sleeping | SHOW PROCESSLIST | 0.000 |
+----+-------------+-----------+------+---------+------+------------------------------+--------------------------------------------------------------------------------------+----------+
4 rows in set (0.00 sec)
{code}

The global read lock appears to still be held and is preventing new queries from running.

The log on this node shows that the provider has been paused:

{code}
150320 11:41:42 [Note] WSREP: TO END: 2410, 2 : CREATE TABLE `tmp` (
  `a` int(11) NOT NULL,
  `b` varchar(20) DEFAULT NULL,
  PRIMARY KEY (`a`)
)
150320 11:41:42 [Note] WSREP: Set WSREPXid for InnoDB: 5edc131e-ae0b-11e4-85d1-c2c0512e9e7e:2410
150320 11:41:42 [Note] WSREP: TO END: 2410, update seqno
150320 11:41:42 [Note] WSREP: TO END: 2410
150320 11:41:42 [Note] WSREP: TO BEGIN: -1, 0 : CREATE PROCEDURE insert_test_data()
BEGIN
  DECLARE i INT DEFAULT 1;

  WHILE i < 1000000 DO
    INSERT INTO `tmp` (`a`, `b`)
    VALUES (i, i);
    SET i = i + 1;
  END WHILE;
END
150320 11:41:42 [Note] WSREP: TO BEGIN: 2411, 2
150320 11:41:42 [Note] WSREP: TO END: 2411, 2 : CREATE PROCEDURE insert_test_data()
BEGIN
  DECLARE i INT DEFAULT 1;

  WHILE i < 1000000 DO
    INSERT INTO `tmp` (`a`, `b`)
    VALUES (i, i);
    SET i = i + 1;
  END WHILE;
END
150320 11:41:42 [Note] WSREP: Set WSREPXid for InnoDB: 5edc131e-ae0b-11e4-85d1-c2c0512e9e7e:2411
150320 11:41:42 [Note] WSREP: TO END: 2411, update seqno
150320 11:41:42 [Note] WSREP: TO END: 2411
150320 11:43:04 [Note] WSREP: Provider paused at 5edc131e-ae0b-11e4-85d1-c2c0512e9e7e:2896 (495)
{code}

Our original query is hanging, so let's kill it and try dropping the table to try again:

{code}
MariaDB [tmp]> CALL insert_test_data();
^CCtrl-C -- query killed. Continuing normally.
ERROR 1317 (70100): Query execution was interrupted
MariaDB [tmp]> DROP TABLE tmp;
{code}

This also hangs:

{code}
MariaDB [tmp]> SHOW PROCESSLIST;
+----+-------------+-----------+------+---------+------+----------------------+------------------+----------+
| Id | User | Host | db | Command | Time | State | Info | Progress |
+----+-------------+-----------+------+---------+------+----------------------+------------------+----------+
| 1 | system user | | NULL | Sleep | 468 | NULL | NULL | 0.000 |
| 2 | system user | | NULL | Sleep | 468 | wsrep aborter idle | NULL | 0.000 |
| 4 | root | localhost | tmp | Query | 22 | checking permissions | DROP TABLE tmp | 0.000 |
| 5 | root | localhost | tmp | Query | 0 | sleeping | SHOW PROCESSLIST | 0.000 |
| 6 | root | localhost | tmp | Sleep | 97 | | NULL | 0.000 |
+----+-------------+-----------+------+---------+------+----------------------+------------------+----------+
5 rows in set (0.00 sec)
{code}

When attempting to kill the query, the connection is lost:

{code}
MariaDB [tmp]> DROP TABLE tmp;
^CCtrl-C -- query killed. Continuing normally.
^CCtrl-C -- query killed. Continuing normally.
ERROR 2013 (HY000): Lost connection to MySQL server during query
{code}

However, the node is still up, and the query is stuck in the "killed" state:

{code}
MariaDB [tmp]> SHOW PROCESSLIST;
+----+-------------+-----------+------+---------+------+----------------------+------------------+----------+
| Id | User | Host | db | Command | Time | State | Info | Progress |
+----+-------------+-----------+------+---------+------+----------------------+------------------+----------+
| 1 | system user | | NULL | Sleep | 620 | NULL | NULL | 0.000 |
| 2 | system user | | NULL | Sleep | 620 | wsrep aborter idle | NULL | 0.000 |
| 4 | root | localhost | tmp | Killed | 174 | checking permissions | DROP TABLE tmp | 0.000 |
| 5 | root | localhost | tmp | Query | 0 | sleeping | SHOW PROCESSLIST | 0.000 |
| 6 | root | localhost | tmp | Sleep | 249 | | NULL | 0.000 |
+----+-------------+-----------+------+---------+------+----------------------+------------------+----------+
5 rows in set (0.00 sec)
{code}

The table appears to be already dropped on other nodes, but still exists on the current node.

Original node:

{code}
MariaDB [tmp]> show tables;
+---------------+
| Tables_in_tmp |
+---------------+
| tmp |
+---------------+
1 row in set (0.00 sec)
{code}

Other node:

{code}
MariaDB [tmp]> SHOW TABLES;
Empty set (0.00 sec)
{code}

The node is still in the cluster:

{code}
MariaDB [tmp]> SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size';
+--------------------+-------+
| Variable_name | Value |
+--------------------+-------+
| wsrep_cluster_size | 3 |
+--------------------+-------+
1 row in set (0.00 sec)
{code}

As soon as we UNLOCK TABLES on the original node, the table disappears.

{code}
MariaDB [tmp]> UNLOCK TABLES;
Query OK, 0 rows affected (0.00 sec)

MariaDB [tmp]> SHOW TABLES;
Empty set (0.00 sec)
{code}

It looks like if wsrep_OSU_method set to TOI, DDL is transmitted to other nodes even when the provider is paused, and the DDL on the originating node can't run the DDL due to locks.

I have 3 Galera nodes in a cluster.

On all 3 nodes, let's execute:

{code}
SET GLOBAL wsrep_debug=ON;
{code}

First, let's do some setup on one of the nodes:

{code}
CREATE TABLE `tmp` (
  `a` int(11) NOT NULL,
  `b` varchar(20) DEFAULT NULL,
  PRIMARY KEY (`a`)
);
{code}

Now let's open up two MySQL shells on the same node.

Execute this on the first shell:

{code}
MariaDB [tmp]> FLUSH TABLES WITH READ LOCK;
Query OK, 0 rows affected (0.14 sec)
{code}

The log on this node shows that the provider has been paused:

{code}
150320 11:43:04 [Note] WSREP: Provider paused at 5edc131e-ae0b-11e4-85d1-c2c0512e9e7e:2896 (495)
{code}

In the second shell, let's try to drop the table:

{code}
MariaDB [tmp]> DROP TABLE tmp;
{code}

This hangs:

{code}
MariaDB [tmp]> SHOW PROCESSLIST;
+----+-------------+-----------+------+---------+------+----------------------+------------------+----------+
| Id | User | Host | db | Command | Time | State | Info | Progress |
+----+-------------+-----------+------+---------+------+----------------------+------------------+----------+
| 1 | system user | | NULL | Sleep | 1744 | NULL | NULL | 0.000 |
| 2 | system user | | NULL | Sleep | 1744 | wsrep aborter idle | NULL | 0.000 |
| 8 | root | localhost | tmp | Query | 0 | sleeping | SHOW PROCESSLIST | 0.000 |
| 18 | root | localhost | tmp | Query | 14 | checking permissions | DROP TABLE tmp | 0.000 |
+----+-------------+-----------+------+---------+------+----------------------+------------------+----------+
4 rows in set (0.00 sec)
{code}

Let's attempt to kill the query once:

{code}
MariaDB [tmp]> DROP TABLE tmp;
^CCtrl-C -- query killed. Continuing normally.
{code}

The query is now stuck in the "killed" state:

{code}
MariaDB [tmp]> SHOW PROCESSLIST;
+----+-------------+-----------+------+---------+------+----------------------+------------------+----------+
| Id | User | Host | db | Command | Time | State | Info | Progress |
+----+-------------+-----------+------+---------+------+----------------------+------------------+----------+
| 1 | system user | | NULL | Sleep | 1815 | NULL | NULL | 0.000 |
| 2 | system user | | NULL | Sleep | 1815 | wsrep aborter idle | NULL | 0.000 |
| 8 | root | localhost | tmp | Query | 0 | sleeping | SHOW PROCESSLIST | 0.000 |
| 18 | root | localhost | tmp | Killed | 85 | checking permissions | DROP TABLE tmp | 0.000 |
+----+-------------+-----------+------+---------+------+----------------------+------------------+----------+
4 rows in set (0.00 sec)
{code}

The table is already missing from all nodes:

{code}
MariaDB [tmp]> SHOW TABLES;
Empty set (0.00 sec)
{code}

Logs on the other nodes suggest that the DROP TABLE statement may have been sent to other nodes twice, since we get an error saying the table doesn't exist:

{code}
150320 13:51:39 [ERROR] Slave SQL: Error 'Unknown table 'tmp'' on query. Default database: 'tmp'. Query: 'DROP TABLE tmp', Error_code: 1051
150320 13:51:39 [Warning] WSREP: RBR event 1 Query apply warning: 1, 5068
150320 13:51:39 [Warning] WSREP: Ignoring error for TO isolated action: source: bafbdc15-cf25-11e4-9ace-2fc1846ad321 version: 3 local: 0 state: APPLYING flags: 65 conn_id: 18 trx_id: -1 seqnos (l: 5111, g: 5068, s: 5067, d: 5067, ts: 184510656454187)
{code}

If we try to cancel the query again, we lose connection to the server:

{code}
MariaDB [tmp]> DROP TABLE tmp;
^CCtrl-C -- query killed. Continuing normally.
^CCtrl-C -- query killed. Continuing normally.
ERROR 2013 (HY000): Lost connection to MySQL server during query
{code}

However, the node is still up, and the query is stuck in the "killed" state:

{code}
MariaDB [tmp]> SHOW PROCESSLIST;
+----+-------------+-----------+------+---------+------+----------------------+------------------+----------+
| Id | User | Host | db | Command | Time | State | Info | Progress |
+----+-------------+-----------+------+---------+------+----------------------+------------------+----------+
| 1 | system user | | NULL | Sleep | 620 | NULL | NULL | 0.000 |
| 2 | system user | | NULL | Sleep | 620 | wsrep aborter idle | NULL | 0.000 |
| 4 | root | localhost | tmp | Killed | 174 | checking permissions | DROP TABLE tmp | 0.000 |
| 5 | root | localhost | tmp | Query | 0 | sleeping | SHOW PROCESSLIST | 0.000 |
| 6 | root | localhost | tmp | Sleep | 249 | | NULL | 0.000 |
+----+-------------+-----------+------+---------+------+----------------------+------------------+----------+
5 rows in set (0.00 sec)
{code}

The table appears to be already dropped on other nodes, but still exists on the current node.

Original node:

{code}
MariaDB [tmp]> show tables;
+---------------+
| Tables_in_tmp |
+---------------+
| tmp |
+---------------+
1 row in set (0.00 sec)
{code}

Other node:

{code}
MariaDB [tmp]> SHOW TABLES;
Empty set (0.00 sec)
{code}

The node is still in the cluster:

{code}
MariaDB [tmp]> SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size';
+--------------------+-------+
| Variable_name | Value |
+--------------------+-------+
| wsrep_cluster_size | 3 |
+--------------------+-------+
1 row in set (0.00 sec)
{code}

As soon as we UNLOCK TABLES on the original node, the table disappears.

{code}
MariaDB [tmp]> UNLOCK TABLES;
Query OK, 0 rows affected (0.00 sec)

MariaDB [tmp]> SHOW TABLES;
Empty set (0.00 sec)
{code}

It looks like if wsrep_OSU_method set to TOI, DDL is transmitted to other nodes even when the provider is paused, and the DDL on the originating node can't run the DDL due to locks.

Geoff Montee (Inactive) made changes - 2015-03-20 20:02

Description

I have 3 Galera nodes in a cluster.

On all 3 nodes, let's execute:

{code}
SET GLOBAL wsrep_debug=ON;
{code}

First, let's do some setup on one of the nodes:

{code}
CREATE TABLE `tmp` (
  `a` int(11) NOT NULL,
  `b` varchar(20) DEFAULT NULL,
  PRIMARY KEY (`a`)
);
{code}

Now let's open up two MySQL shells on the same node.

Execute this on the first shell:

{code}
MariaDB [tmp]> FLUSH TABLES WITH READ LOCK;
Query OK, 0 rows affected (0.14 sec)
{code}

The log on this node shows that the provider has been paused:

{code}
150320 11:43:04 [Note] WSREP: Provider paused at 5edc131e-ae0b-11e4-85d1-c2c0512e9e7e:2896 (495)
{code}

In the second shell, let's try to drop the table:

{code}
MariaDB [tmp]> DROP TABLE tmp;
{code}

This hangs:

{code}
MariaDB [tmp]> SHOW PROCESSLIST;
+----+-------------+-----------+------+---------+------+----------------------+------------------+----------+
| Id | User | Host | db | Command | Time | State | Info | Progress |
+----+-------------+-----------+------+---------+------+----------------------+------------------+----------+
| 1 | system user | | NULL | Sleep | 1744 | NULL | NULL | 0.000 |
| 2 | system user | | NULL | Sleep | 1744 | wsrep aborter idle | NULL | 0.000 |
| 8 | root | localhost | tmp | Query | 0 | sleeping | SHOW PROCESSLIST | 0.000 |
| 18 | root | localhost | tmp | Query | 14 | checking permissions | DROP TABLE tmp | 0.000 |
+----+-------------+-----------+------+---------+------+----------------------+------------------+----------+
4 rows in set (0.00 sec)
{code}

Let's attempt to kill the query once:

{code}
MariaDB [tmp]> DROP TABLE tmp;
^CCtrl-C -- query killed. Continuing normally.
{code}

The query is now stuck in the "killed" state:

{code}
MariaDB [tmp]> SHOW PROCESSLIST;
+----+-------------+-----------+------+---------+------+----------------------+------------------+----------+
| Id | User | Host | db | Command | Time | State | Info | Progress |
+----+-------------+-----------+------+---------+------+----------------------+------------------+----------+
| 1 | system user | | NULL | Sleep | 1815 | NULL | NULL | 0.000 |
| 2 | system user | | NULL | Sleep | 1815 | wsrep aborter idle | NULL | 0.000 |
| 8 | root | localhost | tmp | Query | 0 | sleeping | SHOW PROCESSLIST | 0.000 |
| 18 | root | localhost | tmp | Killed | 85 | checking permissions | DROP TABLE tmp | 0.000 |
+----+-------------+-----------+------+---------+------+----------------------+------------------+----------+
4 rows in set (0.00 sec)
{code}

The table is already missing from all nodes:

{code}
MariaDB [tmp]> SHOW TABLES;
Empty set (0.00 sec)
{code}

Logs on the other nodes suggest that the DROP TABLE statement may have been sent to other nodes twice, since we get an error saying the table doesn't exist:

{code}
150320 13:51:39 [ERROR] Slave SQL: Error 'Unknown table 'tmp'' on query. Default database: 'tmp'. Query: 'DROP TABLE tmp', Error_code: 1051
150320 13:51:39 [Warning] WSREP: RBR event 1 Query apply warning: 1, 5068
150320 13:51:39 [Warning] WSREP: Ignoring error for TO isolated action: source: bafbdc15-cf25-11e4-9ace-2fc1846ad321 version: 3 local: 0 state: APPLYING flags: 65 conn_id: 18 trx_id: -1 seqnos (l: 5111, g: 5068, s: 5067, d: 5067, ts: 184510656454187)
{code}

If we try to cancel the query again, we lose connection to the server:

{code}
MariaDB [tmp]> DROP TABLE tmp;
^CCtrl-C -- query killed. Continuing normally.
^CCtrl-C -- query killed. Continuing normally.
ERROR 2013 (HY000): Lost connection to MySQL server during query
{code}

However, the node is still up, and the query is stuck in the "killed" state:

{code}
MariaDB [tmp]> SHOW PROCESSLIST;
+----+-------------+-----------+------+---------+------+----------------------+------------------+----------+
| Id | User | Host | db | Command | Time | State | Info | Progress |
+----+-------------+-----------+------+---------+------+----------------------+------------------+----------+
| 1 | system user | | NULL | Sleep | 620 | NULL | NULL | 0.000 |
| 2 | system user | | NULL | Sleep | 620 | wsrep aborter idle | NULL | 0.000 |
| 4 | root | localhost | tmp | Killed | 174 | checking permissions | DROP TABLE tmp | 0.000 |
| 5 | root | localhost | tmp | Query | 0 | sleeping | SHOW PROCESSLIST | 0.000 |
| 6 | root | localhost | tmp | Sleep | 249 | | NULL | 0.000 |
+----+-------------+-----------+------+---------+------+----------------------+------------------+----------+
5 rows in set (0.00 sec)
{code}

The table appears to be already dropped on other nodes, but still exists on the current node.

Original node:

{code}
MariaDB [tmp]> show tables;
+---------------+
| Tables_in_tmp |
+---------------+
| tmp |
+---------------+
1 row in set (0.00 sec)
{code}

Other node:

{code}
MariaDB [tmp]> SHOW TABLES;
Empty set (0.00 sec)
{code}

The node is still in the cluster:

{code}
MariaDB [tmp]> SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size';
+--------------------+-------+
| Variable_name | Value |
+--------------------+-------+
| wsrep_cluster_size | 3 |
+--------------------+-------+
1 row in set (0.00 sec)
{code}

As soon as we UNLOCK TABLES on the original node, the table disappears.

{code}
MariaDB [tmp]> UNLOCK TABLES;
Query OK, 0 rows affected (0.00 sec)

MariaDB [tmp]> SHOW TABLES;
Empty set (0.00 sec)
{code}

It looks like if wsrep_OSU_method set to TOI, DDL is transmitted to other nodes even when the provider is paused, and the DDL on the originating node can't run the DDL due to locks.

I have 3 Galera nodes in a cluster.

On all 3 nodes, let's execute:

{code}
SET GLOBAL wsrep_debug=ON;
{code}

First, let's do some setup on one of the nodes:

{code}
CREATE TABLE `tmp` (
  `a` int(11) NOT NULL,
  `b` varchar(20) DEFAULT NULL,
  PRIMARY KEY (`a`)
);
{code}

Now let's open up two MySQL shells on the same node.

Execute this on the first shell:

{code}
MariaDB [tmp]> FLUSH TABLES WITH READ LOCK;
Query OK, 0 rows affected (0.14 sec)
{code}

The log on this node shows that the provider has been paused:

{code}
150320 11:43:04 [Note] WSREP: Provider paused at 5edc131e-ae0b-11e4-85d1-c2c0512e9e7e:2896 (495)
{code}

In the second shell, let's try to drop the table:

{code}
MariaDB [tmp]> DROP TABLE tmp;
{code}

This hangs:

{code}
MariaDB [tmp]> SHOW PROCESSLIST;
+----+-------------+-----------+------+---------+------+----------------------+------------------+----------+
| Id | User | Host | db | Command | Time | State | Info | Progress |
+----+-------------+-----------+------+---------+------+----------------------+------------------+----------+
| 1 | system user | | NULL | Sleep | 1744 | NULL | NULL | 0.000 |
| 2 | system user | | NULL | Sleep | 1744 | wsrep aborter idle | NULL | 0.000 |
| 8 | root | localhost | tmp | Query | 0 | sleeping | SHOW PROCESSLIST | 0.000 |
| 18 | root | localhost | tmp | Query | 14 | checking permissions | DROP TABLE tmp | 0.000 |
+----+-------------+-----------+------+---------+------+----------------------+------------------+----------+
4 rows in set (0.00 sec)
{code}

Let's attempt to kill the query once:

{code}
MariaDB [tmp]> DROP TABLE tmp;
^CCtrl-C -- query killed. Continuing normally.
{code}

The query is now stuck in the "killed" state:

{code}
MariaDB [tmp]> SHOW PROCESSLIST;
+----+-------------+-----------+------+---------+------+----------------------+------------------+----------+
| Id | User | Host | db | Command | Time | State | Info | Progress |
+----+-------------+-----------+------+---------+------+----------------------+------------------+----------+
| 1 | system user | | NULL | Sleep | 1815 | NULL | NULL | 0.000 |
| 2 | system user | | NULL | Sleep | 1815 | wsrep aborter idle | NULL | 0.000 |
| 8 | root | localhost | tmp | Query | 0 | sleeping | SHOW PROCESSLIST | 0.000 |
| 18 | root | localhost | tmp | Killed | 85 | checking permissions | DROP TABLE tmp | 0.000 |
+----+-------------+-----------+------+---------+------+----------------------+------------------+----------+
4 rows in set (0.00 sec)
{code}

The table is already missing from all nodes:

{code}
MariaDB [tmp]> SHOW TABLES;
Empty set (0.00 sec)
{code}

Logs on the other nodes suggest that the DROP TABLE statement may have been sent to other nodes twice, since we get an error saying the table doesn't exist:

{code}
150320 13:51:39 [ERROR] Slave SQL: Error 'Unknown table 'tmp'' on query. Default database: 'tmp'. Query: 'DROP TABLE tmp', Error_code: 1051
150320 13:51:39 [Warning] WSREP: RBR event 1 Query apply warning: 1, 5068
150320 13:51:39 [Warning] WSREP: Ignoring error for TO isolated action: source: bafbdc15-cf25-11e4-9ace-2fc1846ad321 version: 3 local: 0 state: APPLYING flags: 65 conn_id: 18 trx_id: -1 seqnos (l: 5111, g: 5068, s: 5067, d: 5067, ts: 184510656454187)
{code}

If we try to cancel the query again, we lose connection to the server:

{code}
MariaDB [tmp]> DROP TABLE tmp;
^CCtrl-C -- query killed. Continuing normally.
^CCtrl-C -- query killed. Continuing normally.
ERROR 2013 (HY000): Lost connection to MySQL server during query
{code}

However, the node is still up, and the query is stuck in the "killed" state:

{code}
MariaDB [tmp]> SHOW PROCESSLIST;
+----+-------------+-----------+------+---------+------+----------------------+------------------+----------+
| Id | User | Host | db | Command | Time | State | Info | Progress |
+----+-------------+-----------+------+---------+------+----------------------+------------------+----------+
| 1 | system user | | NULL | Sleep | 620 | NULL | NULL | 0.000 |
| 2 | system user | | NULL | Sleep | 620 | wsrep aborter idle | NULL | 0.000 |
| 4 | root | localhost | tmp | Killed | 174 | checking permissions | DROP TABLE tmp | 0.000 |
| 5 | root | localhost | tmp | Query | 0 | sleeping | SHOW PROCESSLIST | 0.000 |
| 6 | root | localhost | tmp | Sleep | 249 | | NULL | 0.000 |
+----+-------------+-----------+------+---------+------+----------------------+------------------+----------+
5 rows in set (0.00 sec)
{code}

The node is still in the cluster:

{code}
MariaDB [tmp]> SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size';
+--------------------+-------+
| Variable_name | Value |
+--------------------+-------+
| wsrep_cluster_size | 3 |
+--------------------+-------+
1 row in set (0.00 sec)
{code}

As soon as we UNLOCK TABLES on the original node, the query is killed.

{code}
MariaDB [tmp]> UNLOCK TABLES;
Query OK, 0 rows affected (0.00 sec)

MariaDB [tmp]> SHOW PROCESSLIST;
+----+-------------+-----------+------+---------+------+--------------------+------------------+----------+
| Id | User | Host | db | Command | Time | State | Info | Progress |
+----+-------------+-----------+------+---------+------+--------------------+------------------+----------+
| 1 | system user | | NULL | Sleep | 2305 | NULL | NULL | 0.000 |
| 2 | system user | | NULL | Sleep | 2305 | wsrep aborter idle | NULL | 0.000 |
| 8 | root | localhost | tmp | Query | 0 | sleeping | SHOW PROCESSLIST | 0.000 |
| 18 | root | localhost | tmp | Sleep | 1 | | NULL | 0.000 |
+----+-------------+-----------+------+---------+------+--------------------+------------------+----------+
4 rows in set (0.00 sec)
{code}

It looks like if wsrep_OSU_method set to TOI, DDL is transmitted to other nodes even when the provider is paused, and the DDL on the originating node shouldn't run the DDL due to locks. The DDL runs on the originating node anyway, unless there's some DML running on that node to block it.

Geoff Montee (Inactive) made changes - 2015-03-20 20:10

Description

I have 3 Galera nodes in a cluster.

On all 3 nodes, let's execute:

{code}
SET GLOBAL wsrep_debug=ON;
{code}

First, let's do some setup on one of the nodes:

{code}
CREATE TABLE `tmp` (
  `a` int(11) NOT NULL,
  `b` varchar(20) DEFAULT NULL,
  PRIMARY KEY (`a`)
);
{code}

Now let's open up two MySQL shells on the same node.

Execute this on the first shell:

{code}
MariaDB [tmp]> FLUSH TABLES WITH READ LOCK;
Query OK, 0 rows affected (0.14 sec)
{code}

The log on this node shows that the provider has been paused:

{code}
150320 11:43:04 [Note] WSREP: Provider paused at 5edc131e-ae0b-11e4-85d1-c2c0512e9e7e:2896 (495)
{code}

In the second shell, let's try to drop the table:

{code}
MariaDB [tmp]> DROP TABLE tmp;
{code}

This hangs:

{code}
MariaDB [tmp]> SHOW PROCESSLIST;
+----+-------------+-----------+------+---------+------+----------------------+------------------+----------+
| Id | User | Host | db | Command | Time | State | Info | Progress |
+----+-------------+-----------+------+---------+------+----------------------+------------------+----------+
| 1 | system user | | NULL | Sleep | 1744 | NULL | NULL | 0.000 |
| 2 | system user | | NULL | Sleep | 1744 | wsrep aborter idle | NULL | 0.000 |
| 8 | root | localhost | tmp | Query | 0 | sleeping | SHOW PROCESSLIST | 0.000 |
| 18 | root | localhost | tmp | Query | 14 | checking permissions | DROP TABLE tmp | 0.000 |
+----+-------------+-----------+------+---------+------+----------------------+------------------+----------+
4 rows in set (0.00 sec)
{code}

Let's attempt to kill the query once:

{code}
MariaDB [tmp]> DROP TABLE tmp;
^CCtrl-C -- query killed. Continuing normally.
{code}

The query is now stuck in the "killed" state:

{code}
MariaDB [tmp]> SHOW PROCESSLIST;
+----+-------------+-----------+------+---------+------+----------------------+------------------+----------+
| Id | User | Host | db | Command | Time | State | Info | Progress |
+----+-------------+-----------+------+---------+------+----------------------+------------------+----------+
| 1 | system user | | NULL | Sleep | 1815 | NULL | NULL | 0.000 |
| 2 | system user | | NULL | Sleep | 1815 | wsrep aborter idle | NULL | 0.000 |
| 8 | root | localhost | tmp | Query | 0 | sleeping | SHOW PROCESSLIST | 0.000 |
| 18 | root | localhost | tmp | Killed | 85 | checking permissions | DROP TABLE tmp | 0.000 |
+----+-------------+-----------+------+---------+------+----------------------+------------------+----------+
4 rows in set (0.00 sec)
{code}

The table is already missing from all nodes:

{code}
MariaDB [tmp]> SHOW TABLES;
Empty set (0.00 sec)
{code}

Logs on the other nodes suggest that the DROP TABLE statement may have been sent to other nodes twice, since we get an error saying the table doesn't exist:

{code}
150320 13:51:39 [ERROR] Slave SQL: Error 'Unknown table 'tmp'' on query. Default database: 'tmp'. Query: 'DROP TABLE tmp', Error_code: 1051
150320 13:51:39 [Warning] WSREP: RBR event 1 Query apply warning: 1, 5068
150320 13:51:39 [Warning] WSREP: Ignoring error for TO isolated action: source: bafbdc15-cf25-11e4-9ace-2fc1846ad321 version: 3 local: 0 state: APPLYING flags: 65 conn_id: 18 trx_id: -1 seqnos (l: 5111, g: 5068, s: 5067, d: 5067, ts: 184510656454187)
{code}

If we try to cancel the query again, we lose connection to the server:

{code}
MariaDB [tmp]> DROP TABLE tmp;
^CCtrl-C -- query killed. Continuing normally.
^CCtrl-C -- query killed. Continuing normally.
ERROR 2013 (HY000): Lost connection to MySQL server during query
{code}

However, the node is still up, and the query is stuck in the "killed" state:

{code}
MariaDB [tmp]> SHOW PROCESSLIST;
+----+-------------+-----------+------+---------+------+----------------------+------------------+----------+
| Id | User | Host | db | Command | Time | State | Info | Progress |
+----+-------------+-----------+------+---------+------+----------------------+------------------+----------+
| 1 | system user | | NULL | Sleep | 620 | NULL | NULL | 0.000 |
| 2 | system user | | NULL | Sleep | 620 | wsrep aborter idle | NULL | 0.000 |
| 4 | root | localhost | tmp | Killed | 174 | checking permissions | DROP TABLE tmp | 0.000 |
| 5 | root | localhost | tmp | Query | 0 | sleeping | SHOW PROCESSLIST | 0.000 |
| 6 | root | localhost | tmp | Sleep | 249 | | NULL | 0.000 |
+----+-------------+-----------+------+---------+------+----------------------+------------------+----------+
5 rows in set (0.00 sec)
{code}

The node is still in the cluster:

{code}
MariaDB [tmp]> SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size';
+--------------------+-------+
| Variable_name | Value |
+--------------------+-------+
| wsrep_cluster_size | 3 |
+--------------------+-------+
1 row in set (0.00 sec)
{code}

As soon as we UNLOCK TABLES on the original node, the query is killed.

{code}
MariaDB [tmp]> UNLOCK TABLES;
Query OK, 0 rows affected (0.00 sec)

MariaDB [tmp]> SHOW PROCESSLIST;
+----+-------------+-----------+------+---------+------+--------------------+------------------+----------+
| Id | User | Host | db | Command | Time | State | Info | Progress |
+----+-------------+-----------+------+---------+------+--------------------+------------------+----------+
| 1 | system user | | NULL | Sleep | 2305 | NULL | NULL | 0.000 |
| 2 | system user | | NULL | Sleep | 2305 | wsrep aborter idle | NULL | 0.000 |
| 8 | root | localhost | tmp | Query | 0 | sleeping | SHOW PROCESSLIST | 0.000 |
| 18 | root | localhost | tmp | Sleep | 1 | | NULL | 0.000 |
+----+-------------+-----------+------+---------+------+--------------------+------------------+----------+
4 rows in set (0.00 sec)
{code}

It looks like if wsrep_OSU_method set to TOI, DDL is transmitted to other nodes even when the provider is paused, and the DDL on the originating node shouldn't run the DDL due to locks. The DDL runs on the originating node anyway, unless there's some DML running on that node to block it.

I have 3 Galera nodes in a cluster.

On all 3 nodes, let's execute:

{code}
SET GLOBAL wsrep_debug=ON;
{code}

First, let's do some setup on one of the nodes:

{code}
CREATE TABLE `tmp` (
  `a` int(11) NOT NULL,
  `b` varchar(20) DEFAULT NULL,
  PRIMARY KEY (`a`)
);
{code}

Now let's open up two MySQL shells on the same node.

Execute this on the first shell:

{code}
MariaDB [tmp]> FLUSH TABLES WITH READ LOCK;
Query OK, 0 rows affected (0.14 sec)
{code}

The log on this node shows that the provider has been paused:

{code}
150320 11:43:04 [Note] WSREP: Provider paused at 5edc131e-ae0b-11e4-85d1-c2c0512e9e7e:2896 (495)
{code}

In the second shell, let's try to drop the table:

{code}
MariaDB [tmp]> DROP TABLE tmp;
{code}

This hangs:

{code}
MariaDB [tmp]> SHOW PROCESSLIST;
+----+-------------+-----------+------+---------+------+----------------------+------------------+----------+
| Id | User | Host | db | Command | Time | State | Info | Progress |
+----+-------------+-----------+------+---------+------+----------------------+------------------+----------+
| 1 | system user | | NULL | Sleep | 1744 | NULL | NULL | 0.000 |
| 2 | system user | | NULL | Sleep | 1744 | wsrep aborter idle | NULL | 0.000 |
| 8 | root | localhost | tmp | Query | 0 | sleeping | SHOW PROCESSLIST | 0.000 |
| 18 | root | localhost | tmp | Query | 14 | checking permissions | DROP TABLE tmp | 0.000 |
+----+-------------+-----------+------+---------+------+----------------------+------------------+----------+
4 rows in set (0.00 sec)
{code}

Let's attempt to kill the query once:

{code}
MariaDB [tmp]> DROP TABLE tmp;
^CCtrl-C -- query killed. Continuing normally.
{code}

The query is now stuck in the "killed" state:

{code}
MariaDB [tmp]> SHOW PROCESSLIST;
+----+-------------+-----------+------+---------+------+----------------------+------------------+----------+
| Id | User | Host | db | Command | Time | State | Info | Progress |
+----+-------------+-----------+------+---------+------+----------------------+------------------+----------+
| 1 | system user | | NULL | Sleep | 1815 | NULL | NULL | 0.000 |
| 2 | system user | | NULL | Sleep | 1815 | wsrep aborter idle | NULL | 0.000 |
| 8 | root | localhost | tmp | Query | 0 | sleeping | SHOW PROCESSLIST | 0.000 |
| 18 | root | localhost | tmp | Killed | 85 | checking permissions | DROP TABLE tmp | 0.000 |
+----+-------------+-----------+------+---------+------+----------------------+------------------+----------+
4 rows in set (0.00 sec)
{code}

The table is already missing from all nodes:

{code}
MariaDB [tmp]> SHOW TABLES;
Empty set (0.00 sec)
{code}

Logs on the other nodes suggest that the DROP TABLE statement may have been sent to other nodes twice, since we get an error saying the table doesn't exist:

{code}
150320 13:51:39 [ERROR] Slave SQL: Error 'Unknown table 'tmp'' on query. Default database: 'tmp'. Query: 'DROP TABLE tmp', Error_code: 1051
150320 13:51:39 [Warning] WSREP: RBR event 1 Query apply warning: 1, 5068
150320 13:51:39 [Warning] WSREP: Ignoring error for TO isolated action: source: bafbdc15-cf25-11e4-9ace-2fc1846ad321 version: 3 local: 0 state: APPLYING flags: 65 conn_id: 18 trx_id: -1 seqnos (l: 5111, g: 5068, s: 5067, d: 5067, ts: 184510656454187)
{code}

If we try to cancel the query again, we lose connection to the server:

{code}
MariaDB [tmp]> DROP TABLE tmp;
^CCtrl-C -- query killed. Continuing normally.
^CCtrl-C -- query killed. Continuing normally.
ERROR 2013 (HY000): Lost connection to MySQL server during query
{code}

However, the node is still up, and the query is stuck in the "killed" state:

{code}
MariaDB [tmp]> SHOW PROCESSLIST;
+----+-------------+-----------+------+---------+------+----------------------+------------------+----------+
| Id | User | Host | db | Command | Time | State | Info | Progress |
+----+-------------+-----------+------+---------+------+----------------------+------------------+----------+
| 1 | system user | | NULL | Sleep | 620 | NULL | NULL | 0.000 |
| 2 | system user | | NULL | Sleep | 620 | wsrep aborter idle | NULL | 0.000 |
| 4 | root | localhost | tmp | Killed | 174 | checking permissions | DROP TABLE tmp | 0.000 |
| 5 | root | localhost | tmp | Query | 0 | sleeping | SHOW PROCESSLIST | 0.000 |
| 6 | root | localhost | tmp | Sleep | 249 | | NULL | 0.000 |
+----+-------------+-----------+------+---------+------+----------------------+------------------+----------+
5 rows in set (0.00 sec)
{code}

The node is still in the cluster:

{code}
MariaDB [tmp]> SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size';
+--------------------+-------+
| Variable_name | Value |
+--------------------+-------+
| wsrep_cluster_size | 3 |
+--------------------+-------+
1 row in set (0.00 sec)
{code}

As soon as we UNLOCK TABLES on the original node, the query is killed.

{code}
MariaDB [tmp]> UNLOCK TABLES;
Query OK, 0 rows affected (0.00 sec)

MariaDB [tmp]> SHOW PROCESSLIST;
+----+-------------+-----------+------+---------+------+--------------------+------------------+----------+
| Id | User | Host | db | Command | Time | State | Info | Progress |
+----+-------------+-----------+------+---------+------+--------------------+------------------+----------+
| 1 | system user | | NULL | Sleep | 2305 | NULL | NULL | 0.000 |
| 2 | system user | | NULL | Sleep | 2305 | wsrep aborter idle | NULL | 0.000 |
| 8 | root | localhost | tmp | Query | 0 | sleeping | SHOW PROCESSLIST | 0.000 |
| 18 | root | localhost | tmp | Sleep | 1 | | NULL | 0.000 |
+----+-------------+-----------+------+---------+------+--------------------+------------------+----------+
4 rows in set (0.00 sec)
{code}

It looks like if wsrep_OSU_method set to TOI, DDL is transmitted to other nodes even when the provider is paused, and the DDL on the originating node shouldn't run the DDL due to locks. The DDL runs on the originating node anyway, unless there's some DML running on that node to block it.

Let's add some DML to the mix.

Some setup:

{code}
CREATE TABLE `tmp` (
  `a` int(11) NOT NULL,
  `b` varchar(20) DEFAULT NULL,
  PRIMARY KEY (`a`)
);
DELIMITER $$
CREATE PROCEDURE insert_test_data()
BEGIN
  DECLARE i INT DEFAULT 1;

  WHILE i < 1000000 DO
    INSERT INTO `tmp` (`a`, `b`)
    VALUES (i, i);
    SET i = i + 1;
  END WHILE;
END$$
DELIMITER ;
{code}

Now let's have three shells on the originating node:

On the first shell, execute:

{code}
CALL insert_test_data();
{code}

On the second shell, execute:

{code}
MariaDB [tmp]> FLUSH TABLES WITH READ LOCK;
Query OK, 0 rows affected (0.00 sec)
{code}

On the third shell, try to drop the table:

{code}
MariaDB [tmp]> DROP TABLE tmp;
{code}

Process list looks as expected:

{code}
MariaDB [tmp]> SHOW PROCESSLIST;
+----+-------------+-----------+------+---------+------+------------------------------+----------------------------------------------------------------------------------------+----------+
| Id | User | Host | db | Command | Time | State | Info | Progress |
+----+-------------+-----------+------+---------+------+------------------------------+----------------------------------------------------------------------------------------+----------+
| 1 | system user | | NULL | Sleep | 2648 | NULL | NULL | 0.000 |
| 2 | system user | | NULL | Sleep | 2648 | wsrep aborter idle | NULL | 0.000 |
| 8 | root | localhost | tmp | Query | 0 | sleeping | SHOW PROCESSLIST | 0.000 |
| 18 | root | localhost | tmp | Query | 59 | Waiting for global read lock | INSERT INTO `tmp` (`a`, `b`)
    VALUES ( NAME_CONST('i',6601), NAME_CONST('i',6601)) | 0.000 |
| 20 | root | localhost | tmp | Query | 20 | checking permissions | DROP TABLE tmp | 0.000 |
+----+-------------+-----------+------+---------+------+------------------------------+----------------------------------------------------------------------------------------+----------+
5 rows in set (0.00 sec)
{code}

On other nodes, the table is gone:

{code}
MariaDB [tmp]> SHOW TABLES;
Empty set (0.00 sec)
{code}

On the originating node, the table still exists:

{code}
MariaDB [tmp]> SHOW TABLES;
+---------------+
| Tables_in_tmp |
+---------------+
| tmp |
+---------------+
1 row in set (0.00 sec)
{code}

Let's cancel the DML:

{code}
MariaDB [tmp]> CALL insert_test_data();
^CCtrl-C -- query killed. Continuing normally.
ERROR 1317 (70100): Query execution was interrupted
{code}

The table still exists on the originating node and the query is dead:

{code}
MariaDB [tmp]> SHOW TABLES;
+---------------+
| Tables_in_tmp |
+---------------+
| tmp |
+---------------+
1 row in set (0.00 sec)

MariaDB [tmp]> SHOW PROCESSLIST;
+----+-------------+-----------+------+---------+------+----------------------+------------------+----------+
| Id | User | Host | db | Command | Time | State | Info | Progress |
+----+-------------+-----------+------+---------+------+----------------------+------------------+----------+
| 1 | system user | | NULL | Sleep | 2768 | NULL | NULL | 0.000 |
| 2 | system user | | NULL | Sleep | 2768 | wsrep aborter idle | NULL | 0.000 |
| 8 | root | localhost | tmp | Query | 0 | sleeping | SHOW PROCESSLIST | 0.000 |
| 18 | root | localhost | tmp | Sleep | 13 | | NULL | 0.000 |
| 20 | root | localhost | tmp | Query | 140 | checking permissions | DROP TABLE tmp | 0.000 |
+----+-------------+-----------+------+---------+------+----------------------+------------------+----------+
5 rows in set (0.00 sec)
{code}

It goes away when we UNLOCK TABLES:

{code}
MariaDB [tmp]> UNLOCK TABLES;
Query OK, 0 rows affected (0.00 sec)

MariaDB [tmp]> SHOW TABLES;
Empty set (0.00 sec)
{code}

Geoff Montee (Inactive) made changes - 2015-03-20 20:13

Summary

FLUSH TABLES WITH READ LOCK doesn't release global lock

FLUSH TABLES WITH READ LOCK has strange locking behavior

Geoff Montee (Inactive) made changes - 2015-03-20 20:15

Description

I have 3 Galera nodes in a cluster.

On all 3 nodes, let's execute:

{code}
SET GLOBAL wsrep_debug=ON;
{code}

First, let's do some setup on one of the nodes:

{code}
CREATE TABLE `tmp` (
  `a` int(11) NOT NULL,
  `b` varchar(20) DEFAULT NULL,
  PRIMARY KEY (`a`)
);
{code}

Now let's open up two MySQL shells on the same node.

Execute this on the first shell:

{code}
MariaDB [tmp]> FLUSH TABLES WITH READ LOCK;
Query OK, 0 rows affected (0.14 sec)
{code}

The log on this node shows that the provider has been paused:

{code}
150320 11:43:04 [Note] WSREP: Provider paused at 5edc131e-ae0b-11e4-85d1-c2c0512e9e7e:2896 (495)
{code}

In the second shell, let's try to drop the table:

{code}
MariaDB [tmp]> DROP TABLE tmp;
{code}

This hangs:

{code}
MariaDB [tmp]> SHOW PROCESSLIST;
+----+-------------+-----------+------+---------+------+----------------------+------------------+----------+
| Id | User | Host | db | Command | Time | State | Info | Progress |
+----+-------------+-----------+------+---------+------+----------------------+------------------+----------+
| 1 | system user | | NULL | Sleep | 1744 | NULL | NULL | 0.000 |
| 2 | system user | | NULL | Sleep | 1744 | wsrep aborter idle | NULL | 0.000 |
| 8 | root | localhost | tmp | Query | 0 | sleeping | SHOW PROCESSLIST | 0.000 |
| 18 | root | localhost | tmp | Query | 14 | checking permissions | DROP TABLE tmp | 0.000 |
+----+-------------+-----------+------+---------+------+----------------------+------------------+----------+
4 rows in set (0.00 sec)
{code}

Let's attempt to kill the query once:

{code}
MariaDB [tmp]> DROP TABLE tmp;
^CCtrl-C -- query killed. Continuing normally.
{code}

The query is now stuck in the "killed" state:

{code}
MariaDB [tmp]> SHOW PROCESSLIST;
+----+-------------+-----------+------+---------+------+----------------------+------------------+----------+
| Id | User | Host | db | Command | Time | State | Info | Progress |
+----+-------------+-----------+------+---------+------+----------------------+------------------+----------+
| 1 | system user | | NULL | Sleep | 1815 | NULL | NULL | 0.000 |
| 2 | system user | | NULL | Sleep | 1815 | wsrep aborter idle | NULL | 0.000 |
| 8 | root | localhost | tmp | Query | 0 | sleeping | SHOW PROCESSLIST | 0.000 |
| 18 | root | localhost | tmp | Killed | 85 | checking permissions | DROP TABLE tmp | 0.000 |
+----+-------------+-----------+------+---------+------+----------------------+------------------+----------+
4 rows in set (0.00 sec)
{code}

The table is already missing from all nodes:

{code}
MariaDB [tmp]> SHOW TABLES;
Empty set (0.00 sec)
{code}

Logs on the other nodes suggest that the DROP TABLE statement may have been sent to other nodes twice, since we get an error saying the table doesn't exist:

{code}
150320 13:51:39 [ERROR] Slave SQL: Error 'Unknown table 'tmp'' on query. Default database: 'tmp'. Query: 'DROP TABLE tmp', Error_code: 1051
150320 13:51:39 [Warning] WSREP: RBR event 1 Query apply warning: 1, 5068
150320 13:51:39 [Warning] WSREP: Ignoring error for TO isolated action: source: bafbdc15-cf25-11e4-9ace-2fc1846ad321 version: 3 local: 0 state: APPLYING flags: 65 conn_id: 18 trx_id: -1 seqnos (l: 5111, g: 5068, s: 5067, d: 5067, ts: 184510656454187)
{code}

If we try to cancel the query again, we lose connection to the server:

{code}
MariaDB [tmp]> DROP TABLE tmp;
^CCtrl-C -- query killed. Continuing normally.
^CCtrl-C -- query killed. Continuing normally.
ERROR 2013 (HY000): Lost connection to MySQL server during query
{code}

However, the node is still up, and the query is stuck in the "killed" state:

{code}
MariaDB [tmp]> SHOW PROCESSLIST;
+----+-------------+-----------+------+---------+------+----------------------+------------------+----------+
| Id | User | Host | db | Command | Time | State | Info | Progress |
+----+-------------+-----------+------+---------+------+----------------------+------------------+----------+
| 1 | system user | | NULL | Sleep | 620 | NULL | NULL | 0.000 |
| 2 | system user | | NULL | Sleep | 620 | wsrep aborter idle | NULL | 0.000 |
| 4 | root | localhost | tmp | Killed | 174 | checking permissions | DROP TABLE tmp | 0.000 |
| 5 | root | localhost | tmp | Query | 0 | sleeping | SHOW PROCESSLIST | 0.000 |
| 6 | root | localhost | tmp | Sleep | 249 | | NULL | 0.000 |
+----+-------------+-----------+------+---------+------+----------------------+------------------+----------+
5 rows in set (0.00 sec)
{code}

The node is still in the cluster:

{code}
MariaDB [tmp]> SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size';
+--------------------+-------+
| Variable_name | Value |
+--------------------+-------+
| wsrep_cluster_size | 3 |
+--------------------+-------+
1 row in set (0.00 sec)
{code}

As soon as we UNLOCK TABLES on the original node, the query is killed.

{code}
MariaDB [tmp]> UNLOCK TABLES;
Query OK, 0 rows affected (0.00 sec)

MariaDB [tmp]> SHOW PROCESSLIST;
+----+-------------+-----------+------+---------+------+--------------------+------------------+----------+
| Id | User | Host | db | Command | Time | State | Info | Progress |
+----+-------------+-----------+------+---------+------+--------------------+------------------+----------+
| 1 | system user | | NULL | Sleep | 2305 | NULL | NULL | 0.000 |
| 2 | system user | | NULL | Sleep | 2305 | wsrep aborter idle | NULL | 0.000 |
| 8 | root | localhost | tmp | Query | 0 | sleeping | SHOW PROCESSLIST | 0.000 |
| 18 | root | localhost | tmp | Sleep | 1 | | NULL | 0.000 |
+----+-------------+-----------+------+---------+------+--------------------+------------------+----------+
4 rows in set (0.00 sec)
{code}

It looks like if wsrep_OSU_method set to TOI, DDL is transmitted to other nodes even when the provider is paused, and the DDL on the originating node shouldn't run the DDL due to locks. The DDL runs on the originating node anyway, unless there's some DML running on that node to block it.

Let's add some DML to the mix.

Some setup:

{code}
CREATE TABLE `tmp` (
  `a` int(11) NOT NULL,
  `b` varchar(20) DEFAULT NULL,
  PRIMARY KEY (`a`)
);
DELIMITER $$
CREATE PROCEDURE insert_test_data()
BEGIN
  DECLARE i INT DEFAULT 1;

  WHILE i < 1000000 DO
    INSERT INTO `tmp` (`a`, `b`)
    VALUES (i, i);
    SET i = i + 1;
  END WHILE;
END$$
DELIMITER ;
{code}

Now let's have three shells on the originating node:

On the first shell, execute:

{code}
CALL insert_test_data();
{code}

On the second shell, execute:

{code}
MariaDB [tmp]> FLUSH TABLES WITH READ LOCK;
Query OK, 0 rows affected (0.00 sec)
{code}

On the third shell, try to drop the table:

{code}
MariaDB [tmp]> DROP TABLE tmp;
{code}

Process list looks as expected:

{code}
MariaDB [tmp]> SHOW PROCESSLIST;
+----+-------------+-----------+------+---------+------+------------------------------+----------------------------------------------------------------------------------------+----------+
| Id | User | Host | db | Command | Time | State | Info | Progress |
+----+-------------+-----------+------+---------+------+------------------------------+----------------------------------------------------------------------------------------+----------+
| 1 | system user | | NULL | Sleep | 2648 | NULL | NULL | 0.000 |
| 2 | system user | | NULL | Sleep | 2648 | wsrep aborter idle | NULL | 0.000 |
| 8 | root | localhost | tmp | Query | 0 | sleeping | SHOW PROCESSLIST | 0.000 |
| 18 | root | localhost | tmp | Query | 59 | Waiting for global read lock | INSERT INTO `tmp` (`a`, `b`)
    VALUES ( NAME_CONST('i',6601), NAME_CONST('i',6601)) | 0.000 |
| 20 | root | localhost | tmp | Query | 20 | checking permissions | DROP TABLE tmp | 0.000 |
+----+-------------+-----------+------+---------+------+------------------------------+----------------------------------------------------------------------------------------+----------+
5 rows in set (0.00 sec)
{code}

On other nodes, the table is gone:

{code}
MariaDB [tmp]> SHOW TABLES;
Empty set (0.00 sec)
{code}

On the originating node, the table still exists:

{code}
MariaDB [tmp]> SHOW TABLES;
+---------------+
| Tables_in_tmp |
+---------------+
| tmp |
+---------------+
1 row in set (0.00 sec)
{code}

Let's cancel the DML:

{code}
MariaDB [tmp]> CALL insert_test_data();
^CCtrl-C -- query killed. Continuing normally.
ERROR 1317 (70100): Query execution was interrupted
{code}

The table still exists on the originating node and the query is dead:

{code}
MariaDB [tmp]> SHOW TABLES;
+---------------+
| Tables_in_tmp |
+---------------+
| tmp |
+---------------+
1 row in set (0.00 sec)

MariaDB [tmp]> SHOW PROCESSLIST;
+----+-------------+-----------+------+---------+------+----------------------+------------------+----------+
| Id | User | Host | db | Command | Time | State | Info | Progress |
+----+-------------+-----------+------+---------+------+----------------------+------------------+----------+
| 1 | system user | | NULL | Sleep | 2768 | NULL | NULL | 0.000 |
| 2 | system user | | NULL | Sleep | 2768 | wsrep aborter idle | NULL | 0.000 |
| 8 | root | localhost | tmp | Query | 0 | sleeping | SHOW PROCESSLIST | 0.000 |
| 18 | root | localhost | tmp | Sleep | 13 | | NULL | 0.000 |
| 20 | root | localhost | tmp | Query | 140 | checking permissions | DROP TABLE tmp | 0.000 |
+----+-------------+-----------+------+---------+------+----------------------+------------------+----------+
5 rows in set (0.00 sec)
{code}

It goes away when we UNLOCK TABLES:

{code}
MariaDB [tmp]> UNLOCK TABLES;
Query OK, 0 rows affected (0.00 sec)

MariaDB [tmp]> SHOW TABLES;
Empty set (0.00 sec)
{code}

I have 3 Galera nodes in a cluster.

On all 3 nodes, let's execute:

{code}
SET GLOBAL wsrep_debug=ON;
{code}

First, let's do some setup on one of the nodes:

{code}
CREATE TABLE `tmp` (
  `a` int(11) NOT NULL,
  `b` varchar(20) DEFAULT NULL,
  PRIMARY KEY (`a`)
);
{code}

Now let's open up two MySQL shells on the same node.

Execute this on the first shell:

{code}
MariaDB [tmp]> FLUSH TABLES WITH READ LOCK;
Query OK, 0 rows affected (0.14 sec)
{code}

The log on this node shows that the provider has been paused:

{code}
150320 11:43:04 [Note] WSREP: Provider paused at 5edc131e-ae0b-11e4-85d1-c2c0512e9e7e:2896 (495)
{code}

In the second shell, let's try to drop the table:

{code}
MariaDB [tmp]> DROP TABLE tmp;
{code}

This hangs:

{code}
MariaDB [tmp]> SHOW PROCESSLIST;
+----+-------------+-----------+------+---------+------+----------------------+------------------+----------+
| Id | User | Host | db | Command | Time | State | Info | Progress |
+----+-------------+-----------+------+---------+------+----------------------+------------------+----------+
| 1 | system user | | NULL | Sleep | 1744 | NULL | NULL | 0.000 |
| 2 | system user | | NULL | Sleep | 1744 | wsrep aborter idle | NULL | 0.000 |
| 8 | root | localhost | tmp | Query | 0 | sleeping | SHOW PROCESSLIST | 0.000 |
| 18 | root | localhost | tmp | Query | 14 | checking permissions | DROP TABLE tmp | 0.000 |
+----+-------------+-----------+------+---------+------+----------------------+------------------+----------+
4 rows in set (0.00 sec)
{code}

Let's attempt to kill the query once:

{code}
MariaDB [tmp]> DROP TABLE tmp;
^CCtrl-C -- query killed. Continuing normally.
{code}

The query is now stuck in the "killed" state:

{code}
MariaDB [tmp]> SHOW PROCESSLIST;
+----+-------------+-----------+------+---------+------+----------------------+------------------+----------+
| Id | User | Host | db | Command | Time | State | Info | Progress |
+----+-------------+-----------+------+---------+------+----------------------+------------------+----------+
| 1 | system user | | NULL | Sleep | 1815 | NULL | NULL | 0.000 |
| 2 | system user | | NULL | Sleep | 1815 | wsrep aborter idle | NULL | 0.000 |
| 8 | root | localhost | tmp | Query | 0 | sleeping | SHOW PROCESSLIST | 0.000 |
| 18 | root | localhost | tmp | Killed | 85 | checking permissions | DROP TABLE tmp | 0.000 |
+----+-------------+-----------+------+---------+------+----------------------+------------------+----------+
4 rows in set (0.00 sec)
{code}

The table is already missing from all nodes:

{code}
MariaDB [tmp]> SHOW TABLES;
Empty set (0.00 sec)
{code}

Logs on the other nodes suggest that the DROP TABLE statement may have been sent to other nodes twice, since we get an error saying the table doesn't exist:

{code}
150320 13:51:39 [ERROR] Slave SQL: Error 'Unknown table 'tmp'' on query. Default database: 'tmp'. Query: 'DROP TABLE tmp', Error_code: 1051
150320 13:51:39 [Warning] WSREP: RBR event 1 Query apply warning: 1, 5068
150320 13:51:39 [Warning] WSREP: Ignoring error for TO isolated action: source: bafbdc15-cf25-11e4-9ace-2fc1846ad321 version: 3 local: 0 state: APPLYING flags: 65 conn_id: 18 trx_id: -1 seqnos (l: 5111, g: 5068, s: 5067, d: 5067, ts: 184510656454187)
{code}

If we try to cancel the query again, we lose connection to the server:

{code}
MariaDB [tmp]> DROP TABLE tmp;
^CCtrl-C -- query killed. Continuing normally.
^CCtrl-C -- query killed. Continuing normally.
ERROR 2013 (HY000): Lost connection to MySQL server during query
{code}

However, the node is still up, and the query is stuck in the "killed" state:

{code}
MariaDB [tmp]> SHOW PROCESSLIST;
+----+-------------+-----------+------+---------+------+----------------------+------------------+----------+
| Id | User | Host | db | Command | Time | State | Info | Progress |
+----+-------------+-----------+------+---------+------+----------------------+------------------+----------+
| 1 | system user | | NULL | Sleep | 620 | NULL | NULL | 0.000 |
| 2 | system user | | NULL | Sleep | 620 | wsrep aborter idle | NULL | 0.000 |
| 4 | root | localhost | tmp | Killed | 174 | checking permissions | DROP TABLE tmp | 0.000 |
| 5 | root | localhost | tmp | Query | 0 | sleeping | SHOW PROCESSLIST | 0.000 |
| 6 | root | localhost | tmp | Sleep | 249 | | NULL | 0.000 |
+----+-------------+-----------+------+---------+------+----------------------+------------------+----------+
5 rows in set (0.00 sec)
{code}

The node is still in the cluster:

{code}
MariaDB [tmp]> SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size';
+--------------------+-------+
| Variable_name | Value |
+--------------------+-------+
| wsrep_cluster_size | 3 |
+--------------------+-------+
1 row in set (0.00 sec)
{code}

As soon as we UNLOCK TABLES on the original node, the query is killed.

{code}
MariaDB [tmp]> UNLOCK TABLES;
Query OK, 0 rows affected (0.00 sec)

MariaDB [tmp]> SHOW PROCESSLIST;
+----+-------------+-----------+------+---------+------+--------------------+------------------+----------+
| Id | User | Host | db | Command | Time | State | Info | Progress |
+----+-------------+-----------+------+---------+------+--------------------+------------------+----------+
| 1 | system user | | NULL | Sleep | 2305 | NULL | NULL | 0.000 |
| 2 | system user | | NULL | Sleep | 2305 | wsrep aborter idle | NULL | 0.000 |
| 8 | root | localhost | tmp | Query | 0 | sleeping | SHOW PROCESSLIST | 0.000 |
| 18 | root | localhost | tmp | Sleep | 1 | | NULL | 0.000 |
+----+-------------+-----------+------+---------+------+--------------------+------------------+----------+
4 rows in set (0.00 sec)
{code}

It looks like if wsrep_OSU_method set to TOI, DDL is transmitted to other nodes even when the provider is paused, and the DDL on the originating node shouldn't run the DDL due to locks. The DDL runs on the originating node anyway, unless there's some DML running on that node to block it.

Let's add some DML to the mix.

Some setup:

{code}
CREATE TABLE `tmp` (
  `a` int(11) NOT NULL,
  `b` varchar(20) DEFAULT NULL,
  PRIMARY KEY (`a`)
);
DELIMITER $$
CREATE PROCEDURE insert_test_data()
BEGIN
  DECLARE i INT DEFAULT 1;

  WHILE i < 1000000 DO
    INSERT INTO `tmp` (`a`, `b`)
    VALUES (i, i);
    SET i = i + 1;
  END WHILE;
END$$
DELIMITER ;
{code}

Now let's have three shells on the originating node:

On the first shell, execute:

{code}
CALL insert_test_data();
{code}

On the second shell, execute:

{code}
MariaDB [tmp]> FLUSH TABLES WITH READ LOCK;
Query OK, 0 rows affected (0.00 sec)
{code}

On the third shell, try to drop the table:

{code}
MariaDB [tmp]> DROP TABLE tmp;
{code}

Process list looks as expected:

{code}
MariaDB [tmp]> SHOW PROCESSLIST;
+----+-------------+-----------+------+---------+------+------------------------------+----------------------------------------------------------------------------------------+----------+
| Id | User | Host | db | Command | Time | State | Info | Progress |
+----+-------------+-----------+------+---------+------+------------------------------+----------------------------------------------------------------------------------------+----------+
| 1 | system user | | NULL | Sleep | 2648 | NULL | NULL | 0.000 |
| 2 | system user | | NULL | Sleep | 2648 | wsrep aborter idle | NULL | 0.000 |
| 8 | root | localhost | tmp | Query | 0 | sleeping | SHOW PROCESSLIST | 0.000 |
| 18 | root | localhost | tmp | Query | 59 | Waiting for global read lock | INSERT INTO `tmp` (`a`, `b`)
    VALUES ( NAME_CONST('i',6601), NAME_CONST('i',6601)) | 0.000 |
| 20 | root | localhost | tmp | Query | 20 | checking permissions | DROP TABLE tmp | 0.000 |
+----+-------------+-----------+------+---------+------+------------------------------+----------------------------------------------------------------------------------------+----------+
5 rows in set (0.00 sec)
{code}

On other nodes, the table is gone:

{code}
MariaDB [tmp]> SHOW TABLES;
Empty set (0.00 sec)
{code}

On the originating node, the table still exists:

{code}
MariaDB [tmp]> SHOW TABLES;
+---------------+
| Tables_in_tmp |
+---------------+
| tmp |
+---------------+
1 row in set (0.00 sec)
{code}

Let's cancel the DML:

{code}
MariaDB [tmp]> CALL insert_test_data();
^CCtrl-C -- query killed. Continuing normally.
ERROR 1317 (70100): Query execution was interrupted
{code}

The table still exists on the originating node and the query is dead:

{code}
MariaDB [tmp]> SHOW TABLES;
+---------------+
| Tables_in_tmp |
+---------------+
| tmp |
+---------------+
1 row in set (0.00 sec)

MariaDB [tmp]> SHOW PROCESSLIST;
+----+-------------+-----------+------+---------+------+----------------------+------------------+----------+
| Id | User | Host | db | Command | Time | State | Info | Progress |
+----+-------------+-----------+------+---------+------+----------------------+------------------+----------+
| 1 | system user | | NULL | Sleep | 2768 | NULL | NULL | 0.000 |
| 2 | system user | | NULL | Sleep | 2768 | wsrep aborter idle | NULL | 0.000 |
| 8 | root | localhost | tmp | Query | 0 | sleeping | SHOW PROCESSLIST | 0.000 |
| 18 | root | localhost | tmp | Sleep | 13 | | NULL | 0.000 |
| 20 | root | localhost | tmp | Query | 140 | checking permissions | DROP TABLE tmp | 0.000 |
+----+-------------+-----------+------+---------+------+----------------------+------------------+----------+
5 rows in set (0.00 sec)
{code}

It goes away when we UNLOCK TABLES:

{code}
MariaDB [tmp]> UNLOCK TABLES;
Query OK, 0 rows affected (0.00 sec)

MariaDB [tmp]> SHOW TABLES;
Empty set (0.00 sec)
{code}

Again, the other nodes have an error about the missing table, suggesting it may have been executed twice:

{code}
150320 14:04:56 [Warning] WSREP: RBR event 1 Query apply warning: 1, 5070
150320 14:04:56 [Warning] WSREP: Ignoring error for TO isolated action: source: bafbdc15-cf25-11e4-9ace-2fc1846ad321 version: 3 local: 0 state: APPLYING flags: 65 conn_id: 18 trx_id: -1 seqnos (l: 5113, g: 5070, s: 5069, d: 5069, ts: 185308382405439)
{code}

Geoff Montee (Inactive) made changes - 2015-03-20 20:32

Description

I have 3 Galera nodes in a cluster.

On all 3 nodes, let's execute:

{code}
SET GLOBAL wsrep_debug=ON;
{code}

First, let's do some setup on one of the nodes:

{code}
CREATE TABLE `tmp` (
  `a` int(11) NOT NULL,
  `b` varchar(20) DEFAULT NULL,
  PRIMARY KEY (`a`)
);
{code}

Now let's open up two MySQL shells on the same node.

Execute this on the first shell:

{code}
MariaDB [tmp]> FLUSH TABLES WITH READ LOCK;
Query OK, 0 rows affected (0.14 sec)
{code}

The log on this node shows that the provider has been paused:

{code}
150320 11:43:04 [Note] WSREP: Provider paused at 5edc131e-ae0b-11e4-85d1-c2c0512e9e7e:2896 (495)
{code}

In the second shell, let's try to drop the table:

{code}
MariaDB [tmp]> DROP TABLE tmp;
{code}

This hangs:

{code}
MariaDB [tmp]> SHOW PROCESSLIST;
+----+-------------+-----------+------+---------+------+----------------------+------------------+----------+
| Id | User | Host | db | Command | Time | State | Info | Progress |
+----+-------------+-----------+------+---------+------+----------------------+------------------+----------+
| 1 | system user | | NULL | Sleep | 1744 | NULL | NULL | 0.000 |
| 2 | system user | | NULL | Sleep | 1744 | wsrep aborter idle | NULL | 0.000 |
| 8 | root | localhost | tmp | Query | 0 | sleeping | SHOW PROCESSLIST | 0.000 |
| 18 | root | localhost | tmp | Query | 14 | checking permissions | DROP TABLE tmp | 0.000 |
+----+-------------+-----------+------+---------+------+----------------------+------------------+----------+
4 rows in set (0.00 sec)
{code}

Let's attempt to kill the query once:

{code}
MariaDB [tmp]> DROP TABLE tmp;
^CCtrl-C -- query killed. Continuing normally.
{code}

The query is now stuck in the "killed" state:

{code}
MariaDB [tmp]> SHOW PROCESSLIST;
+----+-------------+-----------+------+---------+------+----------------------+------------------+----------+
| Id | User | Host | db | Command | Time | State | Info | Progress |
+----+-------------+-----------+------+---------+------+----------------------+------------------+----------+
| 1 | system user | | NULL | Sleep | 1815 | NULL | NULL | 0.000 |
| 2 | system user | | NULL | Sleep | 1815 | wsrep aborter idle | NULL | 0.000 |
| 8 | root | localhost | tmp | Query | 0 | sleeping | SHOW PROCESSLIST | 0.000 |
| 18 | root | localhost | tmp | Killed | 85 | checking permissions | DROP TABLE tmp | 0.000 |
+----+-------------+-----------+------+---------+------+----------------------+------------------+----------+
4 rows in set (0.00 sec)
{code}

The table is already missing from all nodes:

{code}
MariaDB [tmp]> SHOW TABLES;
Empty set (0.00 sec)
{code}

Logs on the other nodes suggest that the DROP TABLE statement may have been sent to other nodes twice, since we get an error saying the table doesn't exist:

{code}
150320 13:51:39 [ERROR] Slave SQL: Error 'Unknown table 'tmp'' on query. Default database: 'tmp'. Query: 'DROP TABLE tmp', Error_code: 1051
150320 13:51:39 [Warning] WSREP: RBR event 1 Query apply warning: 1, 5068
150320 13:51:39 [Warning] WSREP: Ignoring error for TO isolated action: source: bafbdc15-cf25-11e4-9ace-2fc1846ad321 version: 3 local: 0 state: APPLYING flags: 65 conn_id: 18 trx_id: -1 seqnos (l: 5111, g: 5068, s: 5067, d: 5067, ts: 184510656454187)
{code}

If we try to cancel the query again, we lose connection to the server:

{code}
MariaDB [tmp]> DROP TABLE tmp;
^CCtrl-C -- query killed. Continuing normally.
^CCtrl-C -- query killed. Continuing normally.
ERROR 2013 (HY000): Lost connection to MySQL server during query
{code}

However, the node is still up, and the query is stuck in the "killed" state:

{code}
MariaDB [tmp]> SHOW PROCESSLIST;
+----+-------------+-----------+------+---------+------+----------------------+------------------+----------+
| Id | User | Host | db | Command | Time | State | Info | Progress |
+----+-------------+-----------+------+---------+------+----------------------+------------------+----------+
| 1 | system user | | NULL | Sleep | 620 | NULL | NULL | 0.000 |
| 2 | system user | | NULL | Sleep | 620 | wsrep aborter idle | NULL | 0.000 |
| 4 | root | localhost | tmp | Killed | 174 | checking permissions | DROP TABLE tmp | 0.000 |
| 5 | root | localhost | tmp | Query | 0 | sleeping | SHOW PROCESSLIST | 0.000 |
| 6 | root | localhost | tmp | Sleep | 249 | | NULL | 0.000 |
+----+-------------+-----------+------+---------+------+----------------------+------------------+----------+
5 rows in set (0.00 sec)
{code}

The node is still in the cluster:

{code}
MariaDB [tmp]> SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size';
+--------------------+-------+
| Variable_name | Value |
+--------------------+-------+
| wsrep_cluster_size | 3 |
+--------------------+-------+
1 row in set (0.00 sec)
{code}

As soon as we UNLOCK TABLES on the original node, the query is killed.

{code}
MariaDB [tmp]> UNLOCK TABLES;
Query OK, 0 rows affected (0.00 sec)

MariaDB [tmp]> SHOW PROCESSLIST;
+----+-------------+-----------+------+---------+------+--------------------+------------------+----------+
| Id | User | Host | db | Command | Time | State | Info | Progress |
+----+-------------+-----------+------+---------+------+--------------------+------------------+----------+
| 1 | system user | | NULL | Sleep | 2305 | NULL | NULL | 0.000 |
| 2 | system user | | NULL | Sleep | 2305 | wsrep aborter idle | NULL | 0.000 |
| 8 | root | localhost | tmp | Query | 0 | sleeping | SHOW PROCESSLIST | 0.000 |
| 18 | root | localhost | tmp | Sleep | 1 | | NULL | 0.000 |
+----+-------------+-----------+------+---------+------+--------------------+------------------+----------+
4 rows in set (0.00 sec)
{code}

It looks like if wsrep_OSU_method set to TOI, DDL is transmitted to other nodes even when the provider is paused, and the DDL on the originating node shouldn't run the DDL due to locks. The DDL runs on the originating node anyway, unless there's some DML running on that node to block it.

Let's add some DML to the mix.

Some setup:

{code}
CREATE TABLE `tmp` (
  `a` int(11) NOT NULL,
  `b` varchar(20) DEFAULT NULL,
  PRIMARY KEY (`a`)
);
DELIMITER $$
CREATE PROCEDURE insert_test_data()
BEGIN
  DECLARE i INT DEFAULT 1;

  WHILE i < 1000000 DO
    INSERT INTO `tmp` (`a`, `b`)
    VALUES (i, i);
    SET i = i + 1;
  END WHILE;
END$$
DELIMITER ;
{code}

Now let's have three shells on the originating node:

On the first shell, execute:

{code}
CALL insert_test_data();
{code}

On the second shell, execute:

{code}
MariaDB [tmp]> FLUSH TABLES WITH READ LOCK;
Query OK, 0 rows affected (0.00 sec)
{code}

On the third shell, try to drop the table:

{code}
MariaDB [tmp]> DROP TABLE tmp;
{code}

Process list looks as expected:

{code}
MariaDB [tmp]> SHOW PROCESSLIST;
+----+-------------+-----------+------+---------+------+------------------------------+----------------------------------------------------------------------------------------+----------+
| Id | User | Host | db | Command | Time | State | Info | Progress |
+----+-------------+-----------+------+---------+------+------------------------------+----------------------------------------------------------------------------------------+----------+
| 1 | system user | | NULL | Sleep | 2648 | NULL | NULL | 0.000 |
| 2 | system user | | NULL | Sleep | 2648 | wsrep aborter idle | NULL | 0.000 |
| 8 | root | localhost | tmp | Query | 0 | sleeping | SHOW PROCESSLIST | 0.000 |
| 18 | root | localhost | tmp | Query | 59 | Waiting for global read lock | INSERT INTO `tmp` (`a`, `b`)
    VALUES ( NAME_CONST('i',6601), NAME_CONST('i',6601)) | 0.000 |
| 20 | root | localhost | tmp | Query | 20 | checking permissions | DROP TABLE tmp | 0.000 |
+----+-------------+-----------+------+---------+------+------------------------------+----------------------------------------------------------------------------------------+----------+
5 rows in set (0.00 sec)
{code}

On other nodes, the table is gone:

{code}
MariaDB [tmp]> SHOW TABLES;
Empty set (0.00 sec)
{code}

On the originating node, the table still exists:

{code}
MariaDB [tmp]> SHOW TABLES;
+---------------+
| Tables_in_tmp |
+---------------+
| tmp |
+---------------+
1 row in set (0.00 sec)
{code}

Let's cancel the DML:

{code}
MariaDB [tmp]> CALL insert_test_data();
^CCtrl-C -- query killed. Continuing normally.
ERROR 1317 (70100): Query execution was interrupted
{code}

The table still exists on the originating node and the query is dead:

{code}
MariaDB [tmp]> SHOW TABLES;
+---------------+
| Tables_in_tmp |
+---------------+
| tmp |
+---------------+
1 row in set (0.00 sec)

MariaDB [tmp]> SHOW PROCESSLIST;
+----+-------------+-----------+------+---------+------+----------------------+------------------+----------+
| Id | User | Host | db | Command | Time | State | Info | Progress |
+----+-------------+-----------+------+---------+------+----------------------+------------------+----------+
| 1 | system user | | NULL | Sleep | 2768 | NULL | NULL | 0.000 |
| 2 | system user | | NULL | Sleep | 2768 | wsrep aborter idle | NULL | 0.000 |
| 8 | root | localhost | tmp | Query | 0 | sleeping | SHOW PROCESSLIST | 0.000 |
| 18 | root | localhost | tmp | Sleep | 13 | | NULL | 0.000 |
| 20 | root | localhost | tmp | Query | 140 | checking permissions | DROP TABLE tmp | 0.000 |
+----+-------------+-----------+------+---------+------+----------------------+------------------+----------+
5 rows in set (0.00 sec)
{code}

It goes away when we UNLOCK TABLES:

{code}
MariaDB [tmp]> UNLOCK TABLES;
Query OK, 0 rows affected (0.00 sec)

MariaDB [tmp]> SHOW TABLES;
Empty set (0.00 sec)
{code}

Again, the other nodes have an error about the missing table, suggesting it may have been executed twice:

{code}
150320 14:04:56 [Warning] WSREP: RBR event 1 Query apply warning: 1, 5070
150320 14:04:56 [Warning] WSREP: Ignoring error for TO isolated action: source: bafbdc15-cf25-11e4-9ace-2fc1846ad321 version: 3 local: 0 state: APPLYING flags: 65 conn_id: 18 trx_id: -1 seqnos (l: 5113, g: 5070, s: 5069, d: 5069, ts: 185308382405439)
{code}

It looks like if wsrep_OSU_method set to TOI, DDL is transmitted to other nodes even when the provider is paused, and the DDL on the originating node shouldn't run the DDL because it is supposedly waiting on a lock due to FLUSH TABLES WITH READ LOCK. The DDL runs on the originating node anyway. The DDL runs on all nodes except the originating node, if there's some DML running on that node to block it.

According to the logs, it also looks like DDL may be getting sent to the other nodes more than once.

h4. DDL + no DML

I have 3 Galera nodes in a cluster.

On all 3 nodes, let's execute:

{code}
SET GLOBAL wsrep_debug=ON;
{code}

First, let's do some setup on one of the nodes:

{code}
CREATE TABLE `tmp` (
  `a` int(11) NOT NULL,
  `b` varchar(20) DEFAULT NULL,
  PRIMARY KEY (`a`)
);
{code}

Now let's open up two MySQL shells on the same node.

Execute this on the first shell:

{code}
MariaDB [tmp]> FLUSH TABLES WITH READ LOCK;
Query OK, 0 rows affected (0.14 sec)
{code}

The log on this node shows that the provider has been paused:

{code}
150320 11:43:04 [Note] WSREP: Provider paused at 5edc131e-ae0b-11e4-85d1-c2c0512e9e7e:2896 (495)
{code}

In the second shell, let's try to drop the table:

{code}
MariaDB [tmp]> DROP TABLE tmp;
{code}

This hangs:

{code}
MariaDB [tmp]> SHOW PROCESSLIST;
+----+-------------+-----------+------+---------+------+----------------------+------------------+----------+
| Id | User | Host | db | Command | Time | State | Info | Progress |
+----+-------------+-----------+------+---------+------+----------------------+------------------+----------+
| 1 | system user | | NULL | Sleep | 1744 | NULL | NULL | 0.000 |
| 2 | system user | | NULL | Sleep | 1744 | wsrep aborter idle | NULL | 0.000 |
| 8 | root | localhost | tmp | Query | 0 | sleeping | SHOW PROCESSLIST | 0.000 |
| 18 | root | localhost | tmp | Query | 14 | checking permissions | DROP TABLE tmp | 0.000 |
+----+-------------+-----------+------+---------+------+----------------------+------------------+----------+
4 rows in set (0.00 sec)
{code}

Let's attempt to kill the query once:

{code}
MariaDB [tmp]> DROP TABLE tmp;
^CCtrl-C -- query killed. Continuing normally.
{code}

The query is now stuck in the "killed" state:

{code}
MariaDB [tmp]> SHOW PROCESSLIST;
+----+-------------+-----------+------+---------+------+----------------------+------------------+----------+
| Id | User | Host | db | Command | Time | State | Info | Progress |
+----+-------------+-----------+------+---------+------+----------------------+------------------+----------+
| 1 | system user | | NULL | Sleep | 1815 | NULL | NULL | 0.000 |
| 2 | system user | | NULL | Sleep | 1815 | wsrep aborter idle | NULL | 0.000 |
| 8 | root | localhost | tmp | Query | 0 | sleeping | SHOW PROCESSLIST | 0.000 |
| 18 | root | localhost | tmp | Killed | 85 | checking permissions | DROP TABLE tmp | 0.000 |
+----+-------------+-----------+------+---------+------+----------------------+------------------+----------+
4 rows in set (0.00 sec)
{code}

The table is already missing from all nodes:

{code}
MariaDB [tmp]> SHOW TABLES;
Empty set (0.00 sec)
{code}

Logs on the other nodes suggest that the DROP TABLE statement may have been sent to other nodes twice, since we get an error saying the table doesn't exist:

{code}
150320 13:51:39 [ERROR] Slave SQL: Error 'Unknown table 'tmp'' on query. Default database: 'tmp'. Query: 'DROP TABLE tmp', Error_code: 1051
150320 13:51:39 [Warning] WSREP: RBR event 1 Query apply warning: 1, 5068
150320 13:51:39 [Warning] WSREP: Ignoring error for TO isolated action: source: bafbdc15-cf25-11e4-9ace-2fc1846ad321 version: 3 local: 0 state: APPLYING flags: 65 conn_id: 18 trx_id: -1 seqnos (l: 5111, g: 5068, s: 5067, d: 5067, ts: 184510656454187)
{code}

If we try to cancel the query again, we lose connection to the server:

{code}
MariaDB [tmp]> DROP TABLE tmp;
^CCtrl-C -- query killed. Continuing normally.
^CCtrl-C -- query killed. Continuing normally.
ERROR 2013 (HY000): Lost connection to MySQL server during query
{code}

However, the node is still up, and the query is stuck in the "killed" state:

{code}
MariaDB [tmp]> SHOW PROCESSLIST;
+----+-------------+-----------+------+---------+------+----------------------+------------------+----------+
| Id | User | Host | db | Command | Time | State | Info | Progress |
+----+-------------+-----------+------+---------+------+----------------------+------------------+----------+
| 1 | system user | | NULL | Sleep | 620 | NULL | NULL | 0.000 |
| 2 | system user | | NULL | Sleep | 620 | wsrep aborter idle | NULL | 0.000 |
| 4 | root | localhost | tmp | Killed | 174 | checking permissions | DROP TABLE tmp | 0.000 |
| 5 | root | localhost | tmp | Query | 0 | sleeping | SHOW PROCESSLIST | 0.000 |
| 6 | root | localhost | tmp | Sleep | 249 | | NULL | 0.000 |
+----+-------------+-----------+------+---------+------+----------------------+------------------+----------+
5 rows in set (0.00 sec)
{code}

The node is still in the cluster:

{code}
MariaDB [tmp]> SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size';
+--------------------+-------+
| Variable_name | Value |
+--------------------+-------+
| wsrep_cluster_size | 3 |
+--------------------+-------+
1 row in set (0.00 sec)
{code}

As soon as we UNLOCK TABLES on the original node, the query is killed.

{code}
MariaDB [tmp]> UNLOCK TABLES;
Query OK, 0 rows affected (0.00 sec)

MariaDB [tmp]> SHOW PROCESSLIST;
+----+-------------+-----------+------+---------+------+--------------------+------------------+----------+
| Id | User | Host | db | Command | Time | State | Info | Progress |
+----+-------------+-----------+------+---------+------+--------------------+------------------+----------+
| 1 | system user | | NULL | Sleep | 2305 | NULL | NULL | 0.000 |
| 2 | system user | | NULL | Sleep | 2305 | wsrep aborter idle | NULL | 0.000 |
| 8 | root | localhost | tmp | Query | 0 | sleeping | SHOW PROCESSLIST | 0.000 |
| 18 | root | localhost | tmp | Sleep | 1 | | NULL | 0.000 |
+----+-------------+-----------+------+---------+------+--------------------+------------------+----------+
4 rows in set (0.00 sec)
{code}

h4. DDL + DML

Let's add some DML to the mix.

Some setup:

{code}
CREATE TABLE `tmp` (
  `a` int(11) NOT NULL,
  `b` varchar(20) DEFAULT NULL,
  PRIMARY KEY (`a`)
);
DELIMITER $$
CREATE PROCEDURE insert_test_data()
BEGIN
  DECLARE i INT DEFAULT 1;

  WHILE i < 1000000 DO
    INSERT INTO `tmp` (`a`, `b`)
    VALUES (i, i);
    SET i = i + 1;
  END WHILE;
END$$
DELIMITER ;
{code}

Now let's have three shells on the originating node:

On the first shell, execute:

{code}
CALL insert_test_data();
{code}

On the second shell, execute:

{code}
MariaDB [tmp]> FLUSH TABLES WITH READ LOCK;
Query OK, 0 rows affected (0.00 sec)
{code}

On the third shell, try to drop the table:

{code}
MariaDB [tmp]> DROP TABLE tmp;
{code}

Process list looks as expected:

{code}
MariaDB [tmp]> SHOW PROCESSLIST;
+----+-------------+-----------+------+---------+------+------------------------------+----------------------------------------------------------------------------------------+----------+
| Id | User | Host | db | Command | Time | State | Info | Progress |
+----+-------------+-----------+------+---------+------+------------------------------+----------------------------------------------------------------------------------------+----------+
| 1 | system user | | NULL | Sleep | 2648 | NULL | NULL | 0.000 |
| 2 | system user | | NULL | Sleep | 2648 | wsrep aborter idle | NULL | 0.000 |
| 8 | root | localhost | tmp | Query | 0 | sleeping | SHOW PROCESSLIST | 0.000 |
| 18 | root | localhost | tmp | Query | 59 | Waiting for global read lock | INSERT INTO `tmp` (`a`, `b`)
    VALUES ( NAME_CONST('i',6601), NAME_CONST('i',6601)) | 0.000 |
| 20 | root | localhost | tmp | Query | 20 | checking permissions | DROP TABLE tmp | 0.000 |
+----+-------------+-----------+------+---------+------+------------------------------+----------------------------------------------------------------------------------------+----------+
5 rows in set (0.00 sec)
{code}

On other nodes, the table is gone:

{code}
MariaDB [tmp]> SHOW TABLES;
Empty set (0.00 sec)
{code}

On the originating node, the table still exists:

{code}
MariaDB [tmp]> SHOW TABLES;
+---------------+
| Tables_in_tmp |
+---------------+
| tmp |
+---------------+
1 row in set (0.00 sec)
{code}

Let's cancel the DML:

{code}
MariaDB [tmp]> CALL insert_test_data();
^CCtrl-C -- query killed. Continuing normally.
ERROR 1317 (70100): Query execution was interrupted
{code}

The table still exists on the originating node and the query is dead:

{code}
MariaDB [tmp]> SHOW TABLES;
+---------------+
| Tables_in_tmp |
+---------------+
| tmp |
+---------------+
1 row in set (0.00 sec)

MariaDB [tmp]> SHOW PROCESSLIST;
+----+-------------+-----------+------+---------+------+----------------------+------------------+----------+
| Id | User | Host | db | Command | Time | State | Info | Progress |
+----+-------------+-----------+------+---------+------+----------------------+------------------+----------+
| 1 | system user | | NULL | Sleep | 2768 | NULL | NULL | 0.000 |
| 2 | system user | | NULL | Sleep | 2768 | wsrep aborter idle | NULL | 0.000 |
| 8 | root | localhost | tmp | Query | 0 | sleeping | SHOW PROCESSLIST | 0.000 |
| 18 | root | localhost | tmp | Sleep | 13 | | NULL | 0.000 |
| 20 | root | localhost | tmp | Query | 140 | checking permissions | DROP TABLE tmp | 0.000 |
+----+-------------+-----------+------+---------+------+----------------------+------------------+----------+
5 rows in set (0.00 sec)
{code}

It goes away when we UNLOCK TABLES:

{code}
MariaDB [tmp]> UNLOCK TABLES;
Query OK, 0 rows affected (0.00 sec)

MariaDB [tmp]> SHOW TABLES;
Empty set (0.00 sec)
{code}

Again, the other nodes have an error about the missing table, suggesting it may have been executed twice:

{code}
150320 14:04:56 [Warning] WSREP: RBR event 1 Query apply warning: 1, 5070
150320 14:04:56 [Warning] WSREP: Ignoring error for TO isolated action: source: bafbdc15-cf25-11e4-9ace-2fc1846ad321 version: 3 local: 0 state: APPLYING flags: 65 conn_id: 18 trx_id: -1 seqnos (l: 5113, g: 5070, s: 5069, d: 5069, ts: 185308382405439)
{code}

Rasmus Johansson (Inactive) made changes - 2015-05-18 17:50

Workflow

MariaDB v2 [ 60168 ]

MariaDB v3 [ 62530 ]

Sergei Golubchik made changes - 2017-02-25 21:34

Assignee

Nirbhay Choubey [ nirbhay_c ]

Sachin Setiya (Inactive) made changes - 2017-04-18 05:17

Assignee

Sachin Setiya [ sachin.setiya.007 ]

Elena Stepanova made changes - 2017-11-05 17:05

Fix Version/s		5.5-galera [ 21900 ]
Fix Version/s		10.0-galera [ 21901 ]

Sergei Golubchik made changes - 2018-05-02 10:59

Assignee

Sachin Setiya [ sachin.setiya.007 ]

Jan Lindström [ jplindst ]

Ralf Gebhardt made changes - 2020-06-02 06:41

Fix Version/s	5.5-galera [ 21900 ]
Fix Version/s	10.0-galera [ 21901 ]
Assignee	Jan Lindström [ jplindst ]	Stepan Patryshev [ stepan.patryshev ]

Elena Stepanova made changes - 2020-06-30 12:31

Labels

galera

galera need_verification

Elena Stepanova made changes - 2020-07-30 12:31

Fix Version/s

N/A [ 14700 ]

Stepan Patryshev (Inactive) made changes - 2021-02-19 15:08

Assignee

Stepan Patryshev [ stepan.patryshev ]

Ramesh Sivaraman [ JIRAUSER48189 ]

Sergei Golubchik made changes - 2021-12-06 21:32

Workflow

MariaDB v3 [ 62530 ]

MariaDB v4 [ 139790 ]

Jan Lindström added a comment - 2023-06-06 09:16

Both versions are EOL.

Jan Lindström added a comment - 2023-06-06 09:16 Both versions are EOL.

Jan Lindström made changes - 2023-06-06 09:16

Resolution		Won't Fix [ 2 ]
Status	Open [ 1 ]	Closed [ 6 ]

MariaDB Server

FLUSH TABLES WITH READ LOCK has strange locking behavior

Details

Description

DDL + no DML

DDL + DML

Attachments

Activity

People

Dates

Git Integration