Details
Bug
Status: Closed
Major
Resolution: Fixed
10.1.11
Description
When a joiner requests an rsync SST, wsrep_sst_rsync on the donor node executes FLUSH TABLES WITH READ LOCK before donating the SST. If FLUSH TABLES WITH READ LOCK fails, the wsrep_sst_rsync process does not die. Instead, it seems to stick around.
Often, this leftover script seems to hold some locks in the database, which can cause strange problems, such as the node being stuck in the DONOR/DESYNCED state.
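To check whether a donor is stuck in that state, one option is to read the wsrep_local_state_comment status variable. The following is only a minimal sketch: the helper name donor_state is made up for illustration, and it assumes the tab-separated output of the mysql client run with -N -B. On a stuck donor the value would be expected to read something like Donor/Desynced, but the exact string may differ between versions.

```shell
#!/bin/sh
# Hypothetical helper (not part of any MariaDB tooling).
# Reads "name<TAB>value" rows on stdin and prints the value of
# wsrep_local_state_comment, if that row is present.
donor_state() {
    awk -F '\t' '$1 == "wsrep_local_state_comment" { print $2 }'
}

# Usage on the donor node (assumes the mysql client can connect):
#   mysql -N -B -e "SHOW GLOBAL STATUS LIKE 'wsrep_local_state_comment'" | donor_state
```

If the value stays in the donor state long after the failed SST, that would be consistent with the problem described here.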
To reproduce, let's say that we have a 2-node cluster: one will act as the donor, and one as the joiner.
Let's first create and populate a table:
CREATE DATABASE test_db;
USE test_db;

CREATE TABLE test_table (
   id int primary key,
   str varchar(50)
);

DELIMITER $$
CREATE PROCEDURE insert_test_data()
BEGIN
   DECLARE i INT DEFAULT 1;

   WHILE i < 100000 DO
      INSERT INTO `test_table` (id, str)
      VALUES (i, CONCAT('str', i));
      SET i = i + 1;
   END WHILE;
END$$
DELIMITER ;

CALL insert_test_data();

DROP PROCEDURE insert_test_data;
Then let's stop one of the nodes and delete the datadir:
sudo systemctl stop mysql
sudo rm -fr /var/lib/mysql/*
And then on the donor node, let's start some DDL that will take a long time:
CREATE TABLE test_table_copy AS SELECT t1.str AS str1, t2.str AS str2 FROM test_table t1 JOIN test_table t2 ON t1.id != t2.id;
Once the DDL is started, let's start the SST on the joiner:
sudo systemctl start mysql
The donor will see an error like this:
Feb 19 12:35:35 ip-172-31-22-174 mysqld: 2016-02-19 12:35:35 139654577776384 [Note] WSREP: Running: 'wsrep_sst_rsync --role 'donor' --address '172.31.19.192:4444/rsync_sst' --socket '/var/lib/mysql/mysql.sock' --datadir '/var/lib/mysql/' '' --gtid '474f3f92-d723-11e5-8da3-d3e87bd5db9a:1' --gtid-domain-id '0''
Feb 19 12:35:35 ip-172-31-22-174 mysqld: 2016-02-19 12:35:35 139655842556672 [Note] WSREP: sst_donor_thread signaled with 0
Feb 19 12:35:35 ip-172-31-22-174 mysqld: 2016-02-19 12:35:35 139654577776384 [Note] WSREP: Flushing tables for SST...
Feb 19 12:35:36 ip-172-31-22-174 mysqld: 2016-02-19 12:35:36 139654577776384 [Warning] WSREP: error executing 'FLUSH TABLES WITH READ LOCK': 1205 (Lock wait timeout exceeded; try restarting transaction)
Feb 19 12:35:36 ip-172-31-22-174 mysqld: 2016-02-19 12:35:36 139654577776384 [ERROR] WSREP: Failed to flush and lock tables
Feb 19 12:35:36 ip-172-31-22-174 mysqld: 2016-02-19 12:35:36 139654577776384 [ERROR] WSREP: Failed to flush tables: -1 (Unknown error -1)
Feb 19 12:35:36 ip-172-31-22-174 mysqld: 2016-02-19 12:35:36 139655555049216 [Warning] WSREP: 1.0 (): State transfer to 0.0 () failed: -78 (Remote address changed)
And the wsrep_sst_rsync process will not die. For each additional SST attempt, there will be another leftover process:
$ ps -elf | grep "wsrep_sst_rsync" | wc -l
5
$ ps -elf | grep "wsrep_sst_rsync"
0 S mysql 2210 1 0 80 0 - 28837 wait 12:35 ? 00:00:00 /bin/bash -ue /usr//bin/wsrep_sst_rsync --role donor --address 172.31.19.192:4444/rsync_sst --socket /var/lib/mysql/mysql.sock --datadir /var/lib/mysql/ --gtid 474f3f92-d723-11e5-8da3-d3e87bd5db9a:1 --gtid-domain-id 0
0 S mysql 2309 1 0 80 0 - 28837 wait 12:35 ? 00:00:00 /bin/bash -ue /usr//bin/wsrep_sst_rsync --role donor --address 172.31.19.192:4444/rsync_sst --socket /var/lib/mysql/mysql.sock --datadir /var/lib/mysql/ --gtid 474f3f92-d723-11e5-8da3-d3e87bd5db9a:1 --gtid-domain-id 0
0 S mysql 2487 1 0 80 0 - 28837 wait 12:35 ? 00:00:00 /bin/bash -ue /usr//bin/wsrep_sst_rsync --role donor --address 172.31.19.192:4444/rsync_sst --socket /var/lib/mysql/mysql.sock --datadir /var/lib/mysql/ --gtid 474f3f92-d723-11e5-8da3-d3e87bd5db9a:1 --gtid-domain-id 0
0 S mysql 2747 1 0 80 0 - 28837 wait 12:35 ? 00:00:00 /bin/bash -ue /usr//bin/wsrep_sst_rsync --role donor --address 172.31.19.192:4444/rsync_sst --socket /var/lib/mysql/mysql.sock --datadir /var/lib/mysql/ --gtid 474f3f92-d723-11e5-8da3-d3e87bd5db9a:1 --gtid-domain-id 0
0 R ec2-user 14138 1915 0 80 0 - 28160 - 12:40 pts/0 00:00:00 grep --color=auto wsrep_sst_rsync
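Note that the count of 5 above includes the grep process itself, so there are 4 leftover scripts. To count only the leftovers, a filter along the following lines might help. This is just a sketch: the function name count_leftover_donors is made up, and it assumes that leftover donor scripts always show a parent PID of 1, as they do in the listing above.

```shell
#!/bin/sh
# Hypothetical helper (not part of any MariaDB tooling).
# Reads "PID PPID ARGS" rows on stdin and counts wsrep_sst_rsync donor
# scripts whose parent PID is 1. The grep/awk processes themselves are
# excluded because their parent is the calling shell, not PID 1.
count_leftover_donors() {
    awk '$2 == 1 && /wsrep_sst_rsync/ && /--role donor/ { n++ } END { print n + 0 }'
}

# Usage on the donor node:
#   ps -eo pid,ppid,args | count_leftover_donors
```

A result that grows by one after each failed SST attempt would match the behavior reported here.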