[MDEV-9626] FLUSH TABLES WITH READ LOCK Locks the entire cluster Created: 2016-02-24  Updated: 2019-10-03  Resolved: 2019-10-03

Status: Closed
Project: MariaDB Server
Component/s: Galera
Affects Version/s: 10.1.11
Fix Version/s: 10.1.41

Type: Bug Priority: Major
Reporter: Claudio Nanni Assignee: Stepan Patryshev (Inactive)
Resolution: Fixed Votes: 2
Labels: galera
Environment: Linux 64


Attachments: Zip Archive MDEV-9626_10.1.11_reproduced.zip     Zip Archive MDEV-9626_10.1.41_fixed.zip    

 Description   

FLUSH TABLES WITH READ LOCK on one node appears to lock other nodes as well.

In a 3-node Galera cluster, if I run FTWRL on one node and then execute simple DDL or DML statements on the other nodes, I don't see any locking.

To reproduce the issue, I used a trivial mysqlslap test:

  1. On node 1:
    $ mysqlslap --concurrency=10 --iterations=200 --number-int-cols=2 --number-char-cols=3 --auto-generate-sql -uroot -h127.0.0.1 -P10021

With this running, as soon as FTWRL is executed on another node (say node 3), all mysqlslap threads on node 1 hang, with a processlist like:

{{

| 519 | root        | localhost:46438 | mysqlslap | Query   |    4 | query end          | INSERT INTO t1 VALUES (2101913295,1100418235,'XQ0K9x30cYaFq2RvMZzKHKgmxy4uKBpreh4fX7f7XEEM8a9Nz8jGKF |    0.000 |
| 520 | root        | localhost:46440 | mysqlslap | Query   |    4 | query end          | INSERT INTO t1 VALUES (2101913295,1100418235,'XQ0K9x30cYaFq2RvMZzKHKgmxy4uKBpreh4fX7f7XEEM8a9Nz8jGKF |    0.000 |
| 521 | root        | localhost:46442 | mysqlslap | Query   |    4 | query end          | INSERT INTO t1 VALUES (2101913295,1100418235,'XQ0K9x30cYaFq2RvMZzKHKgmxy4uKBpreh4fX7f7XEEM8a9Nz8jGKF |    0.000 |
| 522 | root        | localhost:46444 | mysqlslap | Query   |    4 | query end          | INSERT INTO t1 VALUES (2101913295,1100418235,'XQ0K9x30cYaFq2RvMZzKHKgmxy4uKBpreh4fX7f7XEEM8a9Nz8jGKF |    0.000 |
| 523 | root        | localhost:46446 | mysqlslap | Query   |    4 | query end          | INSERT INTO t1 VALUES (1233983202,789373855,'jdtPbBILRQyiu7nZN3RdI96aQOP4Y0z7dlO4wbQF1OpvYdLnggGrqMP |    0.000 |
| 524 | root        | localhost:46448 | mysqlslap | Query   |    4 | query end          | INSERT INTO t1 VALUES (1233983202,789373855,'jdtPbBILRQyiu7nZN3RdI96aQOP4Y0z7dlO4wbQF1OpvYdLnggGrqMP |    0.000 |
| 525 | root        | localhost:46450 | mysqlslap | Query   |    4 | query end          | INSERT INTO t1 VALUES (1233983202,789373855,'jdtPbBILRQyiu7nZN3RdI96aQOP4Y0z7dlO4wbQF1OpvYdLnggGrqMP |    0.000 |
| 526 | root        | localhost:46452 | mysqlslap | Query   |    4 | query end          | INSERT INTO t1 VALUES (1233983202,789373855,'jdtPbBILRQyiu7nZN3RdI96aQOP4Y0z7dlO4wbQF1OpvYdLnggGrqMP |    0.000 |
| 527 | root        | localhost:46454 | mysqlslap | Query   |    4 | query end          | INSERT INTO t1 VALUES (1233983202,789373855,'jdtPbBILRQyiu7nZN3RdI96aQOP4Y0z7dlO4wbQF1OpvYdLnggGrqMP |    0.000 |
| 528 | root        | localhost:46456 | mysqlslap | Query   |    4 | query end          | INSERT INTO t1 VALUES (2101913295,1100418235,'XQ0K9x30cYaFq2RvMZzKHKgmxy4uKBpreh4fX7f7XEEM8a9Nz8jGKF |    0.000 |
+-----+-------------+-----------------+-----------+---------+------+--------------------+------------------------------------------------------------------------------------------------------+----------+

}}

At this point, writing to any other node is also impossible: statements hang, so the whole cluster is effectively locked.

Since FTWRL is used for backups, for example, a backup can potentially lock the whole cluster.



 Comments   
Comment by Claudio Nanni [ 2016-02-24 ]

Forgot to add: when node 3 is holding FTWRL and I start mysqlslap on node 1, it always stops after ~31 statements (according to the general log), regardless of the --concurrency and --number-char-cols options, so the following three runs give the same result:

mysqlslap --concurrency=10 --iterations=200 --number-int-cols=2 --number-char-cols=3 --auto-generate-sql -uroot -h127.0.0.1 -P10021
mysqlslap --concurrency=1 --iterations=200 --number-int-cols=2 --number-char-cols=3 --auto-generate-sql -uroot -h127.0.0.1 -P10021
mysqlslap --concurrency=1 --iterations=200 --number-int-cols=2 --number-char-cols=50 --auto-generate-sql -uroot -h127.0.0.1 -P10021

Only 31 commands are logged (the general log records a statement before executing it, so the last logged statement is the one that hangs), and the hanging statement is usually the 27th INSERT. Using larger rows does not change the number of statements before the hang.

1 160224 18:06:49 592 Connect root@localhost
1 160224 18:06:50 592 Query INSERT
1 592 Init DB mysqlslap
1 592 Query CREATE SCHEMA `mysqlslap`
1 592 Query CREATE TABLE `t1`
1 592 Query DROP SCHEMA IF
27 592 Query INSERT INTO t1

Comment by Claudio Nanni [ 2016-02-25 ]

The number of statements after which writes on another node (say node 1) hang is controlled by gcs.fc_limit on the node that is running FTWRL (node 3).

So I imagine that the node running FTWRL, even though it goes into "Provider suspended" mode, still takes part in flow control: writesets from the other nodes pile up in its receive queue, and once its "limit" is reached it stops the other nodes from replicating further changes.
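The flow-control explanation above can be illustrated with a toy model in Python. This is a simplified sketch under stated assumptions, not Galera's actual implementation: each node has a receive queue, a node whose applier is paused by FTWRL lets that queue grow, and once any non-desynced node's queue reaches its limit (standing in for gcs.fc_limit), every writer in the cluster stops. The names `Node`, `replicate` and `cluster` are hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List


@dataclass
class Node:
    """Toy model of one Galera node's receive side (not real Galera code)."""

    name: str
    fc_limit: int                 # stands in for gcs.fc_limit
    applier_paused: bool = False  # True while FTWRL blocks the applier
    desynced: bool = False        # True when wsrep_desync=ON
    recv_queue: Deque[int] = field(default_factory=deque)

    def receive(self, writeset: int) -> None:
        self.recv_queue.append(writeset)
        if not self.applier_paused:
            # A running applier drains its queue immediately in this model.
            self.recv_queue.clear()

    def wants_pause(self) -> bool:
        # A desynced node never asks the cluster to pause.
        return not self.desynced and len(self.recv_queue) >= self.fc_limit


def replicate(writer: str, nodes: List[Node], attempts: int) -> int:
    """Count writesets the writer commits before flow control stops it."""
    committed = 0
    for ws in range(attempts):
        if any(n.wants_pause() for n in nodes):
            break  # cluster-wide pause: every writer blocks
        for n in nodes:
            if n.name != writer:
                n.receive(ws)
        committed += 1
    return committed


def cluster(limit: int, desync_locked_node: bool) -> List[Node]:
    return [
        Node("node1", limit),
        Node("node2", limit),
        # node3 is the one running FTWRL.
        Node("node3", limit, applier_paused=True, desynced=desync_locked_node),
    ]


if __name__ == "__main__":
    print(replicate("node1", cluster(16, False), 1000))  # 16: stalls
    print(replicate("node1", cluster(16, True), 1000))   # 1000: no stall
```

In this model the number of writesets committed before the stall depends only on the limit, not on concurrency or row size, which is consistent with the observation that changing the mysqlslap options did not change where it hung. The real threshold in Galera involves more parameters, so the exact count need not match.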

So I tried adding this before FTWRL:

set GLOBAL wsrep_desync=ON;

And it works: node 1 no longer blocks, and neither do the other cluster nodes.

As a curiosity, if you run FTWRL first and only then run:

set GLOBAL wsrep_desync=ON;

the node locks forever and you have to kill mysqld.
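The ordering that works in the workaround above (desync first, then lock) can be sketched as a small Python helper. `locked_backup` and `RecordingCursor` are hypothetical names; the cursor is assumed to follow the DB-API style where `execute()` takes an SQL string. All statements must go through the same connection, since the global read lock belongs to the session that took it. On versions where FTWRL desyncs the node automatically, the explicit `wsrep_desync` statements may be unnecessary.

```python
from typing import Callable, List, Protocol


class Cursor(Protocol):
    """Anything with a DB-API style execute(sql) method."""

    def execute(self, sql: str) -> object: ...


def locked_backup(cur: Cursor, run_backup: Callable[[], None]) -> None:
    """Hypothetical helper: run a backup under FTWRL on a desynced node."""
    # Desync BEFORE taking the lock; per the report, the reverse order hangs.
    cur.execute("SET GLOBAL wsrep_desync=ON")
    try:
        cur.execute("FLUSH TABLES WITH READ LOCK")
        try:
            run_backup()
        finally:
            # Always release the global read lock, even if the backup fails.
            cur.execute("UNLOCK TABLES")
    finally:
        # Rejoin flow control only after the lock is gone.
        cur.execute("SET GLOBAL wsrep_desync=OFF")


class RecordingCursor:
    """Stand-in cursor that records SQL instead of sending it to a server."""

    def __init__(self) -> None:
        self.log: List[str] = []

    def execute(self, sql: str) -> None:
        self.log.append(sql)


if __name__ == "__main__":
    cur = RecordingCursor()
    locked_backup(cur, lambda: cur.log.append("-- backup runs here"))
    for stmt in cur.log:
        print(stmt)
```

The nested `try`/`finally` blocks ensure that a failing backup still releases the lock and resynchronizes the node, so a crashed backup script does not leave the node desynced.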

Comment by Seppo Jaakola [ 2016-02-26 ]

Thanks for the detailed test case. I can reproduce the issue in upstream Galera Cluster, and we will schedule a fix for it.

Comment by Claudio Nanni [ 2018-01-03 ]

I think this problem is now solved: FTWRL automatically desyncs the node, which relaxes flow control.
You can see that FTWRL now puts the node into the desynced state (wsrep_local_state=2).
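To check the state mentioned above from a script, a small Python lookup like the following may help. It assumes the value was already fetched with `SHOW GLOBAL STATUS LIKE 'wsrep_local_state'` (drivers typically return it as a string); the mapping follows the standard Galera state numbers, and the names `describe_local_state` and `is_donor_or_desynced` are hypothetical.

```python
from typing import Dict

# Numeric values of the wsrep_local_state status variable in Galera.
WSREP_LOCAL_STATES: Dict[int, str] = {
    1: "Joining",
    2: "Donor/Desynced",
    3: "Joined",
    4: "Synced",
}


def describe_local_state(raw_value: str) -> str:
    """Translate the string value returned by SHOW GLOBAL STATUS."""
    state = int(raw_value)
    return WSREP_LOCAL_STATES.get(state, f"Unknown ({state})")


def is_donor_or_desynced(raw_value: str) -> bool:
    # State 2 covers both a state-transfer donor and a node desynced by
    # FTWRL or wsrep_desync=ON; this value alone does not distinguish them.
    return int(raw_value) == 2


if __name__ == "__main__":
    # Shape of a row from: SHOW GLOBAL STATUS LIKE 'wsrep_local_state'
    row = ("wsrep_local_state", "2")
    print(describe_local_state(row[1]))  # Donor/Desynced
```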

Comment by Stepan Patryshev (Inactive) [ 2019-10-03 ]

Reproduced it on a debug build from the released 10.1.11 sources,
and verified that it is actually fixed on a debug build from the released 10.1.41 sources.
The corresponding logs are attached: MDEV-9626_10.1.11_reproduced.zip and MDEV-9626_10.1.41_fixed.zip.
Closing as fixed.

Generated at Thu Feb 08 07:36:06 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.