[MDEV-9626] FLUSH TABLES WITH READ LOCK Locks the entire cluster Created: 2016-02-24 Updated: 2019-10-03 Resolved: 2019-10-03 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Galera |
| Affects Version/s: | 10.1.11 |
| Fix Version/s: | 10.1.41 |
| Type: | Bug | Priority: | Major |
| Reporter: | Claudio Nanni | Assignee: | Stepan Patryshev (Inactive) |
| Resolution: | Fixed | Votes: | 2 |
| Labels: | galera | ||
| Environment: |
Linux 64 |
||
| Description |
|
FLUSH TABLES WITH READ LOCK (FTWRL) on one node appears to lock the other nodes as well. In a 3-node Galera cluster, if I run FTWRL on one node and apply simple DDL or DML on the other nodes, I see no locking. To reproduce the issue I used a trivial mysqlslap test:
With this, as soon as FTWRL is executed on another node, say node 3, all mysqlslap threads on node 1 hang with something like: {{
}} At this point, writing to any other node is also impossible: statements hang, so the whole cluster is locked. FTWRL is used for backups, for example, so a backup can potentially lock the whole cluster. |
| Comments |
| Comment by Claudio Nanni [ 2016-02-24 ] | |||
|
Forgot to add: when node 3 is holding FTWRL and I start mysqlslap on node 1, it always stops after ~31 statements (per the general log), even when changing the --concurrency and --number-char-cols options, so the following three runs give the same result:
Only 31 commands are logged (the general log records a statement before it executes, so the last logged one is the one that hangs), and it is usually the 27th INSERT. Using larger data does not change the number of statements before the hang. 1 160224 18:06:49 592 Connect root@localhost | |||
| Comment by Claudio Nanni [ 2016-02-25 ] | |||
|
The number of statements after which writes on one node (say node 1) hang is controlled by gcs.fc_limit on the node that is doing FTWRL (node 3). So I imagine that the node doing FTWRL, even though it goes into "Provider suspended" mode, is still 'sensitive' to incoming flow control, and so it stops the other nodes from applying changes once its limit is reached. So I tried adding this before FTWRL: set GLOBAL wsrep_desync=ON; And it works: node 1 no longer blocks, and neither do the other cluster nodes. As a curiosity, if you execute FTWRL first and then set GLOBAL wsrep_desync=ON; the node locks forever and you need to kill mysqld. | |||
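The ordering in the workaround above matters, so here is a minimal sketch of it as a small Python helper. It is illustrative only: `backup_with_desync`, `execute` and `run_backup` are hypothetical names, `execute` is assumed to send each statement over one and the same session (FTWRL and UNLOCK TABLES belong to a single connection), and switching wsrep_desync back to OFF after the backup is my assumption, not something stated in this report.

```python
from typing import Callable


def backup_with_desync(execute: Callable[[str], None],
                       run_backup: Callable[[], None]) -> None:
    """Run a backup under FTWRL with the node desynced first.

    `execute` is a hypothetical callable that sends one SQL statement
    over a single, persistent session to the node taking the backup.
    """
    # Desync BEFORE taking the lock: the report notes that issuing
    # wsrep_desync=ON after FTWRL hangs the node.
    execute("SET GLOBAL wsrep_desync = ON")
    try:
        execute("FLUSH TABLES WITH READ LOCK")
        try:
            run_backup()
        finally:
            # Always release the global read lock, even if the backup fails.
            execute("UNLOCK TABLES")
    finally:
        # Assumption: rejoin flow control once the lock is released.
        execute("SET GLOBAL wsrep_desync = OFF")
```

Passing `log.append` as `execute` makes it easy to check the statement order without a running cluster.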
| Comment by Seppo Jaakola [ 2016-02-26 ] | |||
|
Thanks for the detailed test case. I can reproduce the issue in upstream Galera Cluster; we will schedule a fix for this. | |||
| Comment by Claudio Nanni [ 2018-01-03 ] | |||
|
I think this problem is now solved: FTWRL now automatically desyncs the node, thereby relaxing flow control. | |||
| Comment by Stepan Patryshev (Inactive) [ 2019-10-03 ] | |||
|
Reproduced it on a debug build made from the released 10.1.11 sources. |