[MDEV-17350] FLUSH TABLES keeps tables locked after timeout Created: 2018-10-02  Updated: 2023-04-27

Status: Confirmed
Project: MariaDB Server
Component/s: Locking
Affects Version/s: 5.5, 10.0, 10.1, 10.1.21, 10.1.34, 10.2, 10.3, 10.4
Fix Version/s: 10.4

Type: Bug Priority: Major
Reporter: Thomas Mischke Assignee: Michael Widenius
Resolution: Unresolved Votes: 1
Labels: upstream
Environment:

for example Linux version 2.6.32-696.10.1.el6.x86_64 (mockbuild@c1bl.rdu2.centos.org) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-18) (GCC) ) #1 SMP Tue Aug 22 18:51:35 UTC 2017



 Description   

Transactions started while the database is waiting for a "FLUSH TABLES WITH READ LOCK" to complete remain blocked even after the FLUSH command hits a timeout. In our environment this happens as follows:

Session 1:

select * from a; # this is a very long running statement, 1800 seconds for example

Session 2:

SET lock_wait_timeout = 10;
flush tables with read lock; # this blocks due to session 1

Session 3:

select * from a where a.x=1; # this is executed, while session 2 is waiting in the flush tables

After 10 seconds session 2 gets a timeout. We then close the connection. But session 3 is still blocked; the list of running statements shows that it is "Waiting for table flush". Only once session 1 returns does session 3 continue. This is the behavior we see with version 10.1.34.

In our live environment we see a slightly different behavior of session 3: instead of waiting for session 1 to finish, it continues after exactly 600 seconds (our slow query log is full of such statements, started shortly after we started "flush tables with read lock"). In our live system we are using version 10.1.21.

In the slow query log we see one more interesting aspect. The relevant line reads:
# Query_time: 601.052932 Lock_time: 0.000000 Rows_sent: 0 Rows_examined: 0
It claims that the lock time was 0, so what did the query do for all those 600 seconds?

While our production tables have lots of indices, we were able to reproduce the issue with a newly created table:
create table a select id from b
So we had just a table with lots of integers and no indices. Still the same behavior.

Is there any way we can tell session 3 to continue?
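(For reference, blocked sessions can at least be found and aborted from another connection. This is a hedged sketch using standard MariaDB facilities; note that KILL QUERY returns an error to the blocked client rather than letting its statement continue, and the ID value below is a hypothetical example.)

```sql
-- Find sessions stuck behind the aborted FLUSH:
SELECT ID, TIME, INFO
  FROM INFORMATION_SCHEMA.PROCESSLIST
 WHERE STATE = 'Waiting for table flush';

-- Abort one of them (the client gets an error; the query does not resume):
KILL QUERY 42;  -- 42 is a hypothetical ID taken from the query above
```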



 Comments   
Comment by Elena Stepanova [ 2018-10-03 ]

Thanks for the report.

Reproducible as described on all major versions of MariaDB / MySQL, which would often indicate that it's intended legacy behavior (however weird), but I can't find anything in the MySQL documentation that would confirm it. Unless we are missing something, it's a really nasty flaw.

The test case below represents the scenario in the description, with only two changes:
1) Session 1 does LOCK TABLE .. READ instead of SELECT. It expectedly blocks FLUSH TABLES WITH READ LOCK, but naturally should not affect the SELECT in the 3rd session in any way (and it doesn't, unless the 2nd session interferes with FLUSH).
2) The problem remains even if the 2nd connection runs an explicit UNLOCK TABLES after the failed FLUSH. I added UNLOCK TABLES there to demonstrate this.

The test case fails when select * from t1 in con1 hits a timeout, which shouldn't happen.

create table t1 (a int);
lock table t1 read;
 
--connect (con1,localhost,root,,)
set lock_wait_timeout= 3;
 
--connect (con2,localhost,root,,)
set lock_wait_timeout= 1;
--error ER_LOCK_WAIT_TIMEOUT
flush tables with read lock;
unlock tables;
--disconnect con2
 
--connection con1
select * from t1;
 
# Cleanup
--disconnect con1
--connection default
unlock tables;
drop table t1;

Comment by Sergey Vojtovich [ 2018-10-04 ]

Yes, this is intended behaviour, and strictly speaking it has nothing to do with locking. That is, the client waits for an invalidated table to be evicted from the table definition cache, which happens only when previously started statements release this table.

I doubt it is feasible to roll back the FLUSH TABLES effect on error, that is, to "uninvalidate" flushed tables. At the very least it means we would need to track who invalidated what, which is a fairly major effort.

On a positive note, there is some progress being made to make backups less painful. It could probably help in your situation as well; see MDEV-5336.

Comment by Sergey Vojtovich [ 2018-10-05 ]

thomas.mischke, could you briefly explain how you use FLUSH TABLES WITH READ LOCK? Perhaps we could come up with some alternative.

Comment by Thomas Mischke [ 2018-10-05 ]

We are using it for our backup. Since the database is running in production while we perform the backup, we set the timeout for FLUSH TABLES WITH READ LOCK to three seconds and keep retrying (every 10 seconds) until we get the lock. Our data folder resides on an LVM volume, so we then create a snapshot, issue UNLOCK TABLES, and start copying the content of the snapshot to our backup storage. This full backup runs once a day. In addition we have an incremental backup (every hour we copy the binlog of our server instance).
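The sequence above can be sketched as follows (a minimal sketch of the procedure described; the snapshot happens at the filesystem level, so it is shown only as a comment, and the retry loop around the FLUSH runs in the backup script, not in SQL):

```sql
-- Retried by the backup script every 10 seconds until FLUSH succeeds:
SET SESSION lock_wait_timeout = 3;
FLUSH TABLES WITH READ LOCK;  -- may fail with ER_LOCK_WAIT_TIMEOUT; retry later

-- <create the LVM snapshot of the data volume here>

UNLOCK TABLES;
-- then copy the snapshot contents to backup storage and drop the snapshot
```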

Additional information:

  • Currently all tables in our databases are InnoDB.
  • Since all kinds of applications with different usage profiles are working on the database server, it normally takes up to one hour before we get a FLUSH TABLES WITH READ LOCK without a timeout. That would not be a problem if session 3 from my example above continued after the timeout (or if there were any way to detect these queries and ask them to continue, or at least kill them).
  • We cannot increase the timeout for FLUSH TABLES WITH READ LOCK, because some of the robots in our production environment have a timeout of 5 seconds: either we get the data to the robot within 5 seconds, or it stops working until manually triggered.
  • For mysqldump or similar tools the database server is too big (several TB).

After your comment from yesterday, I read about BACKUP LOCK and am looking forward to it. But some other comments I read made me wonder: do I need FLUSH TABLES WITH READ LOCK with InnoDB tables, or can we go without it (if we have the binlog at hand)?

Comment by Sergey Vojtovich [ 2018-10-05 ]

thomas.mischke, if only InnoDB tables matter and the LVM snapshot produces something like copy-on-write pages, you definitely don't need the FLUSH TABLES part, but I still feel the WITH READ LOCK part matters. Mainly because concurrent DDL may get InnoDB metadata and .frm metadata out of sync, that is, you may end up with unrecoverable (or painfully recoverable) data.

Another problem might be that InnoDB data files get updated concurrently and the snapshot sees a partially updated page (not sure if this is possible; most probably it is); then you may get unrecoverable data. WITH READ LOCK most probably protects against this as well.

Since you copy the data anyway, you may consider trying mariabackup. It works around partially updated pages by verifying the page checksum and retrying if it doesn't match. There is a mode in which it doesn't even have to issue FLUSH TABLES WITH READ LOCK. However, I'm not sure about mariabackup's quality status.

Another option might be implementing syntax to acquire the global read lock without flushing tables. From InnoDB's point of view this protection shouldn't differ from the current FLUSH TABLES WITH READ LOCK.

wlad, could you correct me if I'm wrong?

Comment by Vladislav Vaintroub [ 2018-10-05 ]

The FLUSH TABLES WITH READ LOCK part is needed for non-InnoDB tables (system tables, mysql.user and such, in case they are modified). It is also needed in case DDL is running: if caught in the middle of a DDL operation, the backup might not be recoverable, even for InnoDB.

FLUSH TABLES WITH READ LOCK is also used to obtain the current binlog position, to bootstrap a new replication slave from a backup.

Current alternatives to FTWRL

  • LOCK TABLES ... READ can be used in some cases if no new tables are being created during the backup (you'll need to compute the list of tables yourself, perhaps with an information schema query). The good thing is that it does not wait for SELECTs to finish, unlike FTWRL.
  • A nifty way to prevent DDL on transactional tables while the backup is running is to acquire metadata locks in a dummy read-only transaction that only references the table names, like this:
    BEGIN;
    SELECT 1 FROM table_first;
    ...
    SELECT 1 FROM table_last;
    <here you can take the filesystem snapshot>
    COMMIT;

The downside, as with LOCK TABLES ... READ, is that you also need to compute the list of tables yourself from the information schema, and ensure no new tables are created while taking the backup.
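(The per-table statements for the dummy-transaction approach can be generated from the information schema. A hedged sketch; the schema exclusion list is an example, and tables created after this query runs are still missed, as noted above:)

```sql
-- Generate one metadata-lock SELECT per base table;
-- run the generated statements inside BEGIN ... COMMIT:
SELECT CONCAT('SELECT 1 FROM `', TABLE_SCHEMA, '`.`', TABLE_NAME, '`;')
  FROM INFORMATION_SCHEMA.TABLES
 WHERE TABLE_TYPE = 'BASE TABLE'
   AND TABLE_SCHEMA NOT IN ('information_schema', 'performance_schema', 'mysql');
```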

The current plan for backup locks, which are lighter-weight than FTWRL, is MDEV-5336, but it might not appear until 10.4 is done.

But if you only care about InnoDB, and can ensure that no DDL is running, then yes, you can take snapshots without any locks.

You can try mariabackup --backup --no-lock, but in this case you cannot ensure consistency of MyISAM tables or get a consistent binlog position.
Since 10.2.18 it can cope with DDL somewhat, even in --no-lock mode; however, if DDL is detected at the very end of the backup, it will loudly fail (10.2.19+) or silently create a non-recoverable backup (10.2.18). The takeaway is that the --no-lock option does not work well with DDL, and you might need to rerun the backup.

Comment by Thomas Mischke [ 2018-10-09 ]

Thanks for pointing out the alternatives. I will take a closer look at mariabackup in any case (even if I find we need FTWRL).

I am quite sure that DDL is constantly being used in some of our databases, but I will double check.

To me it seems that we have to live with the current situation until 10.4, that is, try to avoid long-running statements when the backup wants to start, and use our replication server for backups (it is a bit smaller than the main server, so the backup is painful for other reasons...).

Generated at Thu Feb 08 08:35:48 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.