Details
Bug
Status: Closed
Critical
Resolution: Fixed
10.4.1
None
Description
Here is an unexpected wait with "Backup locks" (similar to the one described in MDEV-15636, which is FTWRL-related, but Backup locks were supposed to be better than FTWRL and be instant in most cases). Backup locks were supposed not to wait for SELECTs or an ALTER in progress, but in this case they do. Moreover, in the case below, the wait lasts until the end of a transaction, while no DDL, DML or SELECT is currently running (the example does contain a DDL command, but it is itself waiting and has not started to run yet).
create table t1(i int) engine innodb;
- Connection 1 (Acquire MDL lock)
MariaDB [test]> start transaction;
Query OK, 0 rows affected (0.000 sec)
MariaDB [test]> select 1 from t1; # <-- acquires MDL lock
Empty set (0.001 sec)
- Connection 2 (ALTER TABLE)
MariaDB [test]> alter table t1 add column (j int); # <-- waits on MDL
- Connection 3 (BACKUP STAGE ...)
MariaDB [(none)]> backup stage start;
Query OK, 0 rows affected (0.000 sec)
MariaDB [(none)]> backup stage flush;
Query OK, 0 rows affected (0.002 sec)
MariaDB [(none)]> backup stage block_ddl; # <-- waits on something
Issue Links
- relates to
  MDEV-5336 Implement BACKUP STAGE for safe external backups (Closed)
  MDEV-15636 mariabackup --lock-ddl-per-table hangs in FLUSH TABLES due to MDL conflict if ALTER TABLE issued (Closed)
The deadlock I'm referring to is the one mentioned in MDEV-15636 (mariabackup --lock-ddl-per-table hangs in FLUSH TABLES due to MDL conflict if ALTER TABLE issued). Since this bug is prominently featured in the description of MDEV-5336 (Implement LOCK FOR BACKUP), I assumed the problem was known and understood. If not, here is what happens.
This happens with mariabackup when it runs with the --lock-ddl-per-table option:
a) A connection inside mariabackup is holding MDLs for all InnoDB tables. Each MDL is acquired by running SELECT 1 from the table inside a transaction. The transaction is not committed until the end of the backup.
b) A user connection is doing ALTER TABLE on a table for which an MDL is held. The ALTER waits.
c) Another connection inside mariabackup is trying to acquire FTWRL. It waits for the ALTER in b), which waits for the transaction holding the MDL in a), which is not committed until the end of the backup.
This ^ is a deadlock.
To resolve the deadlock, mariabackup issues KILL QUERY for the ALTER in b). This allows FTWRL to proceed. Another possible "solution" is --no-lock which avoids FTWRL.
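The wait chain in a) to c) can be pictured as a small wait-for graph. The following is only an illustrative sketch, not code from mariabackup or the server: the node names and both helper functions are hypothetical, and the third edge encodes my reading that the transaction in a) cannot commit until the backup, and therefore the FTWRL in c), has finished.

```python
# Hypothetical model of the wait chain in steps a) to c); all names are illustrative.
WAITS_FOR = {
    # c) FTWRL waits for the pending ALTER
    "ftwrl_connection": "user_alter",
    # b) ALTER waits for the MDL held by the transaction in a)
    "user_alter": "mdl_holder_transaction",
    # a) commits only at the end of the backup, which needs c) to finish
    "mdl_holder_transaction": "ftwrl_connection",
}


def find_cycle(waits_for, start):
    """Follow wait-for edges from `start`; return the cycle as a list, or None."""
    path = []
    position = {}
    node = start
    while node is not None:
        if node in position:
            return path[position[node]:]
        position[node] = len(path)
        path.append(node)
        node = waits_for.get(node)
    return None


def without_waiter(waits_for, killed):
    """Model KILL QUERY: drop every edge that starts or ends at the killed statement."""
    return {w: h for w, h in waits_for.items() if w != killed and h != killed}


cycle = find_cycle(WAITS_FOR, "ftwrl_connection")
after_kill = without_waiter(WAITS_FOR, "user_alter")
```

In this model `cycle` contains all three participants, and once the ALTER is removed from the graph, no cycle is reachable from the FTWRL connection, which matches why KILL QUERY lets FTWRL proceed.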
So,
- I tested BACKUP STAGE with mariabackup.
- I read MDEV-5336. Its description says: "This lock will also solve the problem with MDEV-15636 (killing running queries that conflicts with FLUSH) as the backup locks will not conflict with other DDL locks"
- I removed KILL QUERY from mariabackup. I wanted to see whether the statement "this lock will also solve the problem with MDEV-15636" holds.
- As a result, mariabackup hung in a test that exercises the "lock-ddl-per-table" option.
My conclusion therefore was that it did not really solve the problem with MDEV-15636.
When you have a fix for that, I will remove KILL QUERY from the backup code.
(I have not decided yet whether to remove the --lock-ddl-per-table option; maybe there is some weird case where people would still like to use it. At least it has turned out to be a useful tool for testing BACKUP STAGE so far.)