[MDEV-17772] 3 way lock : ALTER, MDL, BACKUP STAGE BLOCK_DDL Created: 2018-11-19 Updated: 2023-05-16 Resolved: 2018-12-10 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Locking |
| Affects Version/s: | 10.4.1 |
| Fix Version/s: | 10.4.1 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Vladislav Vaintroub | Assignee: | Sergey Vojtovich |
| Resolution: | Fixed | Votes: | 1 |
| Labels: | None | ||
| Issue Links: |
|
||||||||||||
| Description |
|
Here is an unexpected wait with "Backup locks" (similar to the one described in create table t1(i int) engine innodb;
MariaDB [test]> start transaction; MariaDB [test]> select 1 from t1; # <-- acquires MDL lock
MariaDB [test]> alter table t1 add column (j int); # <-- waits on MDL
MariaDB [(none)]> backup stage start; MariaDB [(none)]> backup stage flush; MariaDB [(none)]> backup stage block_ddl; # <-- waits on something |
| Comments |
| Comment by Marko Mäkelä [ 2018-11-20 ] | |||||||||||||||||||
|
When addressing this, I believe that also the following scenario must be considered:
Here, the ALTER TABLE should be blocked in the MDL upgrade before handler::commit_inplace_alter_table(commit=true). Further DML is blocked from acquiring MDL on the table while the lock upgrade is in progress. The MDL conflict would not be resolved until connection default commits or rolls back or disconnects, or until the MDL upgrade times out (by default, in 1 year). (On MDL timeout, handler::commit_inplace_alter_table(commit=false) would be called without holding MDL_EXCLUSIVE.) Now, replace the last SELECT with BACKUP STAGE BLOCK_DDL or similar. I tested this on the plain 10.4 branch without Last, imagine that the BEGIN; SELECT was replaced with BACKUP STAGE BLOCK_DDL. In that scenario, if that statement is blocking the MDL upgrade of the ALTER TABLE, it would be nice if concurrent DML was allowed until the backup lock is released, and the ALTER TABLE has a real chance of upgrading the MDL for the final commit. | |||||||||||||||||||
| Comment by Michael Widenius [ 2018-11-20 ] | |||||||||||||||||||
|
First a comment regarding the original bug report: It's not a deadlock as when connection 1 does commit, then all the other block are freed. The reason you have a lock is the following: | |||||||||||||||||||
| Comment by Michael Widenius [ 2018-11-20 ] | |||||||||||||||||||
|
Regarding Marko's comment: >Now, replace the last SELECT with BACKUP STAGE BLOCK_DDL or similar. I tested this on the >plain 10.4 branch without The backup has to wait until ALTER TABLE either finishes or start copying data, the same way another ALTER TABLE on the same table has to wait.
In this case the BLOCK_DDL will block ALTER TABLE just before the final commit. However any DML for other tables will continue as normal. | |||||||||||||||||||
| Comment by Vladislav Vaintroub [ 2018-11-20 ] | |||||||||||||||||||
|
Imagine a long select, and any DDL waiting for it, then backup should hang as well, for the duration of the select . I do not think this was planned. Maybe a considerable redesign is needed, just to keep the promise of | |||||||||||||||||||
| Comment by Michael Widenius [ 2018-11-20 ] | |||||||||||||||||||
|
Yes, any DDL's has to wait for SELECTS. You can consider that an inefficient too. That said, we are experimenting with BLOCK_DDL to either go past any waiting BACKUP_DDL lock (done by re-ording lock order) or allowing other DDL's while BLOCK_DDL is waiting for the DDL lock. Either of these solution may solve this in a decent way. | |||||||||||||||||||
| Comment by Vladislav Vaintroub [ 2018-11-20 ] | |||||||||||||||||||
|
DDL waits for SELECT, this is fine, and efficient for me. BACKUP_DDL effectively waits for SELECT, because DDL is not running, and this is my concern. That DDL took a lock, is a technicality, what user sees is a hanging backup, waiting until SELECT is finished In fact, what I'm is seeing I hoped that deadlock would be fixed, as it was promised by | |||||||||||||||||||
| Comment by Michael Widenius [ 2018-11-20 ] | |||||||||||||||||||
|
There is no deadlock (as far as I can see or test). | |||||||||||||||||||
| Comment by Vladislav Vaintroub [ 2018-11-20 ] | |||||||||||||||||||
|
The deadlock I'm referring to is the mentioned Since this bug is prominently featured in the description This happens with mariabackup that runs with --lock-ddl-per-table option a) A connection inside mariabackup is holding MDLs for all Innodb tables. MDL is acquired as SELECT 1 from table, a transaction. Transaction is not commited until the end of backup. This ^ is a deadlock. To resolve the deadlock, mariabackup issues KILL QUERY for the ALTER in b). This allows FTWRL to proceed. Another possible "solution" is --no-lock which avoids FTWRL. So, My conclusion therefore was that it did not really solve the problem with When you have a fix for that, I will remove KILL QUERY from the backup code. (I did not decide yet, whether to remove --lock-ddl-per-table option, maybe there is some weird case, where people would still like to use it. At least it turned out to be a useful tool to test BACKUP STAGE so far) | |||||||||||||||||||
| Comment by Michael Widenius [ 2018-11-21 ] | |||||||||||||||||||
|
Hi! Thanks for the explanation, it explains clearly the problem! The issue is how inplace/online ALTER TABLE is implemented. It's possible to fix this for ALTER TABLE, but you have the same issue with DROP TABLE and other DDL's, so there is no easy to fix it. The solution is simple to not allow --lock-ddl-per-table when using backup locks. I checked also this scenario with the old FTWRL: FTWRL locks on both 10.3 and new 10.4 as it as has to wait for ALTER TABLE that takes a MDL_INTENTION_EXLUSIVE/MDL_BACKUP_DDL lock which blocks the MDL_SHARED/MDL_BACKUP_FTWRL1 lock needed in lock_global_read_lock() | |||||||||||||||||||
| Comment by Vladislav Vaintroub [ 2018-11-21 ] | |||||||||||||||||||
|
As I said, I'm not certain of the usefulness of lock-ddl-per-table. I can document them as "not needed anymore", and "deprecated", but they have not previously gone through any deprecation phase, so they need to be there for the time being. I would also like to see fix in the server. This needs to still hold if there is a non-running DDL present, otherwise BACKUP LOCK waits for all of the above | |||||||||||||||||||
| Comment by Michael Widenius [ 2018-11-21 ] | |||||||||||||||||||
|
Yes, it's instant under some premisses. One is that the connection that is using BACKUP LOCKS is not holding any locks of it self. The purpose of BACKUP LOCK BLOCK_DDL is to wait for all running DDL's (those that has DDL lock) to finish and stop new's from being issued. This exactly what it does. I am open for suggestion of how to fix it that doesn't involve dark magic | |||||||||||||||||||
| Comment by Vladislav Vaintroub [ 2018-11-29 ] | |||||||||||||||||||
|
reworded the description so there is no reference to deadlock, as it seems confusing for some readers. Now it is called an "unexpected wait" |