[MDEV-31756] WAIT/NOWAIT in DDL makes binary logs difficult or impossible to replay Created: 2023-07-20  Updated: 2023-11-28

Status: Open
Project: MariaDB Server
Component/s: Locking, Replication, Storage Engine - InnoDB
Affects Version/s: 10.5, 10.6, 10.9, 10.10, 10.11, 11.0, 11.1
Fix Version/s: 10.5, 10.6, 10.11, 11.0, 11.1

Type: Bug Priority: Minor
Reporter: Elena Stepanova Assignee: Andrei Elkin
Resolution: Unresolved Votes: 0
Labels: None


 Description   

I don't know if anything can be done about it, but I want it at least recorded somewhere, as I spent a lot of time trying to understand why non-concurrent SQL in general and binlog replay in particular fail the way the shouldn't.

Single-threaded DDL/DML can occasionally fail with lock wait timeout error. It is reproducible rather easily on 10.5+. I didn't try to make an MTR test case for it, because I can only make a crude non-deterministic one, while it's probably fairly simple for InnoDB people to create a synchronized one if necessary.

Apparently what's happening is that InnoDB purge takes an MDL lock (the courtesy of MDEV-16678), and if it holds it long enough, DDL waiting for the lock will fail.

With normal DDL without WAIT/NOWAIT clause it's an unlikely scenario, as lock wait timeout variable values are high enough (and if it is observed, they can be increased, so it's not that critical, just very confusing). However DDL with WAIT and especially NOWAIT poses a real problem.

If a user application can at least handle the error (hopefully not many real-life applications assume that DDL is always run non-concurrently, or if they do, why do they put WAIT/NOWAIT there to begin with), and replication deals with it via transaction retries, I can't see any way to work around it upon binary log replaying, and with big enough data and a big enough log, it will almost inevitably happen.



 Comments   
Comment by Marko Mäkelä [ 2023-07-22 ]

I think that this is a problem of metadata locking (MDL) or replication. InnoDB merely is a user of metadata locks. I think that this needs to be addressed outside InnoDB.

The metadata locks that are held by InnoDB should be of relatively short duration. If the observed scenarios involve FULLTEXT INDEX or FOREIGN KEY, it could be an InnoDB issue that needs to be reproduced and resolved.

Comment by Elena Stepanova [ 2023-07-23 ]

It's nobody's problem. Every piece works perfectly, the pieces just don't fit together, that's the problem.

Binary logging correctly writes ALTER into the binary log, because it was executed, what else binary logging is supposed to do?
mysqlbinlog correctly reads it from the binlog and sends to the client.
The client gets a query and sends it to the server.
The server gets the query and executes it.
InnoDB at this moment decides to take MDL locks.
ALTER NOWAIT, facing the situation, works exactly as it should, returning the error.
The rest of binlog replay becomes naturally impossible (even if the client is run with --force, the schema already diverged).

A binlog should be replayable, it's its second main purpose. Unless a user actively intervenes, e.g. by running something in parallel or otherwise tampering with the server; but it's not the case here.

I can't see what either replication (or binary logging, rather) or MDL could do differently here. If they can, that would be great of course. But the only part I could suspect to be possibly wrong is InnoDB misusing MDL locks. One can be merely a user of a hammer, but using it to put a screw in its place may be not the best idea. One may achieve the goal, but either the screw or the surface are likely to break in the process. That's what we seem to see here.

Comment by Kristian Nielsen [ 2023-07-23 ]

Maybe the ALTER should not be binlogged with NOWAIT. That doesn't really seem appropriate for the binlog, and is not really safe for replication, as you mentioned. For DML we have the mechanism to switch to row-based binlogging when the original query is not safe. We don't really have a similar mechanism for DDL. There is already a bit of rewriting done for DLL (DROP TABLE?), so removing NOWAIT could perhaps be done similarly, but I think perhaps that rewrite is quite hackish.

The replication slave has the mechanism to retry for temp errors (though NOWAIT still does not seem appropriate). The mysqlbinlog | mysql crude method doesn't have that, I think it's never been really robust, more like a `best effort` kind of semi-feature.

Comment by Elena Stepanova [ 2023-07-23 ]

Maybe this could be a solution. I didn't want to suggest it, because writing into the binary log something different from what was executed always feels like a hack, but on the other hand, better if it works with a hack than doesn't work without.
Of course it will have to apply also to WAIT N, not just NOWAIT.
In any case it's better than what I'm doing now to work around it (and other users would have to if they needed it to work) – piping mysqlbinlog output through sed before feeding it to the client.

I don't think we can ignore binlog replay because it's crude. It's an established way of restore, with a capability to restore the state of the arbitrary point in time (even separately documented as such in MySQL, I'm not so sure about MariaDB KB). We tend to mix up binlog replay with replication, I guess that's why it is not usually tested separately, it is just assumed that if replication works then the binary log is surely in order, even though the mechanisms of using it are different.

Generated at Thu Feb 08 10:26:14 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.