[MDEV-31756] WAIT/NOWAIT in DDL makes binary logs difficult or impossible to replay Created: 2023-07-20 Updated: 2023-11-28 |
|
| Status: | Open |
| Project: | MariaDB Server |
| Component/s: | Locking, Replication, Storage Engine - InnoDB |
| Affects Version/s: | 10.5, 10.6, 10.9, 10.10, 10.11, 11.0, 11.1 |
| Fix Version/s: | 10.5, 10.6, 10.11, 11.0, 11.1 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Elena Stepanova | Assignee: | Andrei Elkin |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Description |
|
I don't know if anything can be done about it, but I want it at least recorded somewhere, as I spent a lot of time trying to understand why non-concurrent SQL in general and binlog replay in particular fail the way the shouldn't. Single-threaded DDL/DML can occasionally fail with lock wait timeout error. It is reproducible rather easily on 10.5+. I didn't try to make an MTR test case for it, because I can only make a crude non-deterministic one, while it's probably fairly simple for InnoDB people to create a synchronized one if necessary. Apparently what's happening is that InnoDB purge takes an MDL lock (the courtesy of With normal DDL without WAIT/NOWAIT clause it's an unlikely scenario, as lock wait timeout variable values are high enough (and if it is observed, they can be increased, so it's not that critical, just very confusing). However DDL with WAIT and especially NOWAIT poses a real problem. If a user application can at least handle the error (hopefully not many real-life applications assume that DDL is always run non-concurrently, or if they do, why do they put WAIT/NOWAIT there to begin with), and replication deals with it via transaction retries, I can't see any way to work around it upon binary log replaying, and with big enough data and a big enough log, it will almost inevitably happen. |
| Comments |
| Comment by Marko Mäkelä [ 2023-07-22 ] |
|
I think that this is a problem of metadata locking (MDL) or replication. InnoDB merely is a user of metadata locks. I think that this needs to be addressed outside InnoDB. The metadata locks that are held by InnoDB should be of relatively short duration. If the observed scenarios involve FULLTEXT INDEX or FOREIGN KEY, it could be an InnoDB issue that needs to be reproduced and resolved. |
| Comment by Elena Stepanova [ 2023-07-23 ] |
|
It's nobody's problem. Every piece works perfectly, the pieces just don't fit together, that's the problem. Binary logging correctly writes ALTER into the binary log, because it was executed, what else binary logging is supposed to do? A binlog should be replayable, it's its second main purpose. Unless a user actively intervenes, e.g. by running something in parallel or otherwise tampering with the server; but it's not the case here. I can't see what either replication (or binary logging, rather) or MDL could do differently here. If they can, that would be great of course. But the only part I could suspect to be possibly wrong is InnoDB misusing MDL locks. One can be merely a user of a hammer, but using it to put a screw in its place may be not the best idea. One may achieve the goal, but either the screw or the surface are likely to break in the process. That's what we seem to see here. |
| Comment by Kristian Nielsen [ 2023-07-23 ] |
|
Maybe the ALTER should not be binlogged with NOWAIT. That doesn't really seem appropriate for the binlog, and is not really safe for replication, as you mentioned. For DML we have the mechanism to switch to row-based binlogging when the original query is not safe. We don't really have a similar mechanism for DDL. There is already a bit of rewriting done for DLL (DROP TABLE?), so removing NOWAIT could perhaps be done similarly, but I think perhaps that rewrite is quite hackish. The replication slave has the mechanism to retry for temp errors (though NOWAIT still does not seem appropriate). The mysqlbinlog | mysql crude method doesn't have that, I think it's never been really robust, more like a `best effort` kind of semi-feature. |
| Comment by Elena Stepanova [ 2023-07-23 ] |
|
Maybe this could be a solution. I didn't want to suggest it, because writing into the binary log something different from what was executed always feels like a hack, but on the other hand, better if it works with a hack than doesn't work without. I don't think we can ignore binlog replay because it's crude. It's an established way of restore, with a capability to restore the state of the arbitrary point in time (even separately documented as such in MySQL, I'm not so sure about MariaDB KB). We tend to mix up binlog replay with replication, I guess that's why it is not usually tested separately, it is just assumed that if replication works then the binary log is surely in order, even though the mechanisms of using it are different. |