Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-31756

WAIT/NOWAIT in DDL makes binary logs difficult or impossible to replay

Details

    Description

      I don't know if anything can be done about it, but I want it at least recorded somewhere, as I spent a lot of time trying to understand why non-concurrent SQL in general and binlog replay in particular fail the way the shouldn't.

      Single-threaded DDL/DML can occasionally fail with lock wait timeout error. It is reproducible rather easily on 10.5+. I didn't try to make an MTR test case for it, because I can only make a crude non-deterministic one, while it's probably fairly simple for InnoDB people to create a synchronized one if necessary.

      Apparently what's happening is that InnoDB purge takes an MDL lock (the courtesy of MDEV-16678), and if it holds it long enough, DDL waiting for the lock will fail.

      With normal DDL without WAIT/NOWAIT clause it's an unlikely scenario, as lock wait timeout variable values are high enough (and if it is observed, they can be increased, so it's not that critical, just very confusing). However DDL with WAIT and especially NOWAIT poses a real problem.

      If a user application can at least handle the error (hopefully not many real-life applications assume that DDL is always run non-concurrently, or if they do, why do they put WAIT/NOWAIT there to begin with), and replication deals with it via transaction retries, I can't see any way to work around it upon binary log replaying, and with big enough data and a big enough log, it will almost inevitably happen.

      Attachments

        Activity

          I think that this is a problem of metadata locking (MDL) or replication. InnoDB merely is a user of metadata locks. I think that this needs to be addressed outside InnoDB.

          The metadata locks that are held by InnoDB should be of relatively short duration. If the observed scenarios involve FULLTEXT INDEX or FOREIGN KEY, it could be an InnoDB issue that needs to be reproduced and resolved.

          marko Marko Mäkelä added a comment - I think that this is a problem of metadata locking (MDL) or replication. InnoDB merely is a user of metadata locks. I think that this needs to be addressed outside InnoDB. The metadata locks that are held by InnoDB should be of relatively short duration. If the observed scenarios involve FULLTEXT INDEX or FOREIGN KEY , it could be an InnoDB issue that needs to be reproduced and resolved.
          elenst Elena Stepanova added a comment - - edited

          It's nobody's problem. Every piece works perfectly, the pieces just don't fit together, that's the problem.

          Binary logging correctly writes ALTER into the binary log, because it was executed, what else binary logging is supposed to do?
          mysqlbinlog correctly reads it from the binlog and sends to the client.
          The client gets a query and sends it to the server.
          The server gets the query and executes it.
          InnoDB at this moment decides to take MDL locks.
          ALTER NOWAIT, facing the situation, works exactly as it should, returning the error.
          The rest of binlog replay becomes naturally impossible (even if the client is run with --force, the schema already diverged).

          A binlog should be replayable, it's its second main purpose. Unless a user actively intervenes, e.g. by running something in parallel or otherwise tampering with the server; but it's not the case here.

          I can't see what either replication (or binary logging, rather) or MDL could do differently here. If they can, that would be great of course. But the only part I could suspect to be possibly wrong is InnoDB misusing MDL locks. One can be merely a user of a hammer, but using it to put a screw in its place may be not the best idea. One may achieve the goal, but either the screw or the surface are likely to break in the process. That's what we seem to see here.

          elenst Elena Stepanova added a comment - - edited It's nobody's problem. Every piece works perfectly, the pieces just don't fit together, that's the problem. Binary logging correctly writes ALTER into the binary log, because it was executed, what else binary logging is supposed to do? mysqlbinlog correctly reads it from the binlog and sends to the client. The client gets a query and sends it to the server. The server gets the query and executes it. InnoDB at this moment decides to take MDL locks. ALTER NOWAIT, facing the situation, works exactly as it should, returning the error. The rest of binlog replay becomes naturally impossible (even if the client is run with --force, the schema already diverged). A binlog should be replayable, it's its second main purpose. Unless a user actively intervenes, e.g. by running something in parallel or otherwise tampering with the server; but it's not the case here. I can't see what either replication (or binary logging, rather) or MDL could do differently here. If they can, that would be great of course. But the only part I could suspect to be possibly wrong is InnoDB misusing MDL locks. One can be merely a user of a hammer, but using it to put a screw in its place may be not the best idea. One may achieve the goal, but either the screw or the surface are likely to break in the process. That's what we seem to see here.

          Maybe the ALTER should not be binlogged with NOWAIT. That doesn't really seem appropriate for the binlog, and is not really safe for replication, as you mentioned. For DML we have the mechanism to switch to row-based binlogging when the original query is not safe. We don't really have a similar mechanism for DDL. There is already a bit of rewriting done for DLL (DROP TABLE?), so removing NOWAIT could perhaps be done similarly, but I think perhaps that rewrite is quite hackish.

          The replication slave has the mechanism to retry for temp errors (though NOWAIT still does not seem appropriate). The mysqlbinlog | mysql crude method doesn't have that, I think it's never been really robust, more like a `best effort` kind of semi-feature.

          knielsen Kristian Nielsen added a comment - Maybe the ALTER should not be binlogged with NOWAIT. That doesn't really seem appropriate for the binlog, and is not really safe for replication, as you mentioned. For DML we have the mechanism to switch to row-based binlogging when the original query is not safe. We don't really have a similar mechanism for DDL. There is already a bit of rewriting done for DLL (DROP TABLE?), so removing NOWAIT could perhaps be done similarly, but I think perhaps that rewrite is quite hackish. The replication slave has the mechanism to retry for temp errors (though NOWAIT still does not seem appropriate). The mysqlbinlog | mysql crude method doesn't have that, I think it's never been really robust, more like a `best effort` kind of semi-feature.

          Maybe this could be a solution. I didn't want to suggest it, because writing into the binary log something different from what was executed always feels like a hack, but on the other hand, better if it works with a hack than doesn't work without.
          Of course it will have to apply also to WAIT N, not just NOWAIT.
          In any case it's better than what I'm doing now to work around it (and other users would have to if they needed it to work) – piping mysqlbinlog output through sed before feeding it to the client.

          I don't think we can ignore binlog replay because it's crude. It's an established way of restore, with a capability to restore the state of the arbitrary point in time (even separately documented as such in MySQL, I'm not so sure about MariaDB KB). We tend to mix up binlog replay with replication, I guess that's why it is not usually tested separately, it is just assumed that if replication works then the binary log is surely in order, even though the mechanisms of using it are different.

          elenst Elena Stepanova added a comment - Maybe this could be a solution. I didn't want to suggest it, because writing into the binary log something different from what was executed always feels like a hack, but on the other hand, better if it works with a hack than doesn't work without. Of course it will have to apply also to WAIT N , not just NOWAIT . In any case it's better than what I'm doing now to work around it (and other users would have to if they needed it to work) – piping mysqlbinlog output through sed before feeding it to the client. I don't think we can ignore binlog replay because it's crude. It's an established way of restore, with a capability to restore the state of the arbitrary point in time (even separately documented as such in MySQL, I'm not so sure about MariaDB KB). We tend to mix up binlog replay with replication, I guess that's why it is not usually tested separately, it is just assumed that if replication works then the binary log is surely in order, even though the mechanisms of using it are different.

          Patch that implements removing the WAIT <n> and NOWAIT from DDL queries:

          https://lists.mariadb.org/hyperkitty/list/commits@lists.mariadb.org/thread/5H7RTSFXK5VOZILSKMID54FCAE4FOW4W/

          This is similar to how it is already done for INSERT DELAYED, where the DELAYED is removed from the query before binlogging (in statement mode).

          Note that this seems to affect mysqlbinlog | mysql apply only, normal replication seems ok. This because WAIT <n> / NOWAIT is implemented internally in the parser as "SET STATEMENT lock_wait_timeout=0, innodb_lock_wait_timeout=0 FOR ...", and this is disabled in replication by this code:

            if (!lex->stmt_var_list.is_empty() && !thd->slave_thread)
            {
              ...
          

          But elenst, if you have seen this problem also in master->slave replication, please let me know.

          It was btw. a bit tricky to do the testcase. First because I found it doesn't affect normal replication (at least in my test). Second because a NOWAIT DDL on an InnoDB table will fail sporadically due to the InnoDB background thread locks even before considering binlogging! In the end I managed by using MyISAM and running a query containing a short SLEEP() in parallel (using --send) with the mysqlbinlog | mysql. The testcase also greps the mysqlbinlog output for WAIT/NOWAIT.

          I suggest to target 11.4 for this bug. The rationale being that the impact of this bug is relative minor, and rewriting DDL queries like this is not an entirely trivial change. But let me know if anyone has a strong opinion otherwise (target either earlier, or later version?).

          knielsen Kristian Nielsen added a comment - Patch that implements removing the WAIT <n> and NOWAIT from DDL queries: https://lists.mariadb.org/hyperkitty/list/commits@lists.mariadb.org/thread/5H7RTSFXK5VOZILSKMID54FCAE4FOW4W/ This is similar to how it is already done for INSERT DELAYED, where the DELAYED is removed from the query before binlogging (in statement mode). Note that this seems to affect mysqlbinlog | mysql apply only, normal replication seems ok. This because WAIT <n> / NOWAIT is implemented internally in the parser as "SET STATEMENT lock_wait_timeout=0, innodb_lock_wait_timeout=0 FOR ...", and this is disabled in replication by this code: if (!lex->stmt_var_list.is_empty() && !thd->slave_thread) { ... But elenst , if you have seen this problem also in master->slave replication, please let me know. It was btw. a bit tricky to do the testcase. First because I found it doesn't affect normal replication (at least in my test). Second because a NOWAIT DDL on an InnoDB table will fail sporadically due to the InnoDB background thread locks even before considering binlogging! In the end I managed by using MyISAM and running a query containing a short SLEEP() in parallel (using --send) with the mysqlbinlog | mysql. The testcase also greps the mysqlbinlog output for WAIT/NOWAIT. I suggest to target 11.4 for this bug. The rationale being that the impact of this bug is relative minor, and rewriting DDL queries like this is not an entirely trivial change. But let me know if anyone has a strong opinion otherwise (target either earlier, or later version?).

          I'm starting to like my patch less. This rewriting of the query string based on the parse tree feels quite hackish, and I do not think it will scale very well to doing it in more places. There's already a limitation in the patch, becase the parser allows something like RENAME t1 NOWAIT TO t2, t3 WAIT 10 TO t4. The patch will only replace the last WAIT 10, leaving the NOWAIT, which is not correct, so to do it correctly would require a list of rewrites to do, which seems overly complex. (The possibility to have multiple WAIT/NOWAIT in a RENAME is already somewhat of a bug, since all but the last will be ignored, it is not like they apply individually to each table pair).

          Another approach could be some system variable that could be set in the mysqlbinlog output that would disable WAIT/NOWAIT. I'm not trilled about that either.

          I think I will leave this bug for now for the moment...

          knielsen Kristian Nielsen added a comment - I'm starting to like my patch less. This rewriting of the query string based on the parse tree feels quite hackish, and I do not think it will scale very well to doing it in more places. There's already a limitation in the patch, becase the parser allows something like RENAME t1 NOWAIT TO t2, t3 WAIT 10 TO t4. The patch will only replace the last WAIT 10, leaving the NOWAIT, which is not correct, so to do it correctly would require a list of rewrites to do, which seems overly complex. (The possibility to have multiple WAIT/NOWAIT in a RENAME is already somewhat of a bug, since all but the last will be ignored, it is not like they apply individually to each table pair). Another approach could be some system variable that could be set in the mysqlbinlog output that would disable WAIT/NOWAIT. I'm not trilled about that either. I think I will leave this bug for now for the moment...

          People

            knielsen Kristian Nielsen
            elenst Elena Stepanova
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

              Created:
              Updated:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.