MariaDB Server: MDEV-27784

log_overwrite_warning displays an error instead of a warning

Details

    Description

      log_close() may display a log_overwrite_warning() which contains the text:

      InnoDB: The age of the last checkpoint is <age>, which exceeds the log capacity <capacity>.

      This is printed using sql_print_error, meaning it appears as an ERROR in the log.

      However, log_close() appears to simply treat this as a warning when the redo log is full and will return PAGE_FLUSH_SYNC to indicate that "furious flushing is needed".

      Therefore, this appears not to be an error that would result in important redo log entries being overwritten. Instead, this appears to just be an indication that the redo logs are full and that the server will pause to flush before continuing.
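As a rough illustration of the behavior described above, the decision could look something like the following sketch. The identifier names here are illustrative, not the actual InnoDB symbols; PAGE_FLUSH_SYNC is the only value taken from the report itself:

```cpp
#include <cstdint>
#include <cassert>

// Hypothetical sketch of the checkpoint-margin decision described above.
enum class flush_action { none, page_flush_async, page_flush_sync };

flush_action checkpoint_margin(uint64_t lsn, uint64_t checkpoint_lsn,
                               uint64_t log_capacity, uint64_t async_margin)
{
  const uint64_t age = lsn - checkpoint_lsn;  // "age of the last checkpoint"
  if (age >= log_capacity)
    // This is where the "exceeds the log capacity" message is emitted;
    // the caller then performs "furious flushing" instead of failing.
    return flush_action::page_flush_sync;
  if (age >= async_margin)
    return flush_action::page_flush_async;
  return flush_action::none;
}
```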

      Incorrectly reporting this as an ERROR will cause users to worry that there is a data integrity problem, when in reality it's just an informational note.

      I am getting many of these ERRORs in my logs on a slave server when I do a shutdown with innodb_fast_shutdown = 0, on a database that regularly writes around 200MB per minute to the 1G redo log file and therefore fills it up every few minutes.

      Here is an example of the error messages on shutdown:

      2022-02-05 14:08:29 0 [Note] /usr/sbin/mariadbd (initiated by: root[root] @ localhost []): Normal shutdown
      2022-02-05 14:08:29 0 [Note] InnoDB: FTS optimize thread exiting.
      2022-02-05 14:08:29 0 [Note] InnoDB: to purge 9 transactions
      2022-02-05 14:08:30 0 [Note] InnoDB: Starting shutdown...
      2022-02-05 14:08:30 0 [Note] InnoDB: Dumping buffer pool(s) to /mariadb/ib_buffer_pool
      2022-02-05 14:08:30 0 [Note] InnoDB: Restricted to 32388 pages due to innodb_buf_pool_dump_pct=25
      2022-02-05 14:08:30 0 [Note] InnoDB: Buffer pool(s) dump completed at 220205 14:08:30
      2022-02-05 14:11:05 0 [ERROR] InnoDB: The age of the last checkpoint is 966376793, which exceeds the log capacity 966365799.
      2022-02-05 14:11:21 0 [ERROR] InnoDB: The age of the last checkpoint is 994921739, which exceeds the log capacity 966365799.
      2022-02-05 14:11:37 0 [ERROR] InnoDB: The age of the last checkpoint is 1022257877, which exceeds the log capacity 966365799.
      2022-02-05 14:11:53 0 [ERROR] InnoDB: The age of the last checkpoint is 1047226930, which exceeds the log capacity 966365799.
      2022-02-05 14:12:09 0 [ERROR] InnoDB: The age of the last checkpoint is 1071228676, which exceeds the log capacity 966365799.
      2022-02-05 14:12:25 0 [ERROR] InnoDB: The age of the last checkpoint is 1093753956, which exceeds the log capacity 966365799.
      2022-02-05 14:12:41 0 [ERROR] InnoDB: The age of the last checkpoint is 1112481150, which exceeds the log capacity 966365799.
      2022-02-05 14:12:54 0 [Note] InnoDB: Waiting for change buffer merge to complete number of bytes of change buffer just merged: 192206046
      2022-02-05 14:13:00 0 [Note] InnoDB: Removed temporary tablespace data file: "./ibtmp1"
      2022-02-05 14:13:00 0 [Note] InnoDB: Shutdown completed; log sequence number 236522592867138; transaction id 35631972102
      2022-02-05 14:13:00 0 [Note] /usr/sbin/mariadbd: Shutdown complete

      Activity

            marko Marko Mäkelä added a comment:

            monty suggested an idea that goes roughly like this:

            • If innodb_log_file_size is large enough, retain the current logic. That is, assume that between log_free_check() and mtr_t::commit() it is impossible for the log to overrun.
            • For smaller log files, in each log_free_check(), reserve space (or increment an atomic counter), and release it after the log was written in mtr_t::commit().
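The reservation idea in the second bullet could be sketched roughly as follows. Every name here is hypothetical; nothing like this exists in InnoDB today, and a real implementation would block and wait for a checkpoint instead of returning false:

```cpp
#include <atomic>
#include <cstdint>
#include <cassert>

// Sketch of the proposed scheme: for small redo logs, reserve worst-case
// log space in log_free_check() and release it in mtr_t::commit().
class log_reservation {
  std::atomic<uint64_t> reserved{0};
  const uint64_t capacity;
public:
  explicit log_reservation(uint64_t cap) : capacity(cap) {}

  // Would be called from log_free_check(): succeed only if the worst-case
  // amount of log this mini-transaction may write still fits.
  bool reserve(uint64_t worst_case_bytes) {
    uint64_t cur = reserved.load(std::memory_order_relaxed);
    for (;;) {
      if (cur + worst_case_bytes > capacity)
        return false;  // the real caller would wait for a checkpoint here
      if (reserved.compare_exchange_weak(cur, cur + worst_case_bytes,
                                         std::memory_order_acquire))
        return true;   // cur is reloaded on failure; loop retries
    }
  }

  // Would be called from mtr_t::commit() after the log was written.
  void release(uint64_t worst_case_bytes) {
    reserved.fetch_sub(worst_case_bytes, std::memory_order_release);
  }
};
```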

            How large is ‘large enough’? There may be at most 8×innodb_page_size active transactions (524,288 when using innodb_page_size=64k). There may also be 32 purge threads, and up to 255 fil_crypt_thread. In the worst case, each thread may be splitting a B-tree to its maximum height of 31 or 32 nodes, that is, modify roughly 64 pages and write up to roughly 32×innodb_page_size bytes of log.

            Rounding up, the ‘large enough’ log file should be something like 9×innodb_page_size×32×innodb_page_size = 288×innodb_page_size×innodb_page_size. That would range from 4.5 GiB to 1,152 GiB. At the default innodb_page_size=16k, this ‘large enough’ estimate would be 72 GiB.
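The arithmetic above can be checked mechanically. This is only a restatement of the estimate, not real server code:

```cpp
#include <cstdint>
#include <cassert>

// The back-of-the-envelope estimate from the comment: roughly
// 9 × innodb_page_size concurrent writers (8 × page_size transactions,
// rounded up to cover purge and encryption threads), each writing up to
// 32 × innodb_page_size bytes of log, giving 288 × page_size² bytes.
constexpr uint64_t large_enough_log(uint64_t page_size)
{
  return 288 * page_size * page_size;
}

static_assert(large_enough_log(4096)  == 9ULL << 29,    "4.5 GiB at 4k");
static_assert(large_enough_log(16384) == 72ULL << 30,   "72 GiB at 16k");
static_assert(large_enough_log(65536) == 1152ULL << 30, "1152 GiB at 64k");
```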

            Even though this estimate might feel like an overestimate, it may in fact be an underestimate. We assumed that each of the concurrent threads would run only once. But, it is possible that on such a heavily loaded system, some threads are starved of CPU resources for an extended period of time while other threads are scheduled to run multiple mtr_t::commit() each. One of those starved threads could already have acquired a latch on the oldest modified page, and the log_free_check() from other threads would not be able to advance the checkpoint. Eventually, all subsequent threads would end up waiting in log_free_check() and the starved threads would be allowed to proceed to release the page latches, which would allow the checkpoint to be completed. During such a glitch it is possible that the log was momentarily overrun.

            So, there does not exist any ‘large enough’ innodb_log_file_size that can guarantee to avoid the race condition between log_free_check() and mtr.commit(). We could only reduce the probability to very close to 0, which is already the case for well configured installations. Besides, 72 GiB is already a quite large redo log. Even with innodb_buffer_pool_size=128G it could be perfectly reasonable to use a much smaller innodb_log_file_size, especially when page splits are rare.

            Practically, if we wanted to completely avoid the log overrun condition, we would have to implement some additional locking. Based on some experience with MDEV-25062, I expect that an innocent-looking std::atomic::fetch_add() call for updating a global variable would seriously degrade throughput. Even when using numactl to ensure that the server runs on a single NUMA node, the AMD64 LOCK XADD instruction for trx_sys.rseg_history_len++ consumed several per cent of all CPU cycles during my simple benchmark. If we added anything like that to log_free_check(), the impact should be much bigger, because every write transaction consists of multiple mini-transactions.
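The contention pattern being described, namely every thread performing fetch_add on one shared counter (LOCK XADD on AMD64, all threads serializing on the same cache line), can be illustrated with a minimal sketch. The thread and iteration counts are arbitrary:

```cpp
#include <atomic>
#include <thread>
#include <vector>
#include <cstdint>
#include <cassert>

// Every thread increments one global atomic; each increment is a LOCK XADD
// that bounces the counter's cache line between cores. The result is
// correct, but under high thread counts the serialization dominates.
uint64_t contended_count(unsigned nthreads, uint64_t iters)
{
  std::atomic<uint64_t> counter{0};
  std::vector<std::thread> threads;
  for (unsigned i = 0; i < nthreads; i++)
    threads.emplace_back([&] {
      for (uint64_t j = 0; j < iters; j++)
        counter.fetch_add(1, std::memory_order_relaxed);
    });
  for (auto &t : threads) t.join();
  return counter.load();
}
```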

            What if we implemented the extra locking only for the cases where a mini-transaction is expected to split or merge pages? That would still leave a small probability of log overrun due to threads being scheduled ‘unfairly’ on a heavily loaded system.

            cedric Cedric added a comment (edited):

            So if the server was not killed, no data could have been lost?

            marko I think therefore you're saying that if it says "shutdown complete", I should ignore the error. I'm guessing the reason that this still needs to be listed as an ERROR in the logs is because if the shutdown did not complete successfully, then the ERROR would be there to let me know that my DB has lost data and thus I need to restore from backup?


            marko Marko Mäkelä added a comment:

            A user got these messages during a shutdown with innodb_fast_shutdown=0 when testing a misconfigured system (a tiny innodb_log_file_size and innodb_buffer_pool_size=22G). I am quoting the relevant server log messages:

            2022-10-24 23:20:16 0 [Note] InnoDB: FTS optimize thread exiting.
            2022-10-24 23:20:16 0 [Note] InnoDB: to purge 5 transactions
            2022-10-24 23:20:17 0 [Note] InnoDB: Starting shutdown...
            2022-10-24 23:20:17 0 [Note] InnoDB: Dumping buffer pool(s) to /data/maria_data/ib_buffer_pool
            2022-10-24 23:20:17 0 [Note] InnoDB: Restricted to 354860 pages due to innodb_buf_pool_dump_pct=25
            2022-10-24 23:20:17 0 [Note] InnoDB: Buffer pool(s) dump completed at 221024 23:20:17
            2022-10-24 23:57:37 0 [ERROR] InnoDB: The age of the last checkpoint is 966373838, which exceeds the log capacity 966365799.
            2022-10-24 23:57:53 0 [ERROR] InnoDB: The age of the last checkpoint is 974973721, which exceeds the log capacity 966365799.
            2022-10-24 23:58:09 0 [ERROR] InnoDB: The age of the last checkpoint is 983054253, which exceeds the log capacity 966365799.
            2022-10-24 23:58:25 0 [ERROR] InnoDB: The age of the last checkpoint is 991661268, which exceeds the log capacity 966365799.
            ...
            2022-10-25  2:02:57 0 [ERROR] InnoDB: The age of the last checkpoint is 4351924949, which exceeds the log capacity 966365799.
            221025 02:03:48 mysqld_safe Starting mariadbd daemon with databases from /data/maria_data
            ...
            2022-10-25  2:03:49 0 [ERROR] InnoDB: Missing FILE_CHECKPOINT at 12010302916974 between the checkpoint 12009722108511 and the end 12010302916608.
            

            According to this output, there were only 5 transactions to be purged in the slow shutdown, but the slow shutdown kept running for over 2 hours until it was forcibly killed. The server then failed to recover from the corrupted (overwritten) log.

            During a slow shutdown, writes will be caused either by merging buffered changes to secondary index leaf pages, or by purging the history of completed transactions. The change buffer was disabled by default in MDEV-27734, mainly because bug reports led me to believe that it can corrupt data, even though we have not been able to reproduce that. If the change buffer was disabled, then I wonder how large those 5 transactions were.

            I could say more if I were able to reproduce this scenario with an executable test case (SQL workload).

            If one is unlucky and InnoDB hangs (for example, due to MDEV-29835) during a time when the misconfigured redo log is in the corrupted state, then everything can be lost. Therefore it is of paramount importance to ensure that innodb_log_file_size has been configured to be large enough. It would be nice if correlated parameters (such as innodb_buffer_pool_size and innodb_log_file_size) could be configured automatically based on a smaller number of parameters (MDEV-19895).

            An unreasonably small redo log will cause hugely degraded write performance, because it will trigger more frequent log checkpoints, each of which involves writing changed pages from the buffer pool to the data files. Starting with 10.5 and MDEV-21351, InnoDB crash recovery should work reliably even if the redo log is much larger than the buffer pool. I recommend reading a recent blog post by mdcallag.


            marko Marko Mäkelä added a comment:

            I filed MDEV-29905 for missing log_free_check() calls in change buffer operations. I think that it fully explains the reported corruption. According to the messages, the circular log had been overwritten an astonishing 4½ times since the previous checkpoint. Even a single overwrite can be disastrous.


            serg Sergei Golubchik added a comment:

            Unfortunately, this issue evolved from the original bug report into a discussion of log overwriting in InnoDB.

            The originally reported behavior is clearly not a bug. Log overwriting in InnoDB is a serious error, and it is and should be reported as an [ERROR], not as a [Warning].

            As for the actual log overwriting, it, of course, shouldn't be happening. MDEV-29905 fixes a bug that was causing it; at least, it was one of the reasons. If there are more and you see this error after MDEV-29905 is fixed, please open a new issue for it.

            serg Sergei Golubchik added a comment - Unfortunately, this issue evolved from the original bug report into a discussion of log overwriting in InnoDB. The originally reported behavior is clearly not a bug . Log overwriting in InnoDB is a serious error and it is and should be reported as an [ERROR] not as a [Warning] . As for the actual log overwriting, it, of course, shouldn't be happening. MDEV-29905 fixes a bug that was causing it. At least, it was one of the reasons. If there are more and you'll see this error after MDEV-29905 is fixed, please, open a new issue for it.

            People

              marko Marko Mäkelä
              cedric Cedric