MariaDB Server / MDEV-33594

Invoking log_free_check() while holding exclusive dictionary latch may block most InnoDB threads for a long time

Details

    • Type: Bug
    • Status: Confirmed
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: 5.5 (EOL), 10.1 (EOL), 10.2 (EOL), 10.3 (EOL), 10.4 (EOL), 10.5, 10.6, 10.7 (EOL), 10.8 (EOL), 10.9 (EOL), 10.10 (EOL), 10.11, 11.0 (EOL), 11.1 (EOL), 11.2 (EOL), 11.3 (EOL), 11.4
    • Fix Version/s: 10.6, 10.11, 11.4

    Description

      Recently, a customer experienced a crash of MariaDB Server 10.6.14 due to innodb_fatal_semaphore_wait_threshold being exceeded for dict_sys.latch. The latch was being exclusively held by a thread that was executing dict_stats_fetch_from_ps() in order to fetch persistent statistics while opening a table.

      Among the stack traces of some thousands of threads, there was one thread executing fdatasync() inside buf_page_write_complete(). This appears to be the only write that was in progress at the time of the crash (write_slots->m_cache.m_pos == 1). This would seem to indicate that there was some performance problem with the storage subsystem that ultimately caused the innodb_fatal_semaphore_wait_threshold watchdog to kick in.

      The invocation of log_free_check() inside the InnoDB internal SQL interpreter in que_run_threads_low() does not make sense for read-only operations, such as loading statistics. While the call cannot cause deadlocks, unnecessarily waiting for something while holding an exclusive latch is not good practice. The purpose of log_free_check() is to prevent an overrun of the circular ib_logfile0, by ensuring that the log checkpoint will be advanced before writes can continue.
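
      The following is a minimal, self-contained sketch of the kind of bookkeeping that log_free_check() performs; the type redo_log and its members are hypothetical stand-ins for illustration only, not the actual InnoDB code. The idea is that a thread which is about to generate redo log first waits until the log written since the latest checkpoint, plus a margin, fits in the circular file:

      #include <atomic>
      #include <condition_variable>
      #include <cstdint>
      #include <mutex>

      // Hypothetical model of the circular redo log (not the real InnoDB code).
      struct redo_log
      {
        std::atomic<uint64_t> lsn{0};            // log sequence number written so far
        std::atomic<uint64_t> checkpoint_lsn{0}; // changes up to here are in the data files
        uint64_t capacity{1 << 20};              // usable size of the circular ib_logfile0
        std::mutex m;
        std::condition_variable checkpoint_advanced;

        // Model of log_free_check(): before generating more redo log, wait until
        // the log written since the latest checkpoint, plus a safety margin,
        // fits in the circular file, so that unflushed log is never overwritten.
        void free_check(uint64_t margin = 4096)
        {
          std::unique_lock<std::mutex> lk(m);
          checkpoint_advanced.wait(lk, [&] {
            return lsn.load() - checkpoint_lsn.load() + margin < capacity;
          });
        }

        // Model of the checkpoint advance that the page cleaner performs after
        // writing out all dirty pages older than "target".
        void advance_checkpoint(uint64_t target)
        {
          { std::lock_guard<std::mutex> lk(m); checkpoint_lsn = target; }
          checkpoint_advanced.notify_all();
        }
      };

      int main()
      {
        redo_log log;
        log.lsn = 100;               // pretend some redo log has been written
        log.free_check();            // plenty of space left: returns immediately
        log.advance_checkpoint(100);
      }

      In the crashed server described above, the wait inside free_check() would roughly correspond to waiting for a log checkpoint, whose progress in turn depended on page writes such as the slow fdatasync() observed in buf_page_write_complete().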

      The reason why the wait cannot cause deadlocks is that by design, a thread is not allowed to invoke log_free_check() or wait for dict_sys.latch while it is holding a buffer page latch or any other latch (see MDEV-32049) that other page-latch holding threads may be waiting for. The buf_flush_page_cleaner() thread itself never waits for dict_sys.latch. The progress of log checkpoints could be blocked if any thread were holding a write latch on a block that would need to be written back to a data file in order to advance the checkpoint.
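
      As an illustration of that rule, here is a small stand-alone sketch of the kind of debug-only latching-order check that could enforce it; the names latch_level, held and assert_may_wait_for() are hypothetical, not the actual InnoDB instrumentation:

      #include <cassert>
      #include <vector>

      // Hypothetical latch levels; a thread holding a buffer page latch must not
      // start waiting for the log or for dict_sys.latch.
      enum class latch_level { page, dict_sys, log };

      // Per-thread record of the latches currently held (debug-only model).
      thread_local std::vector<latch_level> held;

      void latch_acquired(latch_level l) { held.push_back(l); }
      void latch_released() { held.pop_back(); }

      // Refuse any wait for the log or for dict_sys.latch while a page latch is
      // held; such a wait could block the page cleaner and hence the very
      // checkpoint that the waiting thread depends on.
      void assert_may_wait_for(latch_level l)
      {
        for (latch_level h : held)
          assert(h != latch_level::page || l == latch_level::page);
      }

      int main()
      {
        latch_acquired(latch_level::dict_sys);  // exclusive dict_sys.latch
        assert_may_wait_for(latch_level::log);  // allowed: no page latch is held
        latch_released();
      }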

      I checked all code paths where dict_sys.lock(SRW_LOCK_CALL) is being invoked in MariaDB Server 10.6. The following code paths may invoke log_free_check():

      • dict_stats_save() (saving persistent statistics; before 10.6, this was the main trigger of the infamous MDEV-15020 deadlocks)
      • dict_stats_fetch_from_ps() (loading persistent statistics)
      • fts_load_user_stopword() (loading stopwords for FULLTEXT INDEX)
      • (fts_parse_sql() is invoking pars_sql() but not executing it)
      • innodb_drop_database() (executing DROP DATABASE) is accessing FOREIGN KEY constraints; this is subject to change in MDEV-16417
      • many DDL operations are invoking the InnoDB internal SQL parser to update the InnoDB data dictionary tables while holding exclusive dict_sys.latch

      A notable subsystem that is clean in this respect is the loading of table definitions from the InnoDB data dictionary tables into the dict_sys cache.

      We might try to alleviate this problem by adding a log_free_check() invocation to each of the functions listed above, right before the dict_sys.lock() call. But I do not think that it would help much on a heavily loaded server: other threads could concurrently write more log while we were waiting, and the log_free_check() invocation inside que_run_threads_low() could end up waiting again, this time blocking the progress of many other threads.
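
      Roughly, that band-aid would look like the following stand-alone sketch. The names dict_sys_model, exec_internal_sql() and dict_stats_save_example() are hypothetical stand-ins for illustration; only the ordering of the calls reflects the idea:

      #include <cstdio>
      #include <mutex>

      // Hypothetical stand-ins for the real InnoDB interfaces (not MariaDB code).
      static void log_free_check()            // may wait until the checkpoint advances
      {
        std::puts("log_free_check(): may wait here for a checkpoint");
      }

      static struct { std::mutex latch; } dict_sys_model;  // stands in for dict_sys

      static void exec_internal_sql(const char *sql)
      {
        log_free_check();                     // que_run_threads_low() does this today,
                                              // while the caller holds dict_sys.latch
        std::printf("executing: %s\n", sql);
      }

      static void dict_stats_save_example()
      {
        log_free_check();                     // the proposed extra call, outside the latch

        std::lock_guard<std::mutex> g(dict_sys_model.latch);  // exclusive dict_sys.latch
        // On a busy server, other threads may write more redo log after the call
        // above, so the log_free_check() inside the interpreter can still block
        // here, while every other dict_sys user is stuck behind this thread.
        exec_internal_sql("UPDATE statistics tables ...");
      }

      int main() { dict_stats_save_example(); }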

      A sustainable fix would seem to be to refactor all use of the InnoDB internal SQL parser and remove that code, so that we have a better overview where log_free_check() needs to be invoked and dict_sys.latch really needs to be held. For DDL operations this is going to be tricky. For persistent statistics and operations related to FULLTEXT INDEX, this is more doable.

      We could also check if the operations on persistent statistics can be split so that only pars_sql() is invoked while holding dict_sys.latch, similar to how fts_parse_sql() appears to work.
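
      A stand-alone sketch of that split, with hypothetical names (que_graph, parse_under_latch(), execute_outside_latch()) that merely model the idea of keeping pars_sql() under the latch and running the parsed graph only after releasing it:

      #include <memory>
      #include <mutex>
      #include <string>

      // Hypothetical stand-ins, not the actual InnoDB types.
      struct que_graph { std::string text; };   // a parsed internal SQL statement
      static std::mutex dict_sys_latch;         // stands in for dict_sys.latch

      static void log_free_check() {}           // may wait for a checkpoint (stub)

      // Parsing resolves table and column names, so it needs dict_sys.latch,
      // but it writes no redo log and therefore needs no log_free_check().
      static std::unique_ptr<que_graph> parse_under_latch(const std::string &sql)
      {
        std::lock_guard<std::mutex> g(dict_sys_latch);
        return std::unique_ptr<que_graph>(new que_graph{sql});
      }

      // Execution writes redo log, so it must call log_free_check(); after the
      // split it would no longer do so while holding the exclusive latch.
      static void execute_outside_latch(const que_graph &graph)
      {
        log_free_check();
        (void)graph;                            // que_run_threads() would run here
      }

      int main()
      {
        auto g = parse_under_latch("statistics query text");
        execute_outside_latch(*g);
      }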


          Activity

            The same customer reported a hang with 10.6.16-11. The root cause is rather similar, but this time it does not involve log_free_check() and page writes, but trx_sys.rseg_array and page reads. The buf_flush_page_cleaner thread is idle. The scenario was as follows:

            • purge_truncation_callback() is holding an exclusive latch on an element of trx_sys.rseg_array and waiting for a page to be read into the buffer pool, in flst_remove() invoked from trx_purge_truncate_rseg_history().
            • purge_coordinator_callback() in trx_sys_t::history_size() is holding shared latches on the first elements of trx_sys.rseg_array while waiting for the latch that purge_truncation_callback() is currently holding in exclusive mode.
            • Several transactions are blocked either in trx_undo_report_row_operation() (starting a transaction) or in trx_t::write_serialisation_history(), trying to acquire an exclusive latch on a rollback segment that purge_coordinator_callback() is currently holding a shared latch on.
            • One of the blocked transactions is executing dict_stats_save() while holding exclusive dict_sys.latch. A possible improvement might be to allocate this transaction in the undo logs before acquiring the exclusive dict_sys.latch (see the sketch after this list).
            • srv_master_callback() in dict_sys_t::evict_table_LRU() is waiting for an exclusive dict_sys.latch, apparently for innodb_fatal_semaphore_wait_threshold seconds (default=600, or 10 minutes).
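
            A stand-alone sketch of the reordering suggested above for dict_stats_save(); the names rseg_latch, assign_undo() and dict_stats_save_reordered() are hypothetical and only model the order in which the latches would be taken:

            #include <mutex>

            // Hypothetical stand-ins for the real objects (not MariaDB code).
            static std::mutex rseg_latch;       // one element of trx_sys.rseg_array
            static std::mutex dict_sys_latch;   // dict_sys.latch

            // Model of assigning a rollback segment and undo log page to the
            // statistics transaction; in the reported hang this is where
            // dict_stats_save() blocked while already holding dict_sys.latch.
            static void assign_undo()
            {
              std::lock_guard<std::mutex> g(rseg_latch);
              // ... allocate an undo log page, possibly waiting for a page read ...
            }

            static void dict_stats_save_reordered()
            {
              // Proposed order: do the rollback-segment work (and any page reads)
              // first, while no other dict_sys user is blocked behind this thread.
              assign_undo();

              // Only then enter the dict_sys critical section.
              std::lock_guard<std::mutex> g(dict_sys_latch);
              // ... update the persistent statistics tables ...
            }

            int main() { dict_stats_save_reordered(); }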

            There would seem to be two possible workarounds for this problem, which appears to be caused by underperforming storage or by an insufficient innodb_buffer_pool_size:

            SET GLOBAL innodb_purge_batch_size=127;
            SET GLOBAL innodb_stats_auto_recalc=OFF;
            

            The former is specific to this scenario (not the original one that involved log_free_check()); the hope is to reduce the chance that the page that purge_truncation_callback() is looking for is missing from the buffer pool. This workaround was recently identified in MDEV-34515.

            The latter would prevent the automatic recalculation of InnoDB persistent statistics and therefore the contention on dict_sys.latch that ultimately causes the server to be killed by the watchdog. For any tables that were created with the attribute STATS_AUTO_RECALC=1, the automatic recalculation would still happen. Because of the I/O anomaly, the server could still hang for an extended period of time, but the watchdog might no longer interfere.

            marko Marko Mäkelä added a comment

            The Description is about a performance improvement that could help on an I/O constrained system.

            However, the customer support case that prompted me to file this ticket involves an actual hang due to apparently lost page write or read requests. Because the system in question is running MariaDB Server 10.6, the hang may already have been fixed by MDEV-33669. That bug could possibly also explain the hang in MDEV-34641.

            marko Marko Mäkelä added a comment

            People

              Assignee: marko (Marko Mäkelä)
              Reporter: marko (Marko Mäkelä)
              Votes: 2
              Watchers: 5

