MariaDB Server / MDEV-39425

Potential hang in the recovery of ROW_FORMAT=COMPRESSED tables

Details

    • Can result in hang or crash

    Description

      While developing MDEV-37949, I was able to reproduce a recovery hang related to a ROW_FORMAT=COMPRESSED table. InnoDB was running out of buffer pool. In MDEV-29911, the memory management was refactored so that we preallocate one block to guarantee that recovery can make progress. This works for normal tables. However, for ROW_FORMAT=COMPRESSED tables we currently allocate two blocks: an uncompressed block and a compressed one. While we are executing the following call stack (innermost function first), we are holding exclusive log_sys.latch:

      buf_pool_t::LRU_warn
      buf_LRU_get_free_block
      buf_buddy_alloc_low
      buf_buddy_alloc
      buf_page_create_low
      buf_page_create_deferred
      recv_sys_t::recover_low
      recv_sys_t::recover_deferred
      recv_sys_t::apply_batch
      recv_sys_t::apply
      

      I was able to revise MDEV-37949 so that buf_flush_sync_for_checkpoint() will avoid acquiring log_sys.latch when recovery is in progress.

      It would be better and safer to revise the allocation in such a way that buf_page_create_deferred() will not have to wait for a free block. After all, due to MDEV-36226 we have another potential hang with the 10.11+ version of buf_pool_t::print_flush_info(), which would be alleviated by the following:

      diff --git a/storage/innobase/buf/buf0flu.cc b/storage/innobase/buf/buf0flu.cc
      index 911ff19c6f3..b5ad7171127 100644
      --- a/storage/innobase/buf/buf0flu.cc
      +++ b/storage/innobase/buf/buf0flu.cc
      @@ -2989,7 +2989,7 @@ ATTRIBUTE_COLD void buf_pool_t::print_flush_info() const noexcept
           "-------------------",
           lru_size, free_size, dirty_size, dirty_pct);
       
      -  lsn_t lsn= log_get_lsn();
      +  lsn_t lsn= log_sys.get_lsn_approx();
         lsn_t clsn= log_sys.last_checkpoint_lsn;
         sql_print_information("InnoDB: LSN flush parameters\n"
           "-------------------\n"
      

      The ROW_FORMAT=COMPRESSED recovery is the real culprit here, as it is holding log_sys.latch while trying to allocate a block from the buffer pool. Normally, in order to be able to write out pages, the buf_flush_page_cleaner() thread must be able to make log writes durable. For that, acquiring log_sys.latch is sometimes necessary. If recovery holds log_sys.latch while waiting for a free block, and the page cleaner needs log_sys.latch before it can free one, neither thread can make progress.
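
      The idea of never waiting for a free block while the latch is held could be sketched roughly as follows. This is a toy model under stated assumptions, not InnoDB code: ToyPool, Reservation and blocks_needed() are hypothetical names invented for illustration, and the only fact taken from this report is that a ROW_FORMAT=COMPRESSED page needs two blocks where a normal page needs one. The reservation is made before the latch would be acquired, and it either succeeds in full or takes nothing.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model only: these types are hypothetical and are not InnoDB's.
struct Block {};

// A pool of free blocks that never blocks the caller.
class ToyPool {
public:
  explicit ToyPool(std::size_t n) : free_(n) {}

  // Take one block without waiting; returns false if none is free.
  bool try_take(Block &out) {
    if (free_.empty()) return false;
    out = free_.back();
    free_.pop_back();
    return true;
  }

  void give_back(const Block &b) { free_.push_back(b); }
  std::size_t free_count() const { return free_.size(); }

private:
  std::vector<Block> free_;
};

// Blocks set aside BEFORE the (modelled) latch is acquired.
class Reservation {
public:
  // All-or-nothing: on failure every block taken so far is returned,
  // so the caller can flush and retry without holding any latch.
  static bool make(ToyPool &pool, std::size_t want, Reservation &r) {
    for (std::size_t i = 0; i < want; i++) {
      Block b;
      if (!pool.try_take(b)) {
        r.release(pool);
        return false;
      }
      r.blocks_.push_back(b);
    }
    return true;
  }

  // Called under the latch: cannot wait, because the block is already ours.
  Block take() {
    assert(!blocks_.empty());
    Block b = blocks_.back();
    blocks_.pop_back();
    return b;
  }

  // Return any unused blocks to the pool.
  void release(ToyPool &pool) {
    for (const Block &b : blocks_) pool.give_back(b);
    blocks_.clear();
  }

  std::size_t size() const { return blocks_.size(); }

private:
  std::vector<Block> blocks_;
};

// Assumption from this report: a compressed page needs an uncompressed
// block plus a compressed one; a normal page needs a single block.
constexpr std::size_t blocks_needed(bool row_format_compressed) {
  return row_format_compressed ? 2 : 1;
}
```

      In this model a caller would run Reservation::make(pool, blocks_needed(true), r) first, and only on success enter the latched section and call r.take() twice. When the reservation fails, the caller is not holding the latch, so whatever frees blocks (the page cleaner in the real server) is not blocked by it.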

            People

              thiru Thirunarayanan Balathandayuthapani
              marko Marko Mäkelä