MariaDB Server / MDEV-31390

Assertion id.is_corrupted() failed in buf_page_create_low()

Details

    Description

      MDEV-27058 introduced a race condition in the function buf_page_create_low(). The following patch illustrates the race and how to fix it:

      diff --git a/storage/innobase/buf/buf0buf.cc b/storage/innobase/buf/buf0buf.cc
      index 90886173b1b..3afd3840552 100644
      --- a/storage/innobase/buf/buf0buf.cc
      +++ b/storage/innobase/buf/buf0buf.cc
      @@ -3144,7 +3144,7 @@ static buf_block_t *buf_page_create_low(page_id_t page_id, ulint zip_size,
         free_block->initialise(page_id, zip_size, buf_page_t::MEMORY);
       
         buf_pool_t::hash_chain &chain= buf_pool.page_hash.cell_get(page_id.fold());
      -retry:
      +
         mysql_mutex_lock(&buf_pool.mutex);
       
         buf_page_t *bpage= buf_pool.page_hash.get(page_id, chain);
      @@ -3159,21 +3159,16 @@ static buf_block_t *buf_page_create_low(page_id_t page_id, ulint zip_size,
           if (!mtr->have_x_latch(reinterpret_cast<const buf_block_t&>(*bpage)))
           {
             const bool got= bpage->lock.x_lock_try();
      +      auto state= bpage->fix();
             if (!got)
             {
               mysql_mutex_unlock(&buf_pool.mutex);
               bpage->lock.x_lock();
      -        const page_id_t id{bpage->id()};
      -        if (UNIV_UNLIKELY(id != page_id))
      -        {
      -          ut_ad(id.is_corrupted());
      -          bpage->lock.x_unlock();
      -          goto retry;
      -        }
      +        ut_ad(page_id == bpage->id());
      +        state= bpage->state();
               mysql_mutex_lock(&buf_pool.mutex);
             }
       
      -      auto state= bpage->fix();
             ut_ad(state >= buf_page_t::FREED);
             ut_ad(state < buf_page_t::READ_FIX);
       
      

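      The above code acquires the page latch while holding buf_pool.mutex. A blocking wait at that point would violate the latching order, which is why the code first makes a non-blocking attempt with x_lock_try() and releases buf_pool.mutex before falling back to the blocking x_lock().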

      Without the above patch, the page could be evicted and replaced with something else in the buffer pool while this thread holds neither buf_pool.mutex nor the page latch. To make this safe, we must first buffer-fix the block so that buf_page_t::can_relocate() will not hold, and only then release buf_pool.mutex.
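
      The ordering described above can be sketched outside InnoDB. The following is a minimal toy model, not the server code: block_sketch, pool_mutex, try_replace() and latch_for_create() are hypothetical names, std::mutex stands in for both buf_pool.mutex and the page latch, and can_relocate() is reduced to a check of the fix count (the real buf_page_t::can_relocate() considers more state than that).

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <mutex>

// Hypothetical, heavily simplified model of a buffer pool block.
struct block_sketch
{
  std::uint64_t id;                        // page currently stored in the block
  std::atomic<std::uint32_t> fix_count{0}; // buffer-fix (pin) count
  std::mutex latch;                        // stands in for the page latch

  explicit block_sketch(std::uint64_t i) : id(i) {}
  void fix() { fix_count.fetch_add(1); }
  void unfix() { fix_count.fetch_sub(1); }
  // Assumption: only the fix count decides whether the block may be reused.
  bool can_relocate() const { return fix_count.load() == 0; }
};

std::mutex pool_mutex; // stands in for buf_pool.mutex

// Eviction path: reuse the block for another page unless it is pinned.
// Returns true if the block was reused.
bool try_replace(block_sketch &b, std::uint64_t new_id)
{
  std::lock_guard<std::mutex> g(pool_mutex);
  if (!b.can_relocate())
    return false;
  b.id= new_id;
  return true;
}

// Creation path, in the order used by the patch: pin the block first, and
// only then give up the pool mutex to wait for the latch.
// On return the caller owns the latch and one buffer-fix.
void latch_for_create(block_sketch &b, std::uint64_t wanted)
{
  std::unique_lock<std::mutex> pool(pool_mutex);
  assert(b.id == wanted);             // the lookup happened under pool_mutex
  const bool got= b.latch.try_lock(); // non-blocking while pool_mutex is held
  b.fix();                            // pin before pool_mutex can be released
  if (!got)
  {
    pool.unlock();                    // eviction may run in this window
    b.latch.lock();                   // blocking wait; the block is pinned
    assert(b.id == wanted);           // the pin kept the identity stable
    pool.lock();
  }
}
```

      In this model, a call to try_replace() made while the block is pinned by latch_for_create() returns false and leaves the block's id unchanged. It can succeed only after the latch has been released and unfix() has been called. Pinning after the blocking wait instead, as the code did before the patch, leaves the window marked in the comment unprotected.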

      The observed symptom was a debug assertion failure that was caught by mleich:

      mysqld: /data/Server/bb-10.6-MDEV-30986B/storage/innobase/buf/buf0buf.cc:3155: buf_block_t* buf_page_create_low(page_id_t, ulint, mtr_t*, buf_block_t*): Assertion `id.is_corrupted()' failed
      

      In the core dump, we had id.m_id==0x1c00000070 but page_id.m_id==0x9b00000007. In the stack trace, I could see that we were trying to allocate a block for tablespace 0x9b.
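
      To read the two m_id values from the core dump, it helps to split them into their fields. The sketch below assumes the page_id_t layout in which the tablespace ID occupies the upper 32 bits of m_id and the page number the lower 32 bits; page_id_sketch is a hypothetical stand-in, not the InnoDB class.

```cpp
#include <cassert>
#include <cstdint>

// Simplified stand-in for page_id_t. Assumed layout:
// tablespace ID in the upper 32 bits, page number in the lower 32 bits.
struct page_id_sketch
{
  std::uint64_t m_id;
  constexpr std::uint32_t space() const { return std::uint32_t(m_id >> 32); }
  constexpr std::uint32_t page_no() const { return std::uint32_t(m_id); }
};

// Values taken from the core dump described above.
constexpr page_id_sketch found{0x1c00000070ULL};     // id, read from the block
constexpr page_id_sketch requested{0x9b00000007ULL}; // page_id, the argument

// Compile-time checks of the decoded fields.
static_assert(found.space() == 0x1c && found.page_no() == 0x70, "");
static_assert(requested.space() == 0x9b && requested.page_no() == 7, "");
```

      Under that assumption, the block held page 0x70 of tablespace 0x1c by the time the latch was granted, while the caller had asked for page 7 of tablespace 0x9b. This matches the tablespace seen in the stack trace.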

      I believe that this could explain MDEV-30531.

      I will check if we are missing the buffer-fix in any other places that follow a similar pattern.

            People

              mleich Matthias Leich
              marko Marko Mäkelä