Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-31390

Assertion id.is_corrupted() failed in buf_page_create_low()

Details

    Description

      MDEV-27058 introduced a race condition in the function buf_page_create_low(). A patch illustrates it:

      diff --git a/storage/innobase/buf/buf0buf.cc b/storage/innobase/buf/buf0buf.cc
      index 90886173b1b..3afd3840552 100644
      --- a/storage/innobase/buf/buf0buf.cc
      +++ b/storage/innobase/buf/buf0buf.cc
      @@ -3144,7 +3144,7 @@ static buf_block_t *buf_page_create_low(page_id_t page_id, ulint zip_size,
         free_block->initialise(page_id, zip_size, buf_page_t::MEMORY);
       
         buf_pool_t::hash_chain &chain= buf_pool.page_hash.cell_get(page_id.fold());
      -retry:
      +
         mysql_mutex_lock(&buf_pool.mutex);
       
         buf_page_t *bpage= buf_pool.page_hash.get(page_id, chain);
      @@ -3159,21 +3159,16 @@ static buf_block_t *buf_page_create_low(page_id_t page_id, ulint zip_size,
           if (!mtr->have_x_latch(reinterpret_cast<const buf_block_t&>(*bpage)))
           {
             const bool got= bpage->lock.x_lock_try();
      +      auto state= bpage->fix();
             if (!got)
             {
               mysql_mutex_unlock(&buf_pool.mutex);
               bpage->lock.x_lock();
      -        const page_id_t id{bpage->id()};
      -        if (UNIV_UNLIKELY(id != page_id))
      -        {
      -          ut_ad(id.is_corrupted());
      -          bpage->lock.x_unlock();
      -          goto retry;
      -        }
      +        ut_ad(page_id == bpage->id());
      +        state= bpage->state();
               mysql_mutex_lock(&buf_pool.mutex);
             }
       
      -      auto state= bpage->fix();
             ut_ad(state >= buf_page_t::FREED);
             ut_ad(state < buf_page_t::READ_FIX);
       
      

      The above code is trying to acquire the page latch while holding the buf_pool.mutex. This would violate the latching order if we didn't use a non-blocking wait.

      If the above patch was not applied, the page could be evicted and replaced with something else in the buffer pool while this thread is not holding either buf_pool.mutex or the page latch. To make this safe, we must first buffer-fix the block so that buf_page_t::can_relocate() will not hold, and then release the buf_pool.mutex.

      The observed symptom was a debug assertion failure that was caught by mleich:

      mysqld: /data/Server/bb-10.6-MDEV-30986B/storage/innobase/buf/buf0buf.cc:3155: buf_block_t* buf_page_create_low(page_id_t, ulint, mtr_t*, buf_block_t*): Assertion `id.is_corrupted()' failed
      

      In the core dump, we had id.m_id==0x1c00000070 but page_id.m_id==0x9b00000007. In the stack trace, I could see that we were trying to allocate a block for tablespace 0x9b.

      I believe that this could explain MDEV-30531.

      I will check if we are missing the buffer-fix in any other places that follow a similar pattern.

      Attachments

        Issue Links

          Activity

            I realized that the debug assertion failure cannot lead to any corruption, because in the code path where it is executed, we would goto retry and proceed look up the page again.

            The patch that I posted will merely simplify code and slightly improve performance.

            I checked the code base, and any other use of trying to acquire page latches without waiting looks safe.

            marko Marko Mäkelä added a comment - I realized that the debug assertion failure cannot lead to any corruption, because in the code path where it is executed, we would goto retry and proceed look up the page again. The patch that I posted will merely simplify code and slightly improve performance. I checked the code base, and any other use of trying to acquire page latches without waiting looks safe.

            In fact, the patch that I posted in the Description would trigger an assertion failure in the test innodb.innodb-wl5522-debug. We could try a simpler fix that should only invoke goto retry when the block had been read-fixed and found to be corrupted:

            diff --git a/storage/innobase/buf/buf0buf.cc b/storage/innobase/buf/buf0buf.cc
            index 90886173b1b..a44bcc90575 100644
            --- a/storage/innobase/buf/buf0buf.cc
            +++ b/storage/innobase/buf/buf0buf.cc
            @@ -3159,6 +3159,7 @@ static buf_block_t *buf_page_create_low(page_id_t page_id, ulint zip_size,
                 if (!mtr->have_x_latch(reinterpret_cast<const buf_block_t&>(*bpage)))
                 {
                   const bool got= bpage->lock.x_lock_try();
            +      auto state= bpage->fix();
                   if (!got)
                   {
                     mysql_mutex_unlock(&buf_pool.mutex);
            @@ -3170,10 +3171,10 @@ static buf_block_t *buf_page_create_low(page_id_t page_id, ulint zip_size,
                       bpage->lock.x_unlock();
                       goto retry;
                     }
            +        state= bpage->state();
                     mysql_mutex_lock(&buf_pool.mutex);
                   }
             
            -      auto state= bpage->fix();
                   ut_ad(state >= buf_page_t::FREED);
                   ut_ad(state < buf_page_t::READ_FIX);
             
            

            However, this patch will actually make the same test hang. The page read completion thread would wait forever for the buffer-fix to be released so that the corrupted page can be evicted:

            #0  0x000055884fdf2ea9 in LF_BACKOFF () at /mariadb/10.6/include/my_cpu.h:124
            #1  buf_pool_t::corrupted_evict (this=this@entry=0x55885030ce80 <buf_pool>, bpage=bpage@entry=0x7f3d852d71a0, state=state@entry=2147483648) at /mariadb/10.6/storage/innobase/buf/buf0lru.cc:1249
            #2  0x000055884fdaea9b in buf_page_t::read_complete (this=0x7f3d852d71a0, node=@0x7f3d300873b0: {space = 0x7f3d3025c2b0, name = 0x7f3d300daed0 "./test/t1.ibd", handle = {m_file = 50}, on_ssd = 0, punch_hole = 0, atomic_write = 0, is_raw_disk = 0, deferred = 0, size = 576, init_size = 4, max_size = 4294967295, being_extended = {m = std::atomic<bool> = { false }}, chain = {prev = 0x0, next = 0x0}, block_size = 4096}) at /mariadb/10.6/storage/innobase/buf/buf0buf.cc:3633
            #3  0x000055884fdc2c86 in IORequest::read_complete (this=this@entry=0x5588526fae58) at /mariadb/10.6/storage/innobase/fil/fil0fil.cc:2862
            

            The best alternative would seem to be to simply remove the failing debug assertion. Before we do that, it would be good to produce a simplified grammar for reproducing the debug assertion failure.

            marko Marko Mäkelä added a comment - In fact, the patch that I posted in the Description would trigger an assertion failure in the test innodb.innodb-wl5522-debug . We could try a simpler fix that should only invoke goto retry when the block had been read-fixed and found to be corrupted: diff --git a/storage/innobase/buf/buf0buf.cc b/storage/innobase/buf/buf0buf.cc index 90886173b1b..a44bcc90575 100644 --- a/storage/innobase/buf/buf0buf.cc +++ b/storage/innobase/buf/buf0buf.cc @@ -3159,6 +3159,7 @@ static buf_block_t *buf_page_create_low(page_id_t page_id, ulint zip_size, if (!mtr->have_x_latch(reinterpret_cast<const buf_block_t&>(*bpage))) { const bool got= bpage->lock.x_lock_try(); + auto state= bpage->fix(); if (!got) { mysql_mutex_unlock(&buf_pool.mutex); @@ -3170,10 +3171,10 @@ static buf_block_t *buf_page_create_low(page_id_t page_id, ulint zip_size, bpage->lock.x_unlock(); goto retry; } + state= bpage->state(); mysql_mutex_lock(&buf_pool.mutex); } - auto state= bpage->fix(); ut_ad(state >= buf_page_t::FREED); ut_ad(state < buf_page_t::READ_FIX); However, this patch will actually make the same test hang. The page read completion thread would wait forever for the buffer-fix to be released so that the corrupted page can be evicted: #0 0x000055884fdf2ea9 in LF_BACKOFF () at /mariadb/10.6/include/my_cpu.h:124 #1 buf_pool_t::corrupted_evict (this=this@entry=0x55885030ce80 <buf_pool>, bpage=bpage@entry=0x7f3d852d71a0, state=state@entry=2147483648) at /mariadb/10.6/storage/innobase/buf/buf0lru.cc:1249 #2 0x000055884fdaea9b in buf_page_t::read_complete (this=0x7f3d852d71a0, node=@0x7f3d300873b0: {space = 0x7f3d3025c2b0, name = 0x7f3d300daed0 "./test/t1.ibd", handle = {m_file = 50}, on_ssd = 0, punch_hole = 0, atomic_write = 0, is_raw_disk = 0, deferred = 0, size = 576, init_size = 4, max_size = 4294967295, being_extended = {m = std::atomic<bool> = { false }}, chain = {prev = 0x0, next = 0x0}, block_size = 4096}) at /mariadb/10.6/storage/innobase/buf/buf0buf.cc:3633 #3 0x000055884fdc2c86 in IORequest::read_complete (this=this@entry=0x5588526fae58) at /mariadb/10.6/storage/innobase/fil/fil0fil.cc:2862 The best alternative would seem to be to simply remove the failing debug assertion. Before we do that, it would be good to produce a simplified grammar for reproducing the debug assertion failure.

            People

              mleich Matthias Leich
              marko Marko Mäkelä
              Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.