Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-35626

Race condition between buf_page_create_low() and read completion

    XMLWordPrintable

Details

    Description

      The fix of MDEV-34453 removed a read of a block descriptor state in case we are able to acquire an exclusive latch on the page without waiting:

      diff --git a/storage/innobase/buf/buf0buf.cc b/storage/innobase/buf/buf0buf.cc
      index ae604f0826c..ba49754db6a 100644
      --- a/storage/innobase/buf/buf0buf.cc
      +++ b/storage/innobase/buf/buf0buf.cc
      @@ -3322,6 +3322,8 @@ static buf_block_t *buf_page_create_low(page_id_t page_id, ulint zip_size,
               ut_ad(!bpage->is_io_fixed(state));
               ut_ad(bpage->buf_fix_count(state));
             }
      +      else
      +        state= bpage->state();
       
             ut_ad(state >= buf_page_t::FREED);
             ut_ad(state < buf_page_t::READ_FIX);
      

      In an rr replay trace, we have buf_page_t::read_complete() completing between the initial read of bpage->state() (before the above code snippet) and the time the last quoted assertion on failing because state == buf_page_t::READ_FIX + 1.

      As far as I can tell, this bug may cause a totally incorrect state of the block to be set in a subsequent call:

              bpage->set_reinit(state & buf_page_t::LRU_MASK);
      

      Furthermore, there seems to be some duplicated or redundant code that affects ROW_FORMAT=COMPRESSED pages that exist in the buffer pool only in compressed form:

      @@ -3345,24 +3347,12 @@ static buf_block_t *buf_page_create_low(page_id_t page_id, ulint zip_size,
             }
             else
             {
      -        auto state= bpage->state();
      -        ut_ad(state >= buf_page_t::FREED);
      -        ut_ad(state < buf_page_t::READ_FIX);
      -
               page_hash_latch &hash_lock= buf_pool.page_hash.lock_get(chain);
               /* It does not make sense to use transactional_lock_guard here,
               because buf_relocate() would likely make the memory transaction
               too large. */
               hash_lock.lock();
       
      -        if (state < buf_page_t::UNFIXED)
      -          bpage->set_reinit(buf_page_t::FREED);
      -        else
      -        {
      -          bpage->set_reinit(state & buf_page_t::LRU_MASK);
      -          ibuf_exist= (state & buf_page_t::LRU_MASK) == buf_page_t::IBUF_EXIST;
      -        }
      -
               mysql_mutex_lock(&buf_pool.flush_list_mutex);
               buf_relocate(bpage, &free_block->page);
               free_block->page.lock.x_lock();
      

      As far as I can tell, this code should not have had any ill effect. The else branch should always be taken because a state of FREED would already have been transformed to UNFIXED.

      Attachments

        Issue Links

          Activity

            People

              debarun Debarun Banerjee
              marko Marko Mäkelä
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.