  1. MariaDB Server
  2. MDEV-32588

InnoDB may hang when running out of buffer pool

Details

    Description

      While running performance tests with a small buffer pool, I encountered an anomaly where InnoDB would hang after running out of buffer pool. There would be some blocks at buf_pool.flush_list.start that were actually clean, but buf_flush_LRU_list_batch() would skip them. The buf_pool.flush_list.end is being trimmed by periodic calls to buf_pool.get_oldest_modification(). I suspect that the entire buf_pool.LRU would be skipped because every block in it was buffer-fixed, latched, or still registered in buf_pool.flush_list due to the MDEV-26010 optimization.

      This seems to be a 10.6 regression due to MDEV-26827.

      MDEV-32050 would make this problem worse by allowing the purge_coordinator_task to buffer-fix a large number of pages.

          Activity

            I think that the following patch fixes this.

            diff a/storage/innobase/buf/buf0flu.cc b/storage/innobase/buf/buf0flu.cc
            --- a/storage/innobase/buf/buf0flu.cc
            +++ b/storage/innobase/buf/buf0flu.cc
            @@ -1246,16 +1246,14 @@ static void buf_flush_LRU_list_batch(ulint max, bool evict,
                 ut_ad(state >= buf_page_t::FREED);
                 ut_ad(bpage->in_LRU_list);
             
            -    switch (bpage->oldest_modification()) {
            -    case 0:
            +    if (!bpage->oldest_modification())
            +    {
                 evict:
                   if (state != buf_page_t::FREED &&
                       (state >= buf_page_t::READ_FIX || (~buf_page_t::LRU_MASK & state)))
                     continue;
                   buf_LRU_free_page(bpage, true);
                   ++n->evicted;
            -      /* fall through */
            -    case 1:
                   if (UNIV_LIKELY(scanned & 31))
                     continue;
                   mysql_mutex_unlock(&buf_pool.mutex);
            @@ -1271,7 +1269,11 @@ static void buf_flush_LRU_list_batch(ulint max, bool evict,
                   switch (bpage->oldest_modification()) {
                   case 1:
                     mysql_mutex_lock(&buf_pool.flush_list_mutex);
            -        buf_pool.delete_from_flush_list(bpage);
            +        if (ut_d(lsn_t lsn=) bpage->oldest_modification())
            +        {
            +          ut_ad(lsn == 1); /* It must be clean while we hold bpage->lock */
            +          buf_pool.delete_from_flush_list(bpage);
            +        }
                     mysql_mutex_unlock(&buf_pool.flush_list_mutex);
                     /* fall through */
                   case 0:
            

            Before MDEV-26827, we were acting upon oldest_modification==1 while already holding buf_pool.flush_list_mutex. Both regressions were introduced by me in MDEV-26827.
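As I read the first hunk, the old switch let only oldest_modification()==0 reach the eviction path, while blocks with the value 1 went straight to the periodic yield check and were otherwise passed over. The toy model below sketches that difference. It is not InnoDB code: ToyBlock and both functions are hypothetical names, and the encoding of oldest_modification (0 = not in the flush list, 1 = clean but still registered there, 2 or more = dirty) is an assumption stated in the comments.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model only; none of these types exist in InnoDB.
// Assumed encoding of oldest_modification:
//   0    = not registered in the flush list
//   1    = clean, but still registered in the flush list
//   >= 2 = dirty (LSN of the oldest unflushed change)
struct ToyBlock {
  std::uint64_t oldest_modification;
  bool buffer_fixed;
};

// Pre-patch shape: only the value 0 reaches the eviction path,
// so clean blocks that still carry the value 1 are passed over.
inline std::size_t evictable_before_patch(const std::vector<ToyBlock>& lru) {
  std::size_t n = 0;
  for (const ToyBlock& b : lru)
    if (b.oldest_modification == 0 && !b.buffer_fixed)
      ++n;
  return n;
}

// Post-patch shape: the value 1 is no longer skipped; such a block
// can be unregistered from the flush list and then evicted.
inline std::size_t evictable_after_patch(const std::vector<ToyBlock>& lru) {
  std::size_t n = 0;
  for (const ToyBlock& b : lru)
    if (b.oldest_modification <= 1 && !b.buffer_fixed)
      ++n;
  return n;
}
```

With an LRU list that holds only clean-but-registered, dirty, or buffer-fixed blocks, the pre-patch count is zero, which matches the hang described in the report. Flushing of dirty blocks is deliberately left out of the model.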

            marko Marko Mäkelä added a comment

            In my tests on a non-debug build, the race condition that is fixed by the second hunk of the patch caused a shutdown hang as well as the following assertion failure:

            InnoDB: Failing assertion: list.count > 0
            

            in buf_pool_t::insert_into_flush_list() during a mtr_t::commit().
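The second hunk re-reads oldest_modification() after acquiring buf_pool.flush_list_mutex and removes the block only if it is still registered. A minimal sketch of that check-again-under-the-mutex pattern follows, using standard C++ primitives. ToyPage, ToyFlushList, and remove_if_still_listed are hypothetical names for illustration; the sketch does not reproduce the exact InnoDB failure, only the idea that an unguarded second removal would corrupt the list bookkeeping.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>

// Toy stand-ins; these are not InnoDB types.
struct ToyPage {
  // 0 = not in the flush list, nonzero = registered (assumed encoding)
  std::atomic<std::uint64_t> oldest_modification{0};
};

struct ToyFlushList {
  std::mutex mutex;
  std::size_t count = 0;

  void insert(ToyPage& page, std::uint64_t lsn) {
    std::lock_guard<std::mutex> guard(mutex);
    page.oldest_modification.store(lsn);
    ++count;
  }

  // The caller must hold mutex.
  void remove_locked(ToyPage& page) {
    assert(count > 0);  // removing an unlisted page would underflow count
    page.oldest_modification.store(0);
    --count;
  }
};

// Decide under the mutex, not before it: another thread may have
// removed the page between an earlier unlocked read and this point.
inline bool remove_if_still_listed(ToyFlushList& list, ToyPage& page) {
  std::lock_guard<std::mutex> guard(list.mutex);
  if (page.oldest_modification.load() == 0)
    return false;  // already removed by someone else
  list.remove_locked(page);
  return true;
}
```

Calling remove_if_still_listed() twice on the same page removes it once and then reports false, so the count stays consistent even if two code paths race to unregister the page.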

            marko Marko Mäkelä added a comment

            People

              marko Marko Mäkelä