MariaDB Server / MDEV-36863

InnoDB: Failing assertion: !block->n_hash_helps after failing to shrink innodb_buffer_pool_size

Details

    • Can result in hang or crash
    • After SET GLOBAL innodb_buffer_pool_size was aborted while attempting to shrink the buffer pool, executing SET GLOBAL innodb_adaptive_hash_index=ON could lead to corruption of the adaptive hash index.

    Description

      The assertion fails because the adaptive hash index fields were not cleared when a block was freed while shrinking the buffer pool:

      MDEV-36301 1700b6f5b4462e742dbaaf59266865c863ef1d92

      #0  0x000056b458d2289d in ut_list_remove<ut_list_base<buf_page_t, ut_list_node<buf_page_t> buf_page_t::*>, GenericGetNode<buf_page_t> > (
          list=@0x56b459af4418: {count = 0x80f, start = 0x5fb00d01c500, end = 0x5fb00b8009c0, node = &buf_page_t::LRU, init = 0xcafe}, node=@0x5fb00c802178: {prev = 0x5fb00c028bc0, next = 0x5fb00b8102c0}, 
          get_node={m_node = &buf_page_t::LRU}) at /data/Server/MDEV-36301A/storage/innobase/include/ut0lst.h:354
      #1  0x000056b458d22912 in ut_list_remove<ut_list_base<buf_page_t, ut_list_node<buf_page_t> buf_page_t::*> > (
          list=@0x56b459af4418: {count = 0x80f, start = 0x5fb00d01c500, end = 0x5fb00b8009c0, node = &buf_page_t::LRU, init = 0xcafe}, elem=elem@entry=0x5fb00c802100)
          at /data/Server/MDEV-36301A/storage/innobase/include/ut0lst.h:385
      #2  0x000056b458e6c41b in buf_pool_t::LRU_remove (this=this@entry=0x56b459aefe40 <buf_pool>, bpage=bpage@entry=0x5fb00c802100) at /data/Server/MDEV-36301A/storage/innobase/include/buf0buf.h:1355
      #3  0x000056b458e66cc6 in buf_pool_t::shrink (this=this@entry=0x56b459aefe40 <buf_pool>, size=size@entry=0x800000) at /data/Server/MDEV-36301A/storage/innobase/buf/buf0buf.cc:1759
      #4  0x000056b458e67ac1 in buf_pool_t::resize (this=0x56b459aefe40 <buf_pool>, size=0x800000, thd=0x159554000d58) at /data/Server/MDEV-36301A/storage/innobase/buf/buf0buf.cc:2075
      #5  0x000056b458c5b394 in innodb_buffer_pool_size_update (thd=<optimized out>, save=<optimized out>) at /data/Server/MDEV-36301A/storage/innobase/handler/ha_innodb.cc:3708
      #6  0x000056b45879b556 in sys_var_pluginvar::global_update (this=0x56b45cd19e70, thd=0x159554000d58, var=0x159554015ce8) at /data/Server/MDEV-36301A/sql/sql_plugin.cc:3691
      #7  0x000056b4586b7f0d in sys_var::update (this=0x56b45cd19e70, thd=0x159554000d58, var=0x159554015ce8) at /data/Server/MDEV-36301A/sql/set_var.cc:211
      #8  0x000056b4586b8405 in set_var::update (this=<optimized out>, thd=<optimized out>) at /data/Server/MDEV-36301A/sql/set_var.cc:871
      #9  0x000056b4586b9350 in sql_set_variables (thd=thd@entry=0x159554000d58, var_list=var_list@entry=0x159554006130, free=free@entry=0x1) at /data/Server/MDEV-36301A/sql/set_var.cc:752
      #10 0x000056b45877ef78 in mysql_execute_command (thd=thd@entry=0x159554000d58, is_called_from_prepared_stmt=is_called_from_prepared_stmt@entry=0x0) at /data/Server/MDEV-36301A/sql/sql_parse.cc:4859
      #11 0x000056b45878269c in mysql_parse (thd=thd@entry=0x159554000d58, rawbuf=<optimized out>, length=<optimized out>, parser_state=parser_state@entry=0x74b56e5c43a0)
          at /data/Server/MDEV-36301A/sql/sql_parse.cc:7891
      #12 0x000056b458783c7e in dispatch_command (command=command@entry=COM_QUERY, thd=thd@entry=0x159554000d58, 
          packet=packet@entry=0x15955400b3d9 " SET GLOBAL innodb_buffer_pool_size = 8388608  /* E_R Thread1 QNO 1997 CON_ID 17 */ ", packet_length=packet_length@entry=0x54, blocking=blocking@entry=0x1)
          at /data/Server/MDEV-36301A/sql/sql_parse.cc:1877
      

      My initial thought was that we should adjust the code around the following:

      diff --git a/storage/innobase/buf/buf0buf.cc b/storage/innobase/buf/buf0buf.cc
      index fd0617ca4ee..0fab4fea1fa 100644
      --- a/storage/innobase/buf/buf0buf.cc
      +++ b/storage/innobase/buf/buf0buf.cc
      @@ -1797,6 +1797,7 @@ ATTRIBUTE_COLD buf_pool_t::shrink_status buf_pool_t::shrink(size_t size)
           ut_d(b->in_LRU_list= false);
       
           b->set_state(buf_page_t::NOT_USED);
      +    // TODO: clear more of b, similar to buf_LRU_free_page()
           UT_LIST_ADD_LAST(withdrawn, b);
           if (!--n_blocks_to_withdraw)
             goto withdraw_done;
      

      I see that we are not clearing buf_page_t::id_ either. That is not a correctness problem, because the block will have been detached from buf_pool.page_hash, buf_pool.LRU and every other buf_pool list except buf_pool.withdrawn.

      In buf_pool_t::resize(), we invoke btr_sea::disable(), but we do not touch the adaptive hash index fields in the individual blocks. Those fields only matter while the adaptive hash index is in use, so stale values stay unnoticed until it is enabled again.
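
      To make the point above concrete, here is a minimal, self-contained sketch of the failure mode. It is a toy model, not InnoDB code: ToyBlock, ToyAhi, clear_ahi_fields(), restore_withdrawn() and count_stale() are hypothetical names invented for illustration. Only the three field names (n_hash_helps, n_pointers, index) are taken from this report. The sketch assumes that disabling the adaptive hash index merely flips a global switch, so any per-block state survives unless the code that returns withdrawn blocks to the free list clears it explicitly.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the AHI-related fields of a buffer block.
struct ToyBlock {
  unsigned n_hash_helps = 0;    // counter that steers building a hash index
  unsigned n_pointers = 0;      // debug-only in the patch; always present here
  const void *index = nullptr;  // index whose hash entries refer to this block
};

// Hypothetical stand-in for the global switch; disable() does not visit blocks.
struct ToyAhi {
  bool enabled = true;
  void disable() { enabled = false; }
};

// Clear the per-block fields, mirroring the three assignments in the patch.
inline void clear_ahi_fields(ToyBlock &b) {
  b.n_hash_helps = 0;
  b.n_pointers = 0;
  b.index = nullptr;
}

// Return all withdrawn blocks to the free list after an aborted shrink.
// clear=false models the behaviour described in this report.
inline void restore_withdrawn(std::vector<ToyBlock> &withdrawn,
                              std::vector<ToyBlock> &free_list, bool clear) {
  for (ToyBlock &b : withdrawn) {
    if (clear)
      clear_ahi_fields(b);
    free_list.push_back(b);
  }
  withdrawn.clear();
}

// Count blocks that would trip a check such as !block->n_hash_helps.
inline std::size_t count_stale(const std::vector<ToyBlock> &blocks) {
  std::size_t n = 0;
  for (const ToyBlock &b : blocks)
    if (b.n_hash_helps || b.n_pointers || b.index)
      ++n;
  return n;
}
```

      In this model, calling restore_withdrawn() with clear=false leaves every dirty block stale on the free list, whereas clear=true leaves none. That is the difference the proposed patch is meant to make.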

      As far as I can tell, due to this bug, if the adaptive hash index was enabled before shrinking the buffer pool, and re-enabled afterwards, we would end up with a corrupted adaptive hash index. I believe that the problem is some missing error handling:

      diff --git a/storage/innobase/buf/buf0buf.cc b/storage/innobase/buf/buf0buf.cc
      index fd0617ca4ee..9de1cbd01fe 100644
      --- a/storage/innobase/buf/buf0buf.cc
      +++ b/storage/innobase/buf/buf0buf.cc
      @@ -2092,11 +2092,27 @@ ATTRIBUTE_COLD void buf_pool_t::resize(size_t size, THD *thd) noexcept
             ut_d(b->in_free_list= true);
             ut_ad(b->state() == buf_page_t::NOT_USED);
             b->lock.init();
      +#ifdef BTR_CUR_HASH_ADAPT
      +      /* Clear the AHI fields. These were not cleared when we relocated
      +      the block to withdrawn. Had we successfully shrunk the buffer pool,
      +      all this virtual memory would have been zeroed or made unaccessible,
      +      and on a subsequent buffer pool extension it would be zero again. */
      +      buf_block_t *block= reinterpret_cast<buf_block_t*>(b);
      +      block->n_hash_helps= 0;
      +# if defined UNIV_AHI_DEBUG || defined UNIV_DEBUG
      +      block->n_pointers= 0;
      +# endif
      +      block->index= nullptr;
      +#endif
           }
       
           mysql_mutex_unlock(&mutex);
           my_printf_error(ER_WRONG_USAGE, "innodb_buffer_pool_size change aborted",
                           MYF(ME_ERROR_LOG));
      +#ifdef BTR_CUR_HASH_ADAPT
      +    if (ahi_disabled)
      +      btr_search.enable(true);
      +#endif
           mysql_mutex_lock(&LOCK_global_system_variables);
         }
       
      

      Yes, we also failed to re-enable the adaptive hash index if we failed to shrink the buffer pool.
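
      The second half of the fix, restoring the adaptive hash index setting when the shrink is aborted, can be sketched in the same toy style. Everything here is hypothetical (ToyAhiSwitch, ToyPool, ShrinkResult, toy_resize()), and the abort condition is invented; the real conditions under which buf_pool_t::shrink() gives up are not modelled. The only thing borrowed from the patch is the idea of remembering in a flag (ahi_disabled) that resize itself turned the feature off, and acting on that flag on the error path as well.

```cpp
#include <cstddef>

// Hypothetical global switch for the adaptive hash index.
struct ToyAhiSwitch {
  bool enabled = true;
  void disable() { enabled = false; }
  void enable() { enabled = true; }
};

enum class ShrinkResult { Done, Aborted };

// Hypothetical pool: shrinking aborts if the pinned part does not fit.
struct ToyPool {
  std::size_t size;
  std::size_t in_use;  // portion that cannot be withdrawn (assumption)

  ShrinkResult shrink(std::size_t target) {
    if (in_use > target)
      return ShrinkResult::Aborted;  // size is left unchanged on abort
    size = target;
    return ShrinkResult::Done;
  }
};

// Resize with symmetric handling of the AHI switch: if this call disabled
// it, this call re-enables it, whether or not the shrink succeeded.
inline bool toy_resize(ToyPool &pool, ToyAhiSwitch &ahi, std::size_t target) {
  const bool ahi_disabled = ahi.enabled;  // true if we are the one disabling
  if (ahi_disabled)
    ahi.disable();

  const bool ok = pool.shrink(target) == ShrinkResult::Done;

  // The reported bug corresponds to skipping this step when !ok.
  if (ahi_disabled)
    ahi.enable();
  return ok;
}
```

      If the switch was already off before the call, toy_resize() leaves it off, so a user's explicit innodb_adaptive_hash_index=OFF would not be overridden in this model.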

            People

              marko Marko Mäkelä
              saahil Saahil Alam
              Marko Mäkelä Marko Mäkelä
              Debarun Banerjee Debarun Banerjee (Inactive)
              Saahil Alam Saahil Alam