  1. MariaDB Server
  2. MDEV-23973

Change buffer corruption when reallocating a recently freed page

      Description

      After my follow-up fix to MDEV-19514 to prevent a potential hang, we got an assertion failure:

      10.5 a0113683d7a848be9d0403393c7b5c478cd813a6

      storage/innobase/ibuf/ibuf0ibuf.cc:4200: void ibuf_merge_or_delete_for_page(buf_block_t*, page_id_t, ulint, bool): Assertion `!block || block->page.status == buf_page_t::NORMAL' failed.
      

      In the rr replay trace that I analyzed, the page is being marked free and reallocated again, with the same buf_block_t pointer value and the same page number.

      Here are some traces from a slightly different branch:

      Thread 2 hit Hardware watchpoint 1: *(unsigned*)$2
       
      Old value = 0
      New value = 2
      buf_page_free (page_id={m_id = 93953492658176}, mtr=mtr@entry=0x67ef6ff8df40, 
          file=file@entry=0x557340885d80 "/home/mleich/Server/bb-10.5-release/storage/innobase/btr/btr0btr.cc", line=line@entry=730)
          at /home/mleich/Server/bb-10.5-release/storage/innobase/buf/buf0buf.cc:2629
      2629	  buf_block_dbg_add_level(block, SYNC_NO_ORDER_CHECK);
      1: /x $b->id_ = {m_id = 0x60000001b}
      (rr) 
      Continuing.
      [Switching to Thread 1978612.1980948]
       
      Thread 3 hit Hardware watchpoint 1: *(unsigned*)$2
       
      Old value = 2
      New value = 1
      mtr_t::init (this=0x768212a223f0, b=0x56f24516d7c8)
          at /home/mleich/Server/bb-10.5-release/storage/innobase/include/mtr0log.h:528
      528	  if (m_log_mode != MTR_LOG_ALL)
      1: /x $b->id_ = {m_id = 0x60000001b}
      (rr) bt
      #0  mtr_t::init (this=0x768212a223f0, b=0x56f24516d7c8)
          at /home/mleich/Server/bb-10.5-release/storage/innobase/include/mtr0log.h:528
      #1  0x000055733fb89e47 in fsp_init_file_page (mtr=0x768212a223f0, block=0x56f24516d7c8, 
          space=0x615000030518) at /home/mleich/Server/bb-10.5-release/storage/innobase/include/fsp0fsp.h:575
      #2  fsp_page_create (space=space@entry=0x615000030518, offset=<optimized out>, 
          mtr=mtr@entry=0x768212a223f0)
          at /home/mleich/Server/bb-10.5-release/storage/innobase/fsp/fsp0fsp.cc:1052
      #3  0x000055733fb8a751 in fsp_alloc_free_page (space=space@entry=0x615000030518, hint=<optimized out>, 
          hint@entry=17, mtr=mtr@entry=0x768212a223f0, init_mtr=init_mtr@entry=0x768212a223f0)
          at /home/mleich/Server/bb-10.5-release/storage/innobase/fsp/fsp0fsp.cc:1162
      #4  0x000055733fb8d68d in fseg_alloc_free_page_low (space=space@entry=0x615000030518, 
          seg_inode=seg_inode@entry=0x56f245514872 "", iblock=<optimized out>, hint=hint@entry=17, 
          direction=direction@entry=111 'o', has_done_reservation=has_done_reservation@entry=true, 
          mtr=<optimized out>, init_mtr=<optimized out>)
          at /home/mleich/Server/bb-10.5-release/storage/innobase/fsp/fsp0fsp.cc:2083
      #5  0x000055733fb8fe9b in fseg_alloc_free_page_general (seg_header=<optimized out>, hint=hint@entry=17, 
          direction=direction@entry=111 'o', has_done_reservation=has_done_reservation@entry=true, 
          mtr=mtr@entry=0x768212a223f0, init_mtr=init_mtr@entry=0x768212a223f0)
          at /home/mleich/Server/bb-10.5-release/storage/innobase/fsp/fsp0fsp.cc:2214
      #6  0x000055733f898cbf in btr_page_alloc_low (index=index@entry=0x6170000cfe20, 
          hint_page_no=hint_page_no@entry=17, file_direction=file_direction@entry=111 'o', 
          level=level@entry=0, mtr=mtr@entry=0x768212a223f0, init_mtr=init_mtr@entry=0x768212a223f0)
          at /home/mleich/Server/bb-10.5-release/storage/innobase/btr/btr0btr.cc:536
      #7  0x000055733f899472 in btr_page_alloc (index=0x6170000cfe20, hint_page_no=hint_page_no@entry=17, 
          file_direction=file_direction@entry=111 'o', level=level@entry=0, mtr=mtr@entry=0x768212a223f0, 
          init_mtr=init_mtr@entry=0x768212a223f0)
          at /home/mleich/Server/bb-10.5-release/storage/innobase/btr/btr0btr.cc:569
      ...
      (rr) c
      Continuing.
       
      Thread 3 hit Hardware watchpoint 2: -location b.page.ibuf_exist
       
      Old value = true
      New value = false
      buf_page_get_low (page_id={m_id = 1297037059688560024}, zip_size=zip_size@entry=0, 
          rw_latch=rw_latch@entry=2, guess=<optimized out>, guess@entry=0x0, mode=mode@entry=11, 
          file=file@entry=0x5573407bdb00 "/home/mleich/Server/bb-10.5-release/storage/innobase/row/row0ins.cc", line=<optimized out>, mtr=<optimized out>, err=<optimized out>, allow_ibuf_merge=<optimized out>)
          at /home/mleich/Server/bb-10.5-release/storage/innobase/buf/buf0buf.cc:3494
      3494				ibuf_merge_or_delete_for_page(fix_block, page_id,
      1: /x $b->id_ = {m_id = 0x60000001b}
      (rr) c
      Continuing.
      ...
      #4  0x000055733f37f504 in ibuf_merge_or_delete_for_page (block=block@entry=0x56f24516d7c8, page_id=
            {m_id = 25769803803}, zip_size=zip_size@entry=0, update_ibuf_bitmap=update_ibuf_bitmap@entry=true)
          at /home/mleich/Server/bb-10.5-release/storage/innobase/ibuf/ibuf0ibuf.cc:4200
      4200		ut_ad(!block || block->page.status == buf_page_t::NORMAL);
      

      We are hitting an assertion that was added in MDEV-15528. My MDEV-19514 follow-up fix made the failure more likely: we now set ibuf_exist=true more often, which causes more frequent calls to ibuf_merge_or_delete_for_page().
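      The watchpoint values in the traces above (0, then 2, then 1) can be read as page status transitions. The following standalone sketch is a toy model, not InnoDB code: the type toy_page and the toy_* functions are hypothetical, and the enumerator values NORMAL = 0, INIT_ON_FLUSH = 1, FREED = 2 are an assumption inferred from the watchpoint output rather than something this report states. It shows how a page that is freed and then reinitialized can reach the merge check with ibuf_exist still set and a status other than NORMAL.

```cpp
#include <cassert>

// Hypothetical stand-in for buf_page_t. The enumerator values are an
// assumption inferred from the watchpoint output (0 -> 2 -> 1).
struct toy_page {
  enum status_t : unsigned { NORMAL = 0, INIT_ON_FLUSH = 1, FREED = 2 };
  status_t status = NORMAL;
  bool ibuf_exist = false;
};

// Models the effect seen in buf_page_free(): the status becomes FREED.
inline void toy_free(toy_page &p) { p.status = toy_page::FREED; }

// Models the reallocation path that ends in mtr_t::init(): the status
// changes again, but nothing in this path resets ibuf_exist.
inline void toy_reinit(toy_page &p) { p.status = toy_page::INIT_ON_FLUSH; }

// Models the debug assertion in ibuf_merge_or_delete_for_page():
// either no block is passed, or the block must be in NORMAL status.
inline bool toy_merge_precondition(const toy_page *p) {
  return !p || p->status == toy_page::NORMAL;
}
```

      In this model, a page with ibuf_exist=true that goes through toy_free() and toy_reinit() still has the flag set, and toy_merge_precondition() returns false for it, which corresponds to the failing assertion. Passing a null pointer, as the lazy deletion path does, always satisfies the check.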

      The problem appears to be that the thread that ends up calling mtr_t::init() will neither reset the ibuf_exist flag nor invoke ibuf_merge_or_delete_for_page(nullptr, ...) to discard any previously buffered changes. Buffered changes have always been deleted lazily, as can be seen in buf_page_create():

        /* Delete possible entries for the page from the insert buffer:
        such can exist if the page belonged to an index which was dropped */
        if (!recv_recovery_is_on())
          ibuf_merge_or_delete_for_page(nullptr, page_id, zip_size, true);
      

      In fact, it looks like the early return block in buf_page_create() is avoiding the above call! If the page had been evicted from the buffer pool, we would correctly have called the code.
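      The effect of the early return can be illustrated with a small toy model. This is only a sketch under stated assumptions: toy_pool and all of its members are hypothetical names that do not exist in InnoDB, the buffer pool is reduced to a set of page ids, and the change buffer is reduced to a per-page count of stale entries. The variant create_always_discard() is one conceivable alternative shown for contrast, not the fix chosen for this issue.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <unordered_set>

// Toy model only; none of these types or functions exist in InnoDB.
struct toy_pool {
  std::unordered_set<uint64_t> cached;              // page ids in the buffer pool
  std::unordered_map<uint64_t, unsigned> buffered;  // page id -> stale buffered entries

  // Stand-in for ibuf_merge_or_delete_for_page(nullptr, ...):
  // discard any buffered entries for the page.
  void discard_buffered(uint64_t id) { buffered.erase(id); }

  // Mirrors the behaviour described above: if the page is already
  // cached, return early and skip the discard step.
  void create_with_early_return(uint64_t id) {
    if (cached.count(id))
      return;
    cached.insert(id);
    discard_buffered(id);
  }

  // Hypothetical alternative: discard whether or not the page is cached.
  void create_always_discard(uint64_t id) {
    cached.insert(id);
    discard_buffered(id);
  }

  bool has_stale_entries(uint64_t id) const { return buffered.count(id) != 0; }
};
```

      With page 27 (0x1b, the page number in the traces) both cached and carrying stale entries, create_with_early_return() leaves the entries in place, whereas the same call on an evicted page discards them. That matches the observation that eviction would have led to the correct call.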

      MDEV-23399 may have increased the failure probability further, because it reduced the probability that pages will be evicted from the buffer pool.

      Thirunarayanan Balathandayuthapani, can you please try to fix this, and check whether earlier versions could be affected too?

      I am leaning towards believing that this is a regression that was introduced by MDEV-19514, because before MDEV-19514 buf_page_get_gen() should never return a page for which unmerged buffered changes exist.

              People

              Assignee:
              marko Marko Mäkelä
              Reporter:
              marko Marko Mäkelä