  1. MariaDB Server
  2. MDEV-39780

Server crash in INFORMATION_SCHEMA.INNODB_BUFFER_PAGE after innodb_buffer_pool_size change

Details

    • Can result in hang or crash

    Description

      mleich produced an rr replay trace in which MariaDB Server 10.6 crashes after executing the following statements:

      SET GLOBAL innodb_buffer_pool_size=268435456;
      SET GLOBAL innodb_buffer_pool_size=134217728;
      SELECT * FROM INFORMATION_SCHEMA.INNODB_BUFFER_PAGE;
      

      10.6-MDEV-39344 cb6b5e26d1837c007b87bd4ad676b7c1720ffe8e

      2026-05-27 12:55:11 17 [Note] InnoDB: Requested to resize buffer pool. (new size: 134217728 bytes)
      2026-05-27 12:55:11 0 [Note] InnoDB: Resizing buffer pool from 268435456 to 134217728 (unit=134217728).
      2026-05-27 12:55:11 0 [Note] InnoDB: Disabling adaptive hash index.
      2026-05-27 12:55:11 0 [Note] InnoDB: Withdrawing blocks to be shrunken.
      2026-05-27 12:55:11 0 [Note] InnoDB: start to withdraw the last 15977 blocks
      2026-05-27 12:55:11 0 [Note] InnoDB: withdrawing blocks. (15977/15977)
      2026-05-27 12:55:11 0 [Note] InnoDB: withdrew 15977 blocks from free list. Tried to relocate 0 pages (15977/15977)
      2026-05-27 12:55:11 0 [Note] InnoDB: withdrawn target: 15977 blocks
      2026-05-27 12:55:11 0 [Note] InnoDB: Latching whole of buffer pool.
      2026-05-27 12:55:11 0 [Note] InnoDB: buffer pool resizing with chunks 2 to 1.
      2026-05-27 12:55:11 0 [Note] InnoDB: 1 chunks (15977 blocks) were freed.
      2026-05-27 12:55:11 0 [Note] InnoDB: Completed to resize buffer pool from 268435456 to 134217728.
      2026-05-27 12:55:11 0 [Note] InnoDB: Completed resizing buffer pool at 260527 12:55:11.
      [New Thread 1857251.1870065]
      [New Thread 1857251.1857468]
      [New Thread 1857251.1917025]
      [New Thread 1857251.1946944]
       
      Thread 10 received signal SIGSEGV, Segmentation fault.
      [Switching to Thread 1857251.1870065]
      i_s_innodb_buffer_page_get_info (bpage=bpage@entry=0x31e84fa9ad00, 
          pos=pos@entry=10000, page_info=page_info@entry=0x6d1108077010)
          at /data/Server/10.6-MDEV-39344/storage/innobase/handler/i_s.cc:4070
      4070		page_info->state = bpage->state();
      (rr) awatch *(size_t*)0x31e84fa9ad00
      Hardware access (read/write) watchpoint 1: *(size_t*)0x31e84fa9ad00
      (rr) rc
      Continuing.
       
      Thread 50 hit Hardware access (read/write) watchpoint 1: *(size_t*)0x31e84fa9ad00
       
      Value = 18446744073709551615
      0x0000000070000000 in syscall_traced ()
      (rr) bt
      #0  0x0000000070000000 in syscall_traced ()
      #1  0x00006298fbba1018 in _raw_syscall () at /home/ubuntu/rr/src/preload/raw_syscall.S:120
      #2  0x00006298fbb9a909 in traced_raw_syscall (call=0x7f64a4dfffa0) at /home/ubuntu/rr/src/preload/syscallbuf.c:379
      #3  0x00006298fbb9e0bc in sys_futex (call=<optimized out>) at /home/ubuntu/rr/src/preload/syscallbuf.c:2085
      #4  syscall_hook_internal (call=call@entry=0x7f64a4dfffa0) at /home/ubuntu/rr/src/preload/syscallbuf.c:4203
      #5  0x00006298fbba0da4 in syscall_hook (call=0x7f64a4dfffa0) at /home/ubuntu/rr/src/preload/syscallbuf.c:4387
      #6  syscall_hook (call=0x7f64a4dfffa0) at /home/ubuntu/rr/src/preload/syscallbuf.c:4371
      #7  0x00006298fbb9a323 in _syscall_hook_trampoline () at /home/ubuntu/rr/src/preload/syscall_hook.S:308
      #8  0x00006298fbb9a38d in __morestack () at /home/ubuntu/rr/src/preload/syscall_hook.S:443
      #9  0x00006298fbb9a394 in _syscall_hook_trampoline_48_3d_01_f0_ff_ff () at /home/ubuntu/rr/src/preload/syscall_hook.S:457
      #10 0x0000048130297d81 in __GI_munmap () at ../sysdeps/unix/syscall-template.S:117
      #11 0x00006298fcbf8cdb in my_large_free (ptr=0x31e84f89f000, size=134217728) at /data/Server/10.6-MDEV-39344/mysys/my_largepage.c:430
      #12 0x00006298fc9f60a8 in ut_allocator<unsigned char, true>::deallocate_large (this=this@entry=0x6298fd743c60 <buf_pool+16672>, ptr=<optimized out>, pfx=pfx@entry=0x62990f6dca80)
          at /data/Server/10.6-MDEV-39344/storage/innobase/include/ut0new.h:675
      #13 0x00006298fcacf879 in ut_allocator<unsigned char, true>::deallocate_large_dodump (pfx=0x62990f6dca80, ptr=<optimized out>, this=0x6298fd743c60 <buf_pool+16672>)
          at /data/Server/10.6-MDEV-39344/storage/innobase/include/ut0new.h:679
      #14 buf_pool_t::resize (this=this@entry=0x6298fd73fb40 <buf_pool>) at /data/Server/10.6-MDEV-39344/storage/innobase/buf/buf0buf.cc:1747
      #15 0x00006298fcac8940 in buf_resize_callback () at /data/Server/10.6-MDEV-39344/storage/innobase/buf/buf0buf.cc:1935
      #16 0x00006298fcb873ed in tpool::task_group::execute (this=0x6298fe04e6c0 <single_threaded_group>, t=t@entry=0x6298fe04e620 <buf_resize_task>) at /data/Server/10.6-MDEV-39344/tpool/task_group.cc:55
      #17 0x00006298fcb87489 in tpool::task::execute (this=0x6298fe04e620 <buf_resize_task>) at /data/Server/10.6-MDEV-39344/tpool/task.cc:32
      #18 0x00006298fcb84bc9 in tpool::thread_pool_generic::worker_main (this=0x62990f68b120, thread_var=0x62990f68b7b0) at /data/Server/10.6-MDEV-39344/tpool/tpool_generic.cc:573
      

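      We can see that the view information_schema.innodb_buffer_page accessed a block descriptor that had been freed as part of shrinking the buffer pool: the last access to the descriptor's address before the crash is the munmap() called from buf_pool_t::resize().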

      When MDEV-29445 reimplemented buffer pool resizing and the way the buffer pool is allocated, it created a static mapping between page frame addresses and block descriptors. The virtual addresses are preallocated at server startup. Buffer pool resizing only changes how many addresses starting from buf_pool.memory are addressable.
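
      To illustrate the idea of a static mapping, here is a minimal toy model. It is not the MDEV-29445 implementation: FixedPool, Descriptor, FRAME_SIZE and all member names are hypothetical, and plain heap arrays stand in for the reserved virtual address range. The point is only that resizing changes a usable count and never moves or frees a descriptor.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Toy model only: all names and the layout here are hypothetical, not InnoDB's.
struct Descriptor { int state = 0; };

class FixedPool {
public:
  static constexpr std::size_t FRAME_SIZE = 64;  // tiny stand-in for a page frame

  FixedPool(std::size_t capacity, std::size_t usable)
    : capacity_(capacity), usable_(usable),
      desc_(std::make_unique<Descriptor[]>(capacity)),
      frames_(std::make_unique<unsigned char[]>(capacity * FRAME_SIZE))
  { assert(usable <= capacity); }

  // Resizing never moves anything; it only changes how many slots may be used.
  bool resize(std::size_t usable)
  {
    if (usable > capacity_) return false;
    usable_ = usable;
    return true;
  }

  std::size_t usable() const { return usable_; }

  // Slot i always maps to the same descriptor and frame addresses.
  Descriptor *descriptor(std::size_t i)
  { return i < usable_ ? &desc_[i] : nullptr; }
  unsigned char *frame(std::size_t i)
  { return i < usable_ ? &frames_[i * FRAME_SIZE] : nullptr; }

  // Frame address -> descriptor by arithmetic alone; f must come from frame().
  Descriptor *descriptor_of(const unsigned char *f)
  { return descriptor(static_cast<std::size_t>(f - frames_.get()) / FRAME_SIZE); }

private:
  std::size_t capacity_;  // reserved once at construction
  std::size_t usable_;    // the only thing resize() changes
  std::unique_ptr<Descriptor[]> desc_;
  std::unique_ptr<unsigned char[]> frames_;
};
```

      In this model, shrinking and then growing the pool hands back the same descriptor address for a given slot, so a reader that rechecks the usable count under a lock cannot end up with a pointer into unmapped memory.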

      Let us look at the crash to see whether it would still be possible with the MDEV-29445 fix present:

      10.6-MDEV-39344 cb6b5e26d1837c007b87bd4ad676b7c1720ffe8e

      (rr) bt
      #0  i_s_innodb_buffer_page_get_info (bpage=bpage@entry=0x31e84fa9ad00, pos=pos@entry=10000, page_info=page_info@entry=0x6d1108077010) at /data/Server/10.6-MDEV-39344/storage/innobase/handler/i_s.cc:4070
      #1  0x00006298fc915497 in i_s_innodb_buffer_page_fill (thd=0x6d1108000d58, tables=0x6d1108013d00) at /data/Server/10.6-MDEV-39344/storage/innobase/handler/i_s.cc:4179
      #2  0x00006298fc47af3c in get_schema_tables_result (join=join@entry=0x6d1108015728, executed_place=executed_place@entry=PROCESSED_BY_JOIN_EXEC) at /data/Server/10.6-MDEV-39344/sql/sql_show.cc:9056
      

      The culprit is clear:

      i_s_innodb_buffer_page_fill()

      	for (ulint n = 0;
      	     n < ut_min(buf_pool.n_chunks, buf_pool.n_chunks_new); n++) {
      // skip some code
      			/* Obtain appropriate mutexes. Since this is diagnostic
      			buffer pool info printout, we are not required to
      			preserve the overall consistency, so we can
      			release mutex periodically */
      			mysql_mutex_lock(&buf_pool.mutex);
       
      			/* GO through each block in the chunk */
      			for (n_blocks = num_to_process; n_blocks--; block++) {
      				i_s_innodb_buffer_page_get_info(
      					&block->page, block_id,
      					info_buffer + num_page);
      

      This function reads many structures that ought to be protected by buf_pool.mutex, yet, as the comment says, it releases the mutex periodically. In fact, at the time the memory buffer was freed, this code was writing output to an ENGINE=Aria internal temporary table:

      10.6-MDEV-39344 cb6b5e26d1837c007b87bd4ad676b7c1720ffe8e

      #11 0x00006298fc471821 in schema_table_store_record (thd=thd@entry=0x6d1108000d58, table=table@entry=0x6d110806ef80) at /data/Server/10.6-MDEV-39344/sql/sql_show.cc:3942
      #12 0x00006298fc90fc22 in i_s_innodb_buffer_page_fill (thd=thd@entry=0x6d1108000d58, tables=tables@entry=0x6d1108013d00, info_array=info_array@entry=0x6d1108077010, num_page=num_page@entry=10000) at /data/Server/10.6-MDEV-39344/storage/innobase/handler/i_s.cc:3987
      #13 0x00006298fc9154e6 in i_s_innodb_buffer_page_fill (thd=0x6d1108000d58, tables=0x6d1108013d00) at /data/Server/10.6-MDEV-39344/storage/innobase/handler/i_s.cc:4190
      #14 0x00006298fc47af3c in get_schema_tables_result (join=join@entry=0x6d1108015728, executed_place=executed_place@entry=PROCESSED_BY_JOIN_EXEC) at /data/Server/10.6-MDEV-39344/sql/sql_show.cc:9056
      

      In MDEV-29445 the loop was rewritten to be much shorter, and with proper mutex protection:

        for (size_t j= 0;;)
        {
          memset((void*) b, 0, MAX_BUF_INFO_CACHED * sizeof *b);
          mysql_mutex_lock(&buf_pool.mutex);
          const size_t N= buf_pool.curr_size();
          const size_t n= std::min<size_t>(N, MAX_BUF_INFO_CACHED);
          for (size_t i= 0; i < n && j < N; i++, j++)
            i_s_innodb_buffer_page_get_info(&buf_pool.get_nth_page(j)->page, j,
                                            &b[i]);
       
          mysql_mutex_unlock(&buf_pool.mutex);
          status= i_s_innodb_buffer_page_fill(thd, tables, b, n);
          if (status || j >= N)
            break;
        }
      

      In this loop, we are first safely copying the information to an intermediate array of buf_page_info_t, at most MAX_BUF_INFO_CACHED entries. Then, after releasing buf_pool.mutex, we copy the intermediate array to the output buffer. If the size of the buffer pool is changed before the next iteration, we will correctly keep iterating until the current buffer pool size is reached.
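
      The pattern described above (snapshot a bounded batch under the mutex, emit it after unlocking, and re-read the size on every iteration) can be sketched in isolation. This is a simplified illustration, not the server code: ToyPool, PageInfo, MAX_CACHED, scan() and the between_batches callback are hypothetical names, and std::mutex stands in for buf_pool.mutex.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical stand-ins; none of these names come from the MariaDB source.
struct PageInfo { std::size_t pos; int state; };

struct ToyPool {
  std::mutex mutex;
  std::vector<int> states;  // one entry per block descriptor
  std::size_t curr_size() const { return states.size(); }  // caller holds mutex
};

constexpr std::size_t MAX_CACHED = 4;  // batch size, kept tiny for the demo

// Copies descriptors in batches under the mutex and emits each batch after
// unlocking. between_batches() runs unlocked, where a resize could happen.
template <typename BetweenBatches>
std::vector<PageInfo> scan(ToyPool &pool, BetweenBatches between_batches)
{
  std::vector<PageInfo> out;
  PageInfo batch[MAX_CACHED];
  for (std::size_t j = 0;;) {
    std::size_t n = 0, N;
    {
      std::lock_guard<std::mutex> lock(pool.mutex);
      N = pool.curr_size();  // re-read on every iteration
      for (; n < MAX_CACHED && j < N; n++, j++)
        batch[n] = PageInfo{j, pool.states[j]};
    }
    assert(n <= MAX_CACHED);
    // The slow output step touches only the private copy, never the pool.
    out.insert(out.end(), batch, batch + n);
    if (j >= N)
      break;
    between_batches();
  }
  return out;
}
```

      If the pool shrinks between two batches, the next iteration sees the smaller size and stops there instead of following a pointer into memory that has been released.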

            People

              Unassigned Unassigned
              marko Marko Mäkelä