  1. MariaDB Server
  2. MDEV-34855

Bootstrap hangs while shrinking the system tablespace

Details

    Description

      InnoDB bootstrap hangs while shrinking the system tablespace during slow shutdown.

      Thread-8
      =========
      #0  syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
      #1  0x00005627c929646d in srw_mutex_impl<true>::wait (this=0x7fc928008b08, lk=2147483650) at /data/Server/enterprise_10.6_patched/storage/innobase/sync/srw_lock.cc:244
      #2  0x00005627c9295a41 in srw_mutex_impl<true>::wait_and_lock (this=0x7fc928008b08) at /data/Server/enterprise_10.6_patched/storage/innobase/sync/srw_lock.cc:331
      #3  0x00005627c901fb6f in srw_mutex_impl<true>::wr_lock (this=0x7fc928008b08) at /data/Server/enterprise_10.6_patched/storage/innobase/include/srw_lock.h:134
      #4  0x00005627c914966e in ssux_lock_impl<true>::u_lock (this=0x7fc928008b08) at /data/Server/enterprise_10.6_patched/storage/innobase/include/srw_lock.h:250
      #5  0x00005627c91472d2 in sux_lock<ssux_lock_impl<true> >::u_lock (this=0x7fc928008b08) at /data/Server/enterprise_10.6_patched/storage/innobase/include/sux_lock.h:378
      #6  0x00005627c93609fe in buf_page_get_low (page_id=..., zip_size=0, rw_latch=4, guess=0x0, mode=16, mtr=0x7fc8fefdc6c0, err=0x0, allow_ibuf_merge=false, no_wait=0x0)
          at /data/Server/enterprise_10.6_patched/storage/innobase/buf/buf0buf.cc:2895
      #7  0x00005627c9360ecd in buf_page_get_gen (page_id=..., zip_size=0, rw_latch=4, guess=0x0, mode=16, mtr=0x7fc8fefdc6c0, err=0x0, allow_ibuf_merge=false, no_wait=0x0)
          at /data/Server/enterprise_10.6_patched/storage/innobase/buf/buf0buf.cc:2958
      #8  0x00005627c941b119 in xdes_get_descriptor (space=0x5627caf42310, offset=300, mtr=0x7fc8fefdc6c0, err=0x0, xdes=0x0) at /data/Server/enterprise_10.6_patched/storage/innobase/fsp/fsp0fsp.cc:407
      #9  0x00005627c9423014 in fseg_free_step (header=0x7fc9282d003c "", mtr=0x7fc8fefdc6c0, ahi=false) at /data/Server/enterprise_10.6_patched/storage/innobase/fsp/fsp0fsp.cc:2841
      #10 0x00005627c929cea7 in trx_purge_free_segment (rseg_hdr=0x7fc928008950, block=0x7fc928001ee0, mtr=...) at /data/Server/enterprise_10.6_patched/storage/innobase/trx/trx0purge.cc:316
      #11 0x00005627c92a2a37 in purge_sys_t::iterator::free_history_rseg (this=0x7fc8fefdcad0, rseg=...) at /data/Server/enterprise_10.6_patched/storage/innobase/trx/trx0purge.cc:489
      #12 0x00005627c929d18a in purge_sys_t::iterator::free_history (this=0x7fc8fefdcad0) at /data/Server/enterprise_10.6_patched/storage/innobase/trx/trx0purge.cc:551
      #13 0x00005627c928398b in purge_truncation_callback () at /data/Server/enterprise_10.6_patched/storage/innobase/srv/srv0srv.cc:1271
      #14 0x00005627c94a2c82 in tpool::task_group::execute (this=0x5627caca55a0 <purge_truncation_task_group>, t=0x5627caca5660 <purge_truncation_task>)
          at /data/Server/enterprise_10.6_patched/tpool/task_group.cc:55
       
      Thread-3
      ========
      #0  syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
      #1  0x00005627c9296421 in srw_mutex_impl<false>::wait (this=0x5627ca255f48 <trx_sys+25160>, lk=2147483650) at /data/Server/enterprise_10.6_patched/storage/innobase/sync/srw_lock.cc:244
      #2  0x00005627c9295737 in srw_mutex_impl<false>::wait_and_lock (this=0x5627ca255f48 <trx_sys+25160>) at /data/Server/enterprise_10.6_patched/storage/innobase/sync/srw_lock.cc:331
      #3  0x00005627c902079b in srw_mutex_impl<false>::wr_lock (this=0x5627ca255f48 <trx_sys+25160>) at /data/Server/enterprise_10.6_patched/storage/innobase/include/srw_lock.h:134
      #4  0x00005627c9296256 in ssux_lock_impl<false>::rd_wait (this=0x5627ca255f48 <trx_sys+25160>) at /data/Server/enterprise_10.6_patched/storage/innobase/sync/srw_lock.cc:397
      #5  0x00005627c91eecff in ssux_lock_impl<false>::rd_lock (this=0x5627ca255f48 <trx_sys+25160>) at /data/Server/enterprise_10.6_patched/storage/innobase/include/srw_lock.h:247
      #6  0x00005627c9294f21 in srw_lock_debug::rd_lock (this=0x5627ca255f48 <trx_sys+25160>) at /data/Server/enterprise_10.6_patched/storage/innobase/sync/srw_lock.cc:623
      #7  0x00005627c92ca2af in trx_sys_t::history_size (this=0x5627ca24fd00 <trx_sys>) at /data/Server/enterprise_10.6_patched/storage/innobase/trx/trx0sys.cc:163
      #8  0x00005627c9286134 in purge_coordinator_state::do_purge (this=0x5627caca5320 <purge_state>) at /data/Server/enterprise_10.6_patched/storage/innobase/srv/srv0srv.cc:1627
      #9  0x00005627c928530c in purge_coordinator_callback () at /data/Server/enterprise_10.6_patched/storage/innobase/srv/srv0srv.cc:1716
      #10 0x00005627c94a2c82 in tpool::task_group::execute (this=0x5627caca5440 <purge_coordinator_task_group>, t=0x5627caca5500 <purge_coordinator_task>)
          at /data/Server/enterprise_10.6_patched/tpool/task_group.cc:55
      #11 0x00005627c94a2fac in tpool::task::execute (this=0x5627caca5500 <purge_coordinator_task>) at /data/Server/enterprise_10.6_patched/tpool/task.cc:32
      #12 0x00005627c949a64f in tpool::thread_pool_generic::worker_main (this=0x5627caebf0d0, thread_var=0x5627caebf6f0) at /data/Server/enterprise_10.6_patched/tpool/tpool_generic.cc:583
       
       
      Thread-1
      ========
      #0  syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
      #1  0x00005627c929646d in srw_mutex_impl<true>::wait (this=0x7fc928008a38, lk=2147483650) at /data/Server/enterprise_10.6_patched/storage/innobase/sync/srw_lock.cc:244
      #2  0x00005627c9295a41 in srw_mutex_impl<true>::wait_and_lock (this=0x7fc928008a38) at /data/Server/enterprise_10.6_patched/storage/innobase/sync/srw_lock.cc:331
      #3  0x00005627c901fb6f in srw_mutex_impl<true>::wr_lock (this=0x7fc928008a38) at /data/Server/enterprise_10.6_patched/storage/innobase/include/srw_lock.h:134
      #4  0x00005627c914966e in ssux_lock_impl<true>::u_lock (this=0x7fc928008a38) at /data/Server/enterprise_10.6_patched/storage/innobase/include/srw_lock.h:250
      #5  0x00005627c91472d2 in sux_lock<ssux_lock_impl<true> >::u_lock (this=0x7fc928008a38) at /data/Server/enterprise_10.6_patched/storage/innobase/include/sux_lock.h:378
      #6  0x00005627c93609fe in buf_page_get_low (page_id=..., zip_size=0, rw_latch=4, guess=0x0, mode=10, mtr=0x7ffcf54ddda0, err=0x0, allow_ibuf_merge=false, no_wait=0x0)
          at /data/Server/enterprise_10.6_patched/storage/innobase/buf/buf0buf.cc:2895
      #7  0x00005627c9360ecd in buf_page_get_gen (page_id=..., zip_size=0, rw_latch=4, guess=0x0, mode=10, mtr=0x7ffcf54ddda0, err=0x0, allow_ibuf_merge=false, no_wait=0x0)
          at /data/Server/enterprise_10.6_patched/storage/innobase/buf/buf0buf.cc:2958
      #8  0x00005627c943b71c in flst_validate (base=0x7fc928008af0, boffset=134, mtr=0x7ffcf54ddda0) at /data/Server/enterprise_10.6_patched/storage/innobase/fut/fut0lst.cc:420
      #9  0x00005627c9424de3 in fsp_sys_tablespace_validate () at /data/Server/enterprise_10.6_patched/storage/innobase/fsp/fsp0fsp.cc:3571
      #10 0x00005627c9424ffc in fsp_system_tablespace_truncate () at /data/Server/enterprise_10.6_patched/storage/innobase/fsp/fsp0fsp.cc:3622
      #11 0x00005627c8fec002 in innobase_end () at /data/Server/enterprise_10.6_patched/storage/innobase/handler/ha_innodb.cc:4381
      

      Fix:
      InnoDB should avoid shrinking the system tablespace during bootstrap. Move the shrinking logic into srv_purge_shutdown(), so that it runs only once purge has completed.
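
      The ordering that the fix asks for can be sketched as a small standalone model. This is only an illustration under stated assumptions, not the InnoDB code: PurgeModel, shrink_system_tablespace(), purge_shutdown_then_shrink() and the is_bootstrap flag are hypothetical names, and a plain std::mutex stands in for the page latches seen in the stack traces. The sketch shows shrinking being invoked only after the background purge worker has been stopped and joined, and being skipped entirely for bootstrap.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical model of a background purge worker; not an InnoDB type.
struct PurgeModel {
  std::atomic<bool> stop{false};
  std::atomic<bool> running{false};
  std::mutex page_latch;  // stands in for page latches shared with purge
  std::thread worker;

  void start() {
    running = true;
    worker = std::thread([this] {
      while (!stop.load()) {
        // The worker repeatedly takes the latch, as purge would while
        // freeing undo log segments.
        std::lock_guard<std::mutex> guard(page_latch);
        std::this_thread::yield();
      }
      running = false;
    });
  }

  // Ask the worker to stop and wait until it has really finished.
  void shutdown() {
    stop = true;
    if (worker.joinable()) worker.join();
  }

  ~PurgeModel() { shutdown(); }
};

// Hypothetical stand-in for the shrinking step. The assertion mirrors the
// idea of a debug check that purge must not be active at this point.
bool shrink_system_tablespace(PurgeModel& purge) {
  assert(!purge.running.load());
  std::lock_guard<std::mutex> guard(purge.page_latch);
  return true;  // pretend that the tablespace was shrunk
}

// Shrinking is reached only from the purge shutdown path, after the worker
// has been joined, and it is skipped when bootstrapping.
bool purge_shutdown_then_shrink(PurgeModel& purge, bool is_bootstrap) {
  purge.shutdown();
  if (is_bootstrap) return false;
  return shrink_system_tablespace(purge);
}
```

      In this model, a caller would create a PurgeModel, call start(), and later call purge_shutdown_then_shrink(); because the worker is joined first, the shrinking step never competes with it for the latch.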

          Activity

            thiru Thirunarayanan Balathandayuthapani created issue -
            thiru Thirunarayanan Balathandayuthapani made changes -
            Field Original Value New Value
            thiru Thirunarayanan Balathandayuthapani made changes -
            Fix Version/s 11.2 [ 28603 ]
            thiru Thirunarayanan Balathandayuthapani made changes -
            Affects Version/s 11.2.3 [ 29521 ]
            thiru Thirunarayanan Balathandayuthapani made changes -
            Component/s Storage Engine - InnoDB [ 10129 ]
            thiru Thirunarayanan Balathandayuthapani made changes -
            Status Open [ 1 ] Confirmed [ 10101 ]
            thiru Thirunarayanan Balathandayuthapani made changes -
            Assignee Thirunarayanan Balathandayuthapani [ thiru ] Marko Mäkelä [ marko ]
            Status Confirmed [ 10101 ] In Review [ 10002 ]
            marko Marko Mäkelä made changes -
            Summary "During slow shutdown, move shrinking logic after purge completion" -> "innodb_fast_shutdown=0 may fail to fully shrink the InnoDB system tablespace"

            marko Marko Mäkelä added a comment - I’d like to see a test to demonstrate this bug and the efficacy of the fix.
            marko Marko Mäkelä made changes -
            Assignee Marko Mäkelä [ marko ] Thirunarayanan Balathandayuthapani [ thiru ]
            Status In Review [ 10002 ] Stalled [ 10000 ]
            thiru Thirunarayanan Balathandayuthapani made changes -
            Summary "innodb_fast_shutdown=0 may fail to fully shrink the InnoDB system tablespace" -> "Bootstrap hangs while shrinking the system tablespace"
            thiru Thirunarayanan Balathandayuthapani made changes -
            Description "InnoDB does shrinking the system tablespace during slow shutdown. Logically, it has to
            be done once purge is completed." ->
            InnoDB bootstrap hangs while shrinking the system tablespace during slow shutdown.

            marko Marko Mäkelä added a comment - thiru, thank you, the added debug assertion in fsp_system_tablespace_truncate() would fail on server bootstrap if the fix is not present. Looks good after adding a couple more assertions.
            thiru Thirunarayanan Balathandayuthapani made changes -
            issue.field.resolutiondate 2024-09-09 09:45:10.0 2024-09-09 09:45:10.473
            thiru Thirunarayanan Balathandayuthapani made changes -
            Fix Version/s 11.2.6 [ 29906 ]
            Fix Version/s 11.2 [ 28603 ]
            Resolution Fixed [ 1 ]
            Status Stalled [ 10000 ] Closed [ 6 ]
            JIraAutomate JiraAutomate made changes -
            Fix Version/s 11.4.4 [ 29907 ]

            People

              thiru Thirunarayanan Balathandayuthapani
              thiru Thirunarayanan Balathandayuthapani
              Votes: 0
              Watchers: 2