[MDEV-22577] innodb_fast_shutdown=0 fails to report purge progress Created: 2020-05-15  Updated: 2023-09-27  Resolved: 2020-06-03

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.5.3, 10.5
Fix Version/s: 10.5.4, 10.2.33, 10.3.24, 10.4.14

Type: Bug Priority: Major
Reporter: Daniel Black Assignee: Marko Mäkelä
Resolution: Fixed Votes: 0
Labels: None

Attachments: Text File gdb.txt    
Issue Links:
Relates
relates to MDEV-29984 innodb_fast_shutdown=0 fails to repor... Closed
relates to MDEV-12323 Rollback progress log messages during... Closed
relates to MDEV-14486 InnoDB hang on shutdown Confirmed
relates to MDEV-16260 Scale the purge effort according to t... Open
relates to MDEV-21751 innodb_fast_shutdown=0 can be unneces... Closed
relates to MDEV-32247 mysqld.exe has been unable to stop. Closed

 Description   

config

[mysqld]
skip-name-resolve
 
max_prepared_stmt_count=1000000
max_connections=2048
back_log=2000
 
innodb_buffer_pool_size=27G
innodb_log_file_size=2G
innodb_log_buffer_size=64M
 
innodb_buffer_pool_dump_pct=100
innodb_buffer_pool_dump_at_shutdown=1
innodb_buffer_pool_load_at_startup=1
innodb_fast_shutdown=0
 
innodb_flush_neighbors=0
innodb_flush_method=O_DIRECT
 
table_open_cache=4096
table_open_cache_instances=64
 
innodb_io_capacity=12000
innodb_io_capacity_max=16000
 
innodb_read_io_threads=8
innodb_write_io_threads=8
innodb_purge_threads=4
innodb_adaptive_hash_index=0

gdb backtrack

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/powerpc64le-linux-gnu/libthread_db.so.1".
0x000077abc5bb9768 in __GI___nanosleep (requested_time=0x0, remaining=0x0) at ../sysdeps/unix/sysv/linux/nanosleep.c:28
28	../sysdeps/unix/sysv/linux/nanosleep.c: No such file or directory.
(gdb) bt
#0  0x000077abc5bb9768 in __GI___nanosleep (requested_time=0x0, remaining=0x0) at ../sysdeps/unix/sysv/linux/nanosleep.c:28
#1  0x00000028f88b1658 in os_thread_sleep (tm=100) at /home/dan/mariadb-server-not-worktree/storage/innobase/os/os0thread.cc:231
#2  0x00000028f897cd80 in srv_purge_shutdown () at /home/dan/mariadb-server-not-worktree/storage/innobase/srv/srv0srv.cc:2317
#3  0x00000028f89855ac in innodb_preshutdown () at /home/dan/mariadb-server-not-worktree/storage/innobase/srv/srv0start.cc:2076
#4  0x00000028f8328678 in plugin_pre_shutdown (plugin=0x10034c6ca50) at /home/dan/mariadb-server-not-worktree/sql/handler.cc:919
#5  0x00000028f7f99508 in plugin_foreach_with_mask (thd=0x0, func=0x28f832861c <plugin_pre_shutdown(THD*, plugin_ref, void*)>, type=1, state_mask=10, arg=0x0)
    at /home/dan/mariadb-server-not-worktree/sql/sql_plugin.cc:2470
#6  0x00000028f83286dc in ha_pre_shutdown () at /home/dan/mariadb-server-not-worktree/sql/handler.cc:926
#7  0x00000028f7e13870 in mysqld_main (argc=30, argv=0x10034c32290) at /home/dan/mariadb-server-not-worktree/sql/mysqld.cc:5643
#8  0x00000028f7e06574 in main (argc=3, argv=0x7ffffb418f18) at /home/dan/mariadb-server-not-worktree/sql/main.cc:25
(gdb) bt full
#0  0x000077abc5bb9768 in __GI___nanosleep (requested_time=0x0, remaining=0x0) at ../sysdeps/unix/sysv/linux/nanosleep.c:28
        r0 = 162
        r4 = 0
        r6 = 16384
        r8 = 0
        arg2 = <optimised out>
        r3 = 0
        r5 = 0
        r7 = 131579646544600
        arg1 = <optimised out>
        sc_err = <optimised out>
        sc_ret = <optimised out>
        sc_cancel_oldtype = 0
        sc_ret = <optimised out>
#1  0x00000028f88b1658 in os_thread_sleep (tm=100) at /home/dan/mariadb-server-not-worktree/storage/innobase/os/os0thread.cc:231
        t = {tv_sec = 0, tv_nsec = 100000}
#2  0x00000028f897cd80 in srv_purge_shutdown () at /home/dan/mariadb-server-not-worktree/storage/innobase/srv/srv0srv.cc:2317
No locals.
#3  0x00000028f89855ac in innodb_preshutdown () at /home/dan/mariadb-server-not-worktree/storage/innobase/srv/srv0start.cc:2076
        first_time = false
#4  0x00000028f8328678 in plugin_pre_shutdown (plugin=0x10034c6ca50) at /home/dan/mariadb-server-not-worktree/sql/handler.cc:919
        hton = 0x10034d3e0b8
#5  0x00000028f7f99508 in plugin_foreach_with_mask (thd=0x0, func=0x28f832861c <plugin_pre_shutdown(THD*, plugin_ref, void*)>, type=1, state_mask=10, arg=0x0)
    at /home/dan/mariadb-server-not-worktree/sql/sql_plugin.cc:2470
        idx = 8
        total = 11
        plugin = 0x10034c82870
        plugins = 0x7ffffb418890
        res = 0 '\000'
#6  0x00000028f83286dc in ha_pre_shutdown () at /home/dan/mariadb-server-not-worktree/sql/handler.cc:926
No locals.
#7  0x00000028f7e13870 in mysqld_main (argc=30, argv=0x10034c32290) at /home/dan/mariadb-server-not-worktree/sql/mysqld.cc:5643
        please_close_stdin = true
        ho_error = 0
        new_thread_stack_size = 299008
        user = 0x0
#8  0x00000028f7e06574 in main (argc=3, argv=0x7ffffb418f18) at /home/dan/mariadb-server-not-worktree/sql/main.cc:25
No locals.
(gdb) p purge_sys
$1 = {latch = {lock_word = {<std::__atomic_base<int>> = {static _S_alignment = 4, _M_i = 536870912}, <No data fields>}, waiters = {<std::__atomic_base<int>> = {
        static _S_alignment = 4, _M_i = 0}, <No data fields>}, sx_recursive = 0, writer_is_wait_ex = false, writer_thread = 0, event = 0x100632d78e8, 
    wait_ex_event = 0x100632d7978, cfile_name = 0x28f8ee8920 "/home/dan/mariadb-server-not-worktree/storage/innobase/trx/trx0purge.cc", 
    last_x_file_name = 0x28f8ee8920 "/home/dan/mariadb-server-not-worktree/storage/innobase/trx/trx0purge.cc", cline = 175, is_block_lock = 0, last_x_line = 1276, 
    count_os_wait = 0, list = {prev = 0x1006202f378, next = 0x100632de080}, pfs_psi = 0x0}, m_enabled = {_M_base = {static _S_alignment = 1, _M_i = true}}, m_paused = {
    m_counter = {<std::__atomic_base<int>> = {static _S_alignment = 4, _M_i = 0}, <No data fields>}}, query = 0x100632dc5a0, view = {
    m_state = {<std::__atomic_base<unsigned int>> = {static _S_alignment = 4, _M_i = 0}, <No data fields>}, m_low_limit_id = 1051743806, m_up_limit_id = 1051743806, 
    m_creator_trx_id = 0, m_ids = std::vector of length 0, capacity 416, m_low_limit_no = 1051743806}, tail = {commit = 1551671548, undo_no = 0}, head = {
    commit = 1551671547, undo_no = 0}, next_stored = false, rseg = 0x100632b4a58, page_no = 3783890, hdr_page_no = 3783890, offset = 11346, hdr_offset = 11160, rseg_iter = {
    m_rsegs = {m_commit = 1551671567, m_rsegs = std::vector of length 1, capacity 1 = {0x100632b57e8}}, m_iter = 0x0}, 
  purge_queue = std::priority_queue wrapping: std::vector of length 128, capacity 128 = {{m_commit = 1551671575, m_rsegs = std::vector of length 1, capacity 1 = {
        0x100632c0498}}, {m_commit = 1551671583, m_rsegs = std::vector of length 1, capacity 1 = {0x100632b75e8}}, {m_commit = 1551671579, 
      m_rsegs = std::vector of length 1, capacity 1 = {0x100632b5bb8}}, {m_commit = 1551671585, m_rsegs = std::vector of length 1, capacity 1 = {0x100632b79b8}}, {
      m_commit = 1551671605, m_rsegs = std::vector of length 1, capacity 1 = {0x100632c3da8}}, {m_commit = 1551671665, m_rsegs = std::vector of length 1, capacity 1 = {
        0x100632b7fe8}}, {m_commit = 1551671591, m_rsegs = std::vector of length 1, capacity 1 = {0x100632b84e8}}, {m_commit = 1551671603, 
      m_rsegs = std::vector of length 1, capacity 1 = {0x100632b9a18}}, {m_commit = 1551671587, m_rsegs = std::vector of length 1, capacity 1 = {0x100632b6be8}}, {
      m_commit = 1551671751, m_rsegs = std::vector of length 1, capacity 1 = {0x100632b0148}}, {m_commit = 1551671671, m_rsegs = std::vector of length 1, capacity 1 = {
        0x100632bb578}}, {m_commit = 1551671691, m_rsegs = std::vector of length 1, capacity 1 = {0x100632b5f88}}, {m_commit = 1551671705, 
      m_rsegs = std::vector of length 1, capacity 1 = {0x100632bb1a8}}, {m_commit = 1551671601, m_rsegs = std::vector of length 1, capacity 1 = {0x100632b65b8}}, {
      m_commit = 1551671703, m_rsegs = std::vector of length 1, capacity 1 = {0x100632ac918}}, {m_commit = 1551671651, m_rsegs = std::vector of length 1, capacity 1 = {
        0x100632bab78}}, {m_commit = 1551672053, m_rsegs = std::vector of length 1, capacity 1 = {0x100632cbae8}}, {m_commit = 1551671659, 
      m_rsegs = std::vector of length 1, capacity 1 = {0x100632ade48}}, {m_commit = 1551671621, m_rsegs = std::vector of length 1, capacity 1 = {0x100632ba548}}, {
      m_commit = 1551671909, m_rsegs = std::vector of length 1, capacity 1 = {0x100632ce5d8}}, {m_commit = 1551671803, m_rsegs = std::vector of length 1, capacity 1 = {
        0x100632b3438}}, {m_commit = 1551671819, m_rsegs = std::vector of length 1, capacity 1 = {0x100632a5d68}}, {m_commit = 1551671683, 
      m_rsegs = std::vector of length 1, capacity 1 = {0x100632acf48}}, {m_commit = 1551671829, m_rsegs = std::vector of length 1, capacity 1 = {0x100632a4e68}}, {
      m_commit = 1551671707, m_rsegs = std::vector of length 1, capacity 1 = {0x100632bbba8}}, {m_commit = 1551671745, m_rsegs = std::vector of length 1, capacity 1 = {
        0x100632b9de8}}, {m_commit = 1551671779, m_rsegs = std::vector of length 1, capacity 1 = {0x100632b0a18}}, {m_commit = 1551671933, 
      m_rsegs = std::vector of length 1, capacity 1 = {0x100632b3938}}, {m_commit = 1551671697, m_rsegs = std::vector of length 1, capacity 1 = {0x100632bbf28}}, {
      m_commit = 1551671741, m_rsegs = std::vector of length 1, capacity 1 = {0x100632af378}}, {m_commit = 1551671867, m_rsegs = std::vector of length 1, capacity 1 = {
        0x100632ad578}}, {m_commit = 1551671693, m_rsegs = std::vector of length 1, capacity 1 = {0x100632afd78}}, {m_commit = 1551671925, 
      m_rsegs = std::vector of length 1, capacity 1 = {0x100632ae848}}, {m_commit = 1551672075, m_rsegs = std::vector of length 1, capacity 1 = {0x100632d2458}}, {
      m_commit = 1551672293, m_rsegs = std::vector of length 1, capacity 1 = {0x100632bcdd8}}, {m_commit = 1551671735, m_rsegs = std::vector of length 1, capacity 1 = {
        0x100632b17a8}}, {m_commit = 1551672193, m_rsegs = std::vector of length 1, capacity 1 = {0x100632cad58}}, {m_commit = 1551672007, 
      m_rsegs = std::vector of length 1, capacity 1 = {0x100632c7d08}}, {m_commit = 1551671967, m_rsegs = std::vector of length 1, capacity 1 = {0x100632b2908}}, {
      m_commit = 1551671979, m_rsegs = std::vector of length 1, capacity 1 = {0x100632c45d8}}, {m_commit = 1551671983, m_rsegs = std::vector of length 1, capacity 1 = {
        0x100632c8208}}, {m_commit = 1551671903, m_rsegs = std::vector of length 1, capacity 1 = {0x100632d8b98}}, {m_commit = 1551671889, 
      m_rsegs = std::vector of length 1, capacity 1 = {0x100632a6898}}, {m_commit = 1551671841, m_rsegs = std::vector of length 1, capacity 1 = {0x100632b93e8}}, {
      m_commit = 1551672055, m_rsegs = std::vector of length 1, capacity 1 = {0x100632c28e8}}, {m_commit = 1551671757, m_rsegs = std::vector of length 1, capacity 1 = {
        0x100632b70e8}}, {m_commit = 1551671775, m_rsegs = std::vector of length 1, capacity 1 = {0x100632ae348}}, {m_commit = 1551672081, 
      m_rsegs = std::vector of length 1, capacity 1 = {0x100632c4e38}}, {m_commit = 1551671939, m_rsegs = std::vector of length 1, capacity 1 = {0x100632cf218}}, {
      m_commit = 1551671907, m_rsegs = std::vector of length 1, capacity 1 = {0x100632a6d98}}, {m_commit = 1551671959, m_rsegs = std::vector of length 1, capacity 1 = {
--Type <RET> for more, q to quit, c to continue without paging-- 
        0x100632cf738}}, {m_commit = 1551671941, m_rsegs = std::vector of length 1, capacity 1 = {0x100632d9d28}}, {m_commit = 1551671799, 
      m_rsegs = std::vector of length 1, capacity 1 = {0x100632d9218}}, {m_commit = 1551671875, m_rsegs = std::vector of length 1, capacity 1 = {0x100632b1f08}}, {
      m_commit = 1551671873, m_rsegs = std::vector of length 1, capacity 1 = {0x100632cfcd8}}, {m_commit = 1551672057, m_rsegs = std::vector of length 1, capacity 1 = {
        0x100632d2958}}, {m_commit = 1551671971, m_rsegs = std::vector of length 1, capacity 1 = {0x100632d06d8}}, {m_commit = 1551671887, 
      m_rsegs = std::vector of length 1, capacity 1 = {0x100632b3e38}}, {m_commit = 1551671899, m_rsegs = std::vector of length 1, capacity 1 = {0x100632db428}}, {
      m_commit = 1551671905, m_rsegs = std::vector of length 1, capacity 1 = {0x100632cd808}}, {m_commit = 1551671929, m_rsegs = std::vector of length 1, capacity 1 = {
        0x100632b4558}}, {m_commit = 1551671957, m_rsegs = std::vector of length 1, capacity 1 = {0x100632ca228}}, {m_commit = 1551671935, 
      m_rsegs = std::vector of length 1, capacity 1 = {0x100632b8db8}}, {m_commit = 1551671945, m_rsegs = std::vector of length 1, capacity 1 = {0x100632c0998}}, {
      m_commit = 1551672145, m_rsegs = std::vector of length 1, capacity 1 = {0x100632b2e08}}, {m_commit = 1551671949, m_rsegs = std::vector of length 1, capacity 1 = {
        0x100632a4598}}, {m_commit = 1551671943, m_rsegs = std::vector of length 1, capacity 1 = {0x100632c66a8}}, {m_commit = 1551672477, 
      m_rsegs = std::vector of length 1, capacity 1 = {0x100632c96f8}}, {m_commit = 1551672503, m_rsegs = std::vector of length 1, capacity 1 = {0x100632be6d8}}, {
      m_commit = 1551672433, m_rsegs = std::vector of length 1, capacity 1 = {0x100632dadf8}}, {m_commit = 1551672317, m_rsegs = std::vector of length 1, capacity 1 = {
        0x100632dba58}}, {m_commit = 1551671849, m_rsegs = std::vector of length 1, capacity 1 = {0x100632a4a98}}, {m_commit = 1551671791, 
      m_rsegs = std::vector of length 1, capacity 1 = {0x100632ada78}}, {m_commit = 1551672305, m_rsegs = std::vector of length 1, capacity 1 = {0x100632cec08}}, {
      m_commit = 1551672377, m_rsegs = std::vector of length 1, capacity 1 = {0x100632c8d38}}, {m_commit = 1551673143, m_rsegs = std::vector of length 1, capacity 1 = {
        0x100632b52e8}}, {m_commit = 1551672027, m_rsegs = std::vector of length 0, capacity 0}, {m_commit = 1551672451, m_rsegs = std::vector of length 1, capacity 1 = {
        0x100632bc598}}, {m_commit = 1551672341, m_rsegs = std::vector of length 1, capacity 1 = {0x100632c9bf8}}, {m_commit = 1551672367, 
      m_rsegs = std::vector of length 1, capacity 1 = {0x100632ce0d8}}, {m_commit = 1551672257, m_rsegs = std::vector of length 1, capacity 1 = {0x100632d3e88}}, {
      m_commit = 1551672251, m_rsegs = std::vector of length 1, capacity 1 = {0x100632c4938}}, {m_commit = 1551671991, m_rsegs = std::vector of length 1, capacity 1 = {
        0x100632af748}}, {m_commit = 1551672481, m_rsegs = std::vector of length 1, capacity 1 = {0x100632bca08}}, {m_commit = 1551671919, 
      m_rsegs = std::vector of length 1, capacity 1 = {0x100632d97e8}}, {m_commit = 1551672527, m_rsegs = std::vector of length 1, capacity 1 = {0x100632bfe68}}, {
      m_commit = 1551671973, m_rsegs = std::vector of length 1, capacity 1 = {0x100632d2e58}}, {m_commit = 1551672065, m_rsegs = std::vector of length 1, capacity 1 = {
        0x100632c2d78}}, {m_commit = 1551672437, m_rsegs = std::vector of length 1, capacity 1 = {0x100632d0308}}, {m_commit = 1551672679, 
      m_rsegs = std::vector of length 1, capacity 1 = {0x100632c38a8}}, {m_commit = 1551672387, m_rsegs = std::vector of length 1, capacity 1 = {0x100632d3728}}, {
      m_commit = 1551671893, m_rsegs = std::vector of length 1, capacity 1 = {0x100632cd308}}, {m_commit = 1551672397, m_rsegs = std::vector of length 1, capacity 1 = {
        0x100632c5548}}, {m_commit = 1551671977, m_rsegs = std::vector of length 1, capacity 1 = {0x100632b89e8}}, {m_commit = 1551672465, 
      m_rsegs = std::vector of length 1, capacity 1 = {0x100632c7808}}, {m_commit = 1551672105, m_rsegs = std::vector of length 1, capacity 1 = {0x100632b0518}}, {
      m_commit = 1551672177, m_rsegs = std::vector of length 1, capacity 1 = {0x100632a5738}}, {m_commit = 1551672115, m_rsegs = std::vector of length 1, capacity 1 = {
        0x100632bf968}}, {m_commit = 1551672165, m_rsegs = std::vector of length 1, capacity 1 = {0x100632d0d08}}, {m_commit = 1551672093, 
      m_rsegs = std::vector of length 1, capacity 1 = {0x100632b12a8}}, {m_commit = 1551672215, m_rsegs = std::vector of length 1, capacity 1 = {0x100632c6ba8}}, {
      m_commit = 1551672037, m_rsegs = std::vector of length 1, capacity 1 = {0x100632ca728}}, {m_commit = 1551672453, m_rsegs = std::vector of length 1, capacity 1 = {
        0x100632b2408}}, {m_commit = 1551672047, m_rsegs = std::vector of length 1, capacity 1 = {0x100632cb258}}, {m_commit = 1551672285, 
      m_rsegs = std::vector of length 1, capacity 1 = {0x100632a5238}}, {m_commit = 1551672063, m_rsegs = std::vector of length 1, capacity 1 = {0x100632c10f8}}, {
      m_commit = 1551672181, m_rsegs = std::vector of length 1, capacity 1 = {0x100632c8708}}, {m_commit = 1551672271, m_rsegs = std::vector of length 1, capacity 1 = {
        0x100632cc908}}, {m_commit = 1551672013, m_rsegs = std::vector of length 1, capacity 1 = {0x100632cce08}}, {m_commit = 1551672017, 
      m_rsegs = std::vector of length 1, capacity 1 = {0x100632d1708}}, {m_commit = 1551672333, m_rsegs = std::vector of length 1, capacity 1 = {0x100632cdbd8}}, {
      m_commit = 1551672301, m_rsegs = std::vector of length 1, capacity 1 = {0x100632bd2d8}}, {m_commit = 1551672077, m_rsegs = std::vector of length 1, capacity 1 = {
        0x100632c5a48}}, {m_commit = 1551672365, m_rsegs = std::vector of length 1, capacity 1 = {0x100632bdcd8}}, {m_commit = 1551672133, 
      m_rsegs = std::vector of length 1, capacity 1 = {0x100632da1b8}}, {m_commit = 1551672153, m_rsegs = std::vector of length 1, capacity 1 = {0x100632c33a8}}, {
      m_commit = 1551672023, m_rsegs = std::vector of length 1, capacity 1 = {0x100632bd7d8}}, {m_commit = 1551672533, m_rsegs = std::vector of length 1, capacity 1 = {
        0x100632d3228}}, {m_commit = 1551672209, m_rsegs = std::vector of length 1, capacity 1 = {0x100632c60e8}}, {m_commit = 1551672595, 
--Type <RET> for more, q to quit, c to continue without paging--
      m_rsegs = std::vector of length 1, capacity 1 = {0x100632bf338}}, {m_commit = 1551671953, m_rsegs = std::vector of length 1, capacity 1 = {0x100632d1208}}, {
      m_commit = 1551672621, m_rsegs = std::vector of length 1, capacity 1 = {0x100632be1d8}}, {m_commit = 1551672099, m_rsegs = std::vector of length 1, capacity 1 = {
        0x100632bed08}}, {m_commit = 1551672069, m_rsegs = std::vector of length 1, capacity 1 = {0x100632c7308}}, {m_commit = 1551672135, 
      m_rsegs = std::vector of length 1, capacity 1 = {0x100632aed48}}, {m_commit = 1551672303, m_rsegs = std::vector of length 1, capacity 1 = {0x100632a6268}}, {
      m_commit = 1551671947, m_rsegs = std::vector of length 1, capacity 1 = {0x100632da8f8}}, {m_commit = 1551671567, m_rsegs = std::vector of length 1, capacity 1 = {
        0x100632b57e8}}}, pq_mutex = {m_impl = {m_lock_word = {<std::__atomic_base<unsigned int>> = {static _S_alignment = 4, _M_i = 1}, <No data fields>}, 
      m_event = 0x100632d7a08, m_policy = {m_count = {m_spins = 0, m_waits = 0, m_calls = 0, m_enabled = false}, 
m_id = LATCH_ID_PURGE_SYS_PQ}}, m_ptr = 0x0}, truncate = {
    current = 0x0, last = 0x0}, heap = 0x100632d7b08}
(gdb) 
(gdb) 
(gdb) 
(gdb) p srv_undo_sources
$2 = false
(gdb) p srv_fast_shutdown
$3 = 0
(gdb) p trx_sys
$4 = {m_max_trx_id = {m_counter = {<std::__atomic_base<unsigned long>> = {static _S_alignment = 8, _M_i = 1051743806}, <No data fields>}}, 
  m_rw_trx_hash_version = {<std::__atomic_base<unsigned long>> = {static _S_alignment = 8, _M_i = 1051743806}, <No data fields>}, m_initialised = true, rseg_history_len = {
    m_counter = {<std::__atomic_base<unsigned int>> = {static _S_alignment = 4, _M_i = 137963750}, <No data fields>}}, mutex = {m_impl = {
      m_lock_word = {<std::__atomic_base<unsigned int>> = {static _S_alignment = 4, _M_i = 0}, <No data fields>}, m_event = 0x10061bffdf8, m_policy = {m_count = {
          m_spins = 0, m_waits = 0, m_calls = 0, m_enabled = false}, m_id = LATCH_ID_TRX_SYS}}, m_ptr = 0x0}, trx_list = {count = 1, start = 0x77abaf9a0030, 
    end = 0x77abaf9a0030, node = &trx_t::trx_list}, temp_rsegs = {0x1006323ef18, 0x100632271b8, 0x100632432f8, 0x10063230338, 0x1006322a338, 0x1006322bd18, 0x100631f7d58, 
    0x100631eaf78, 0x10063200578, 0x100631ec0b8, 0x100631bd318, 0x100631a7718, 0x100631f5e98, 0x100631eee18, 0x100631fd6f8, 0x100631fe058, 0x10063168598, 0x10063178858, 
    0x100631e3ad8, 0x100631e8158, 0x100631e7498, 0x100631e63b8, 0x100631eccb8, 0x100631fab18, 0x100631f4038, 0x10063167218, 0x100631734b8, 0x1006316ccd8, 0x10063243d78, 
    0x10063230e18, 0x100631fa878, 0x10063203f98, 0x1006312dcd8, 0x1006311fcf8, 0x1006312e158, 0x10063130a98, 0x100631fdd58, 0x100631efb38, 0x100631e4f18, 0x100630e4a18, 
    0x100630e9e18, 0x10063093d78, 0x10063098e78, 0x10063093898, 0x100630819b8, 0x10063087838, 0x1006308cb78, 0x10063084118, 0x1006307dbd8, 0x10063085a98, 0x10063083cf8, 
    0x100630803f8, 0x100630dc2b8, 0x100630ebcd8, 0x100630cc2f8, 0x100630d15d8, 0x100630d8058, 0x100630dea18, 0x100630d6c18, 0x100630d4ff8, 0x100630e7778, 0x100630e02d8, 
    0x100630cbd58, 0x100630ecb78, 0x100630db958, 0x10062fdb638, 0x10063049df8, 0x100630d6f78, 0x100630d2178, 0x1006308d0b8, 0x1006309a678, 0x1006307f738, 0x100630904d8, 
    0x10063042d18, 0x1006302acb8, 0x10063044cf8, 0x10063043f78, 0x100630959f8, 0x100630857f8, 0x1006308ba38, 0x100630997d8, 0x1006308f518, 0x1006308a8f8, 0x10063086cf8, 
    0x10062fe8a78, 0x10062fea6f8, 0x10062fd84b8, 0x10062fd6838, 0x10062ff2cd8, 0x10062ff0998, 0x10062fef8b8, 0x10062fd5398, 0x10062ff4718, 0x10062fe9198, 0x10062fe52f8, 
    0x10062fed098, 0x10062f13a18, 0x10062f31f58, 0x10062fd4eb8, 0x10062fdc5f8, 0x10062fde518, 0x10062ff38d8, 0x10062f8e8f8, 0x10062f8fa38, 0x10062fd8f38, 0x10062fda6d8, 
    0x10062fde878, 0x10062fe7398, 0x10062fe42d8, 0x10062febef8, 0x10062fd56f8, 0x10062fe7878, 0x10062fda8b8, 0x10062fec798, 0x10062ff1e38, 0x10062fe3918, 0x10062fe1d58, 
    0x10062ff3458, 0x10062fe8658, 0x10062ff4238, 0x10062f8e538, 0x10062fde038, 0x10062fe7f38, 0x10062f94d78, 0x10062fd5038, 0x10062ff3cf8, 0x10062fd6bf8, 0x10062fe9e58}, 
  rseg_array = {0x100632d8b98, 0x100632d9218, 0x100632d97e8, 0x100632d9d28, 0x100632da1b8, 0x100632da8f8, 0x100632dadf8, 0x100632db428, 0x100632dba58, 0x100632cc908, 
    0x100632cce08, 0x100632cd308, 0x100632cd808, 0x100632cdbd8, 0x100632ce0d8, 0x100632ce5d8, 0x100632cec08, 0x100632cf738, 0x100632cfcd8, 0x100632d0308, 0x100632d06d8, 
    0x100632d0d08, 0x100632d1208, 0x100632d1708, 0x100632d2458, 0x100632d2958, 0x100632d2e58, 0x100632d3228, 0x100632d3728, 0x100632d3e88, 0x100632c45d8, 0x100632c4938, 
    0x100632c4e38, 0x100632cf218, 0x100632c60e8, 0x100632c66a8, 0x100632c6ba8, 0x100632c7308, 0x100632c7808, 0x100632c7d08, 0x100632c8208, 0x100632c8708, 0x100632c8d38, 
    0x100632c96f8, 0x100632c9bf8, 0x100632ca228, 0x100632ca728, 0x100632cad58, 0x100632cb258, 0x100632cbae8, 0x100632bc598, 0x100632bca08, 0x100632bcdd8, 0x100632bd2d8, 
    0x100632bd7d8, 0x100632bdcd8, 0x100632be1d8, 0x100632be6d8, 0x100632bed08, 0x100632bf338, 0x100632bf968, 0x100632bfe68, 0x100632c0498, 0x100632c0998, 0x100632c10f8, 
    0x100632c5548, 0x100632c5a48, 0x100632c28e8, 0x100632c2d78, 0x100632c33a8, 0x100632c38a8, 0x100632c3da8, 0x100632b4558, 0x100632b4a58, 0x100632b52e8, 0x100632b57e8, 
    0x100632b5bb8, 0x100632b5f88, 0x100632b65b8, 0x100632b6be8, 0x100632b70e8, 0x100632b75e8, 0x100632b79b8, 0x100632b7fe8, 0x100632b84e8, 0x100632b89e8, 0x100632b8db8, 
    0x100632b93e8, 0x100632b9a18, 0x100632b9de8, 0x100632ba548, 0x100632bab78, 0x100632bb1a8, 0x100632bb578, 0x100632bbba8, 0x100632bbf28, 0x100632ac918, 0x100632acf48, 
    0x100632ad578, 0x100632ada78, 0x100632ade48, 0x100632ae348, 0x100632ae848, 0x100632aed48, 0x100632af378, 0x100632af748, 0x100632afd78, 0x100632b0148, 0x100632b0518, 
    0x100632b0a18, 0x100632b12a8, 0x100632b17a8, 0x100632b1f08, 0x100632b2408, 0x100632b2908, 0x100632b2e08, 0x100632b3438, 0x100632b3938, 0x100632b3e38, 0x100632a4598, 
    0x100632a4a98, 0x100632a4e68, 0x100632a5238, 0x100632a5738, 0x100632a5d68, 0x100632a6268, 0x100632a6898, 0x100632a6d98}, rw_trx_hash = {hash = {array = {level = {
          0x100632a7290, 0x779e34158048, 0x0, 0x0}, size_of_element = 8}, alloc = {pinbox = {pinarray = {level = {0x10063258e78, 0x779ea8155e88, 0x0, 0x0}, 
            size_of_element = 184}, free_func = 0x28f8bc44d0 <alloc_free>, free_func_arg = 0x28f95a94a8 <trx_sys+2728>, free_ptr_offset = 16, pinstack_top_ver = 1684078815, 
          pins_in_array = 385}, top = 0x779f24164118 "\271\001\001\360\236w", element_size = 104, mallocs = 0, constructor = 0x28f89c6040
     <rw_trx_hash_t::rw_trx_hash_constructor(unsigned char*)>, destructor = 0x28f89c60f8 <rw_trx_hash_t::rw_trx_hash_destructor(unsigned char*)>}, get_key = 0x0, 
      initializer = 0x28f89c61f0 <rw_trx_hash_t::rw_trx_hash_initializer(st_lf_hash*, rw_trx_hash_element_t*, trx_t*)>, hash_function = 0x28f8bc5250 <calc_hash>, 
      charset = 0x28f95afd60 <my_charset_bin>, key_offset = 0, key_length = 8, element_size = 72, flags = 1, size = 512, count = 0}}, recovered_wsrep_xid = {formatID = -1, 
    gtrid_length = 0, bqual_length = 0, data = '\000' <repeats 127 times>}, recovered_binlog_offset = 0, recovered_binlog_filename = '\000' <repeats 511 times>}

Just ran the server though the sysbench tests:

oltp_update_non_index oltp_read_only oltp_point_select

SIGTERM for a normal shutdown started 4 hrs ago



 Comments   
Comment by Marko Mäkelä [ 2020-05-15 ]

I think that this report might essentially be duplicating MDEV-14486. In 10.5, MDEV-16264 changed things a little and made it possibly harder to read stack traces.

Comment by Marko Mäkelä [ 2020-05-15 ]

The plugin_pre_shutdown() was added in MDEV-16264.

Comment by Vladislav Vaintroub [ 2020-05-15 ]

danblack gdb thread apply all bt, please

Comment by Daniel Black [ 2020-05-16 ]

ack. A strace of the process was exclusively nanosleep.

Comment by Vladislav Vaintroub [ 2020-05-16 ]

It is obviously doing something in one thread, the purge coordinator . the fact that there are 36 threadpool threads also tells us, that there was enough work for all of them, no less than 1 minute ago, otherwise the threads would shutdown. What kind of work could there be no less that 1 minute ago. the purge workers , possibly 3 of them, some threads to serve Innotdb ticking timers (says 3, at most). The rest I'd guess would be serving "simulated AIO", because you do not have a io_getevent anywhere in your stacktraces.

Perhaps, there is more than nanosleep, if you strace a little more.

Comment by Vladislav Vaintroub [ 2020-05-16 ]

Perhaps we need some kind of progress indicator for the purge shutdown, say reporting history length every 30 seconds or so.

Comment by Marko Mäkelä [ 2020-05-18 ]

Are any purge worker tasks active (and how would you see that)? MDEV-14486 is about the purge coordinator occasionally exiting before purge workers have exited. The pre-shutdown code was written by wlad in MDEV-16264, and I am not deeply familiar with that.

On normal shutdown, all purge tasks should exit as soon as possible.

Somewhat related to this, a follow-up change was filed after MDEV-16264 , suggesting that shutdown could be sped up by starting a larger number of purge workers:
MDEV-21751 innodb_fast_shutdown=0 can be unnecessarily slow

Comment by Vladislav Vaintroub [ 2020-05-18 ]

the pre-shutdown is replacement for the a special proxy thread that was hanging doing nothing the whole time, and only waited until purge is done. It has pretty much the same functionality, sans an idle thread and useless THD.

Comment by Marko Mäkelä [ 2020-06-03 ]

In gdb.txt, Thread 5 (purge_coordinator_callback()) is happily fetching more records to be processed. Even a large innodb_purge_batch_size should not keep it busy for very long. The default value is 300, and the maximum is 5000.

All purge worker tasks are idle. It looks like the reason for the hang is this loop in srv_purge_shutdown() in Thread 1:

void srv_purge_shutdown()
{
	if (purge_sys.enabled()) {
		while(!srv_purge_should_exit()) {
			ut_a(!purge_sys.paused());
			srv_wake_purge_thread_if_not_active();
			os_thread_sleep(100);
		}
		purge_sys.coordinator_shutdown();
		srv_shutdown_purge_tasks();
	}
}

Why does srv_purge_should_exit() not hold? It basically is srv_undo_sources && (srv_fast_shutdown || (!trx_sys.rseg_history_len && !trx_sys.any_active_transactions())).
Which of the conditions does not hold? We should have assigned srv_undo_sources=false in srv_shutdown_bg_undo_sources(), which was called just before srv_purge_shutdown(). Also, srv_fast_shutdown should hold by default (unless you did SET GLOBAL innodb_fast_shutdown=0 to request everything to be purged).

danblack, can you please provide the output of print trx_sys for such a hang? And show exactly why srv_purge_should_exit() does not hold?

Comment by Marko Mäkelä [ 2020-06-03 ]

I noticed innodb_fast_shutdown=0 in the Description. wlad, can you please review that innodb_preshutdown() corresponds to what thd_destructor_proxy() and srv_running did in the past? What exactly replaced the logic that checked srv_running in srv_purge_wakeup()?

Comment by Vladislav Vaintroub [ 2020-06-03 ]

All purge worker tasks are idle, because coordinator is fetching something, It was there since the beginning of the coordinator/worker logic. coordinator collects some records to work on, it does part of this worker work, it distributes the rest to workers, and it waits for all other workers to end.

Comment by Vladislav Vaintroub [ 2020-06-03 ]

srv_running is replaced with nothing. why on earth would you need that? If Innodb is going to shutdown, innodb_preshutdown is called, there is no need to recheck a variable.

Comment by Marko Mäkelä [ 2020-06-03 ]

danblack, do you see any output with the following patch? It should enable progress reporting to the error log every 15 seconds on slow shutdown (innodb_fast_shutdown=0).

diff --git a/storage/innobase/srv/srv0srv.cc b/storage/innobase/srv/srv0srv.cc
index 26d217ed0ad..e178ee7ff76 100644
--- a/storage/innobase/srv/srv0srv.cc
+++ b/storage/innobase/srv/srv0srv.cc
@@ -2017,19 +2017,20 @@ static bool srv_purge_should_exit()
 		return(true);
 	}
 	/* Slow shutdown was requested. */
-	uint32_t history_size = trx_sys.rseg_history_len;
-	if (history_size) {
-#if defined HAVE_SYSTEMD && !defined EMBEDDED_LIBRARY
+	if (uint32_t history_size = trx_sys.rseg_history_len) {
 		static time_t progress_time;
 		time_t now = time(NULL);
 		if (now - progress_time >= 15) {
 			progress_time = now;
+#if defined HAVE_SYSTEMD && !defined EMBEDDED_LIBRARY
 			service_manager_extend_timeout(
 				INNODB_EXTEND_TIMEOUT_INTERVAL,
 				"InnoDB: to purge %u transactions",
 				history_size);
-		}
 #endif
+			ib::info() << "to purge " << history_size
+				   << " transactions";
+		}
 		return false;
 	}
 

Comment by Marko Mäkelä [ 2020-06-03 ]

danblack’s additional debugging with my diagnostic patch suggests we are mainly witnessing MDEV-21751. A design problem with the purge (as witnessed by gdb.txt) is that much of the time, only the coordinator is running, buffering undo log records that are then distributed to the workers. A different innodb_purge_batch_size might help a little. Even then, there should be some contention on dict_sys.mutex for opening tables.

I think that we can close this ticket by implementing the progress reporting for slow shutdown, in a similar way as it was done in MDEV-12323. Currently, we only report progress to systemd, as part of MDEV-14705.

Generated at Thu Feb 08 09:15:50 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.