Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-30911

Multi-batch recovery of ROW_FORMAT=COMPRESSED table hangs

    XMLWordPrintable

Details

    Description

      Some time ago, mleich encountered a case where a multi-batch crash recovery would hang:

      11.0 d5d7c8ba96fbf8800d7273034f56cbb39f379982

      2023-02-21  9:06:01 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=24728170
      2023-02-21  9:06:06 0 [Note] InnoDB: Ignoring data file './test/#sql-alter-24ccd8-27.ibd' with space ID 1381. Another data file called ./cool_down/t3.ibd exists with the same space ID.
      2023-02-21  9:06:06 0 [Note] InnoDB: Ignoring data file './test/#sql-alter-24ccd8-27.ibd' with space ID 1381. Another data file called ./cool_down/t3.ibd exists with the same space ID.
      2023-02-21  9:06:06 0 [Note] InnoDB: Ignoring data file './test/#sql-alter-24ccd8-27.ibd' with space ID 1381. Another data file called ./cool_down/t3.ibd exists with the same space ID.
      2023-02-21  9:06:06 0 [Note] InnoDB: Ignoring data file './test/#sql-alter-24ccd8-35.ibd' with space ID 1442. Another data file called ./cool_down/t1.ibd exists with the same space ID.
      2023-02-21  9:06:06 0 [Note] InnoDB: Ignoring data file './test/#sql-alter-24ccd8-35.ibd' with space ID 1442. Another data file called ./cool_down/t1.ibd exists with the same space ID.
      2023-02-21  9:06:07 0 [Note] InnoDB: Starting a batch to recover 412 pages from redo log.
      2023-02-21  9:06:10 0 [Note] InnoDB: Ignoring data file './test/#sql-alter-24ccd8-35.ibd' with space ID 1442. Another data file called ./cool_down/t1.ibd exists with the same space ID.
      2023-02-21  9:06:10 0 [Note] InnoDB: Ignoring data file './test/#sql-alter-24ccd8-35.ibd' with space ID 1497. Another data file called ./cool_down/t2.ibd exists with the same space ID.
      2023-02-21  9:06:10 0 [Note] InnoDB: Ignoring data file './test/#sql-alter-24ccd8-35.ibd' with space ID 1497. Another data file called ./cool_down/t2.ibd exists with the same space ID.
      2023-02-21  9:06:11 0 [Note] InnoDB: Ignoring data file './test/#sql-alter-24ccd8-35.ibd' with space ID 1497. Another data file called ./cool_down/t2.ibd exists with the same space ID.
      2023-02-21  9:06:11 0 [Note] InnoDB: Ignoring data file './test/#sql-alter-24ccd8-35.ibd' with space ID 1538. Another data file called ./test/t2.ibd exists with the same space ID.
      2023-02-21  9:06:11 0 [Note] InnoDB: Ignoring data file './test/#sql-alter-24ccd8-35.ibd' with space ID 1538. Another data file called ./test/t2.ibd exists with
       the same space ID.
      2023-02-21  9:06:11 0 [Note] InnoDB: Ignoring data file './test/#sql-alter-24ccd8-35.ibd' with space ID 1538. Another data file called ./test/t2.ibd exists with the same space ID.
      2023-02-21  9:06:12 0 [Note] InnoDB: Ignoring data file './test/#sql-alter-24ccd8-35.ibd' with space ID 1588. Another data file called ./test/t1.ibd exists with the same space ID.
      2023-02-21  9:06:12 0 [Note] InnoDB: Ignoring data file './test/#sql-alter-24ccd8-35.ibd' with space ID 1588. Another data file called ./test/t1.ibd exists with the same space ID.
      2023-02-21  9:06:12 0 [Note] InnoDB: Ignoring data file './test/#sql-alter-24ccd8-35.ibd' with space ID 1588. Another data file called ./test/t1.ibd exists with the same space ID.
      2023-02-21  9:06:12 0 [Note] InnoDB: Starting a batch to recover 688 pages from redo log.
      2023-02-21  9:06:13 0 [Note] InnoDB: 1 transaction(s) which must be rolled back or cleaned up in total 1 row operations to undo
      2023-02-21  9:06:13 0 [Note] InnoDB: Trx id counter is 9832
      2023-02-21  9:06:13 0 [Note] InnoDB: Starting final batch to recover 90 pages from redo log.
      230221  9:56:01 [ERROR] mysqld got signal 6 ;
      

      As you can see from the last two lines, InnoDB was hung for about 50 minutes before the environment issued kill -ABRT. In the resulting core dump, we would have 2 entries in recv_sys.pages:

      #12 0x0000558fc708eebd in recv_sys_t::apply (this=0x558fc8eb6380 <recv_sys>, last_batch=last_batch@entry=true) at /data/Server/bb-11.1-MDEV-30545A/storage/innobase/log/log0recv.cc:3486
      3486	            my_cond_wait(&cond, &mutex.m_mutex);
      (gdb) p *this
      $6 = {mutex = {m_mutex = {global = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 2, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, 
              __size = '\000' <repeats 16 times>, "\002", '\000' <repeats 22 times>, __align = 0}, mutex = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 1, __kind = 0, __spins = 0, __elision = 0, 
                __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 12 times>, "\001", '\000' <repeats 26 times>, __align = 0}, 
            file = 0x558fc81d3ae0 "/data/Server/bb-11.1-MDEV-30545A/storage/innobase/log/log0recv.cc", name = 0x558fc81d51e1 "mutex", line = 3458, count = 0, create_flags = 0, active_flags = 0, id = 15, 
            thread = 139641080355264, locked_mutex = 0x6120000037d8, used_mutex = 0x612000003858, prev = 0x0, next = 0x0}, m_psi = 0x0}, cond = {__data = {__wseq = {__value64 = 2, __value32 = {__low = 2, 
                __high = 0}}, __g1_start = {__value64 = 0, __value32 = {__low = 0, __high = 0}}, __g_refs = {2, 0}, __g_size = {0, 0}, __g1_orig_size = 0, __wrefs = 8, __g_signals = {0, 0}}, 
          __size = "\002", '\000' <repeats 15 times>, "\002", '\000' <repeats 19 times>, "\b\000\000\000\000\000\000\000\000\000\000", __align = 2}, apply_batch_on = true, found_corrupt_log = false, 
        found_corrupt_fs = false, recovery_on = true, apply_log_recs = true, len = 100663296, offset = 35194518, lsn = 35194518, file_checkpoint = 24945758, progress_time = 1676999161, 
        pages = std::map with 2 elements = {[{m_id = 6605659701248}] = {state = page_recv_t::RECV_BEING_READ, last_offset = 1, log = {head = 0x7f00acbdecb8, tail = 0x7f00ad006f68}}, [{m_id = 6605659701251}] = {
            state = page_recv_t::RECV_BEING_READ, last_offset = 1, log = {head = 0x7f00acc00460, tail = 0x7f00ad007058}}}, truncated_undo_spaces = {{lsn = 0, pages = 0} <repeats 127 times>}, dblwr = {
          pages = std::deque with 0 elements}, last_stored_lsn = 26198559, files = std::vector of length 0, capacity 0, blocks = {count = 15, start = 0x7f00acf33970, end = 0x7f00acb338a0, 
          node = &buf_block_t::unzip_LRU, init = 51966}, after_apply = false}
      (gdb) p pages
      $7 = std::map with 2 elements = {[{m_id = 6605659701248}] = {state = page_recv_t::RECV_BEING_READ, last_offset = 1, log = {head = 0x7f00acbdecb8, tail = 0x7f00ad006f68}}, [{m_id = 6605659701251}] = {
          state = page_recv_t::RECV_BEING_READ, last_offset = 1, log = {head = 0x7f00acc00460, tail = 0x7f00ad007058}}}
      

      Recovery is waiting for these pages to be read in. Both are for the tablespace 0x602 = 1538, which does exist in the core dump:

      (gdb) p *(fil_space_t*)0x612000009f40
      $128 = {<ilist_node<unflushed_spaces_tag_t>> = {next = 0x0, prev = 0x0}, <ilist_node<default_encrypt_tag_t>> = {next = 0x0, prev = 0x0}, <ilist_node<space_list_tag_t>> = {next = 0x61200000a0e0, 
          prev = 0x612000009de0}, <ilist_node<named_spaces_tag_t>> = {next = 0x612000008ef0, prev = 0x6120000088f0}, hash = 0x0, max_lsn = 35194518, id = 1538, is_being_truncated = false, 
        purpose = FIL_TYPE_TABLESPACE, chain = {count = 1, start = 0x60600016a720, end = 0x60600016a720, node = &fil_node_t::chain, init = 51966}, size = 640, size_in_header = 640, free_len = 2, free_limit = 256, 
        recv_size = 640, n_reserved_extents = 0, committed_size = {m = std::atomic<unsigned int> = { 640 }}, n_pending = std::atomic<unsigned int> = { 0 }, latch = {writer = {
            lock = std::atomic<unsigned int> = { 0 }}, readers = std::atomic<unsigned int> = { 0 }}, latch_owner = 0, latch_count = {m = std::atomic<unsigned int> = { 0 }}, crypt_data = 0x0, 
        is_in_unflushed_spaces = false, is_in_default_encrypt = false, is_corrupted = {<std::__atomic_flag_base> = {_M_i = false}, <No data fields>}, freed_range_mutex = {<std::__mutex_base> = {_M_mutex = {__data = {
                __lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, 
              __align = 0}}, <No data fields>}, freed_ranges = {ranges = std::set with 0 elements}, last_freed_lsn = 0, flags = 41}
      

      It turns out that also the two missed pages exist in the buffer pool in the core dump:

      (gdb) p *(buf_block_t *) 0x7f00ad232520
      $137 = {page = {id_ = {m_id = 6605659701251}, hash = 0x0, oldest_modification_ = {m = std::atomic<unsigned long> = { 0 }}, lock = {lock = {writer = {lock = std::atomic<unsigned int> = { 0 }}, 
              readers = std::atomic<unsigned int> = { 0 }}, recursive = 0, writer = std::atomic<unsigned long> = { 0 }, readers_lock = {lock = std::atomic<unsigned int> = { 0 }}, 
            readers = std::atomic<std::unordered_multiset<unsigned long, std::hash<unsigned long>, std::equal_to<unsigned long>, std::allocator<unsigned long> > *> = { 0x60600060ff20 }}, 
          frame = 0x7f00ad2a0000 "\031Z", <incomplete sequence \365>, zip = {data = 0x7f00acef8000 "\031Z", <incomplete sequence \365>, m_end = 4558, m_nonempty = 1, n_blobs = 30, ssize = 4, m_start = 118, 
            m_external = false, fix = {m = std::atomic<unsigned int> = { 1073741824 }}}, in_zip_hash = false, in_LRU_list = true, in_page_hash = true, in_free_list = false, list = {prev = 0x0, next = 0x0}, LRU = {
            prev = 0x7f00ad0319c0, next = 0x7f00ad031dd0}, old = 0, freed_page_clock = 0, access_time = {m_counter = std::atomic<unsigned int> = { 4137157028 }}}, in_withdraw_list = false, in_unzip_LRU_list = true, 
        unzip_LRU = {prev = 0x7f00ad131d00, next = 0x7f00ad131b60}, modify_clock = 2, n_bytes = 0, n_fields = 1, n_hash_helps = 0, left_side = true, n_pointers = {m_counter = std::atomic<unsigned long> = { 0 }}, 
        curr_n_fields = 0, curr_n_bytes = 0, curr_left_side = 0, index = 0x0}
      (gdb) p *(buf_block_t *) 0x7f00ad131b60
      $138 = {page = {id_ = {m_id = 6605659701248}, hash = 0x0, oldest_modification_ = {m = std::atomic<unsigned long> = { 0 }}, lock = {lock = {writer = {lock = std::atomic<unsigned int> = { 0 }}, 
              readers = std::atomic<unsigned int> = { 0 }}, recursive = 0, writer = std::atomic<unsigned long> = { 0 }, readers_lock = {lock = std::atomic<unsigned int> = { 0 }}, 
            readers = std::atomic<std::unordered_multiset<unsigned long, std::hash<unsigned long>, std::equal_to<unsigned long>, std::allocator<unsigned long> > *> = { 0x0 }}, frame = 0x7f00ad170000 "\361\352\366Z", 
          zip = {data = 0x7f00ad0b2000 "\361\352\366Z", m_end = 0, m_nonempty = 0, n_blobs = 0, ssize = 4, m_start = 0, m_external = false, fix = {m = std::atomic<unsigned int> = { 1073741824 }}}, 
          in_zip_hash = false, in_LRU_list = true, in_page_hash = true, in_free_list = false, list = {prev = 0x0, next = 0x0}, LRU = {prev = 0x7f00ad031dd0, next = 0x7f00ad032450}, old = 0, freed_page_clock = 0, 
          access_time = {m_counter = std::atomic<unsigned int> = { 4137157027 }}}, in_withdraw_list = false, in_unzip_LRU_list = true, unzip_LRU = {prev = 0x7f00ad232520, next = 0x7f00ad0326c0}, modify_clock = 2, 
        n_bytes = 0, n_fields = 1, n_hash_helps = 0, left_side = true, n_pointers = {m_counter = std::atomic<unsigned long> = { 0 }}, curr_n_fields = 0, curr_n_bytes = 0, curr_left_side = 0, index = 0x0}
      

      Both are unlocked and have page.zip.fix=UNFIXED, that is, no page read is in progress. For some reason, the log application was never invoked on these pages. The reason may be that the pages are in ROW_FORMAT=COMPRESSED format (see page.zip.data).

      data.tar.xz recovered just fine for me when I was using a large buffer pool. With innodb_buffer_pool_size=8m I can reproduce the hang, also when using rr record.

      Attachments

        1. data.tar.xz
          3.57 MB
          Marko Mäkelä

        Issue Links

          Activity

            People

              marko Marko Mäkelä
              marko Marko Mäkelä
              Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.