[MDEV-30911] Multi-batch recovery of ROW_FORMAT=COMPRESSED table hangs - Jira

Details

Type: Bug
Status: Closed (View Workflow)
Priority: Blocker
Resolution: Fixed
Affects Version/s: 11.0(EOL), 11.1(EOL)
Fix Version/s: 11.0.2, 11.1.1
Component/s: Storage Engine - InnoDB
Labels:

Description

Some time ago, mleich encountered a case where a multi-batch crash recovery would hang:

11.0 d5d7c8ba96fbf8800d7273034f56cbb39f379982
2023-02-21 9:06:01 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=24728170
2023-02-21 9:06:06 0 [Note] InnoDB: Ignoring data file './test/#sql-alter-24ccd8-27.ibd' with space ID 1381. Another data file called ./cool_down/t3.ibd exists with the same space ID.
2023-02-21 9:06:06 0 [Note] InnoDB: Ignoring data file './test/#sql-alter-24ccd8-27.ibd' with space ID 1381. Another data file called ./cool_down/t3.ibd exists with the same space ID.
2023-02-21 9:06:06 0 [Note] InnoDB: Ignoring data file './test/#sql-alter-24ccd8-27.ibd' with space ID 1381. Another data file called ./cool_down/t3.ibd exists with the same space ID.
2023-02-21 9:06:06 0 [Note] InnoDB: Ignoring data file './test/#sql-alter-24ccd8-35.ibd' with space ID 1442. Another data file called ./cool_down/t1.ibd exists with the same space ID.
2023-02-21 9:06:06 0 [Note] InnoDB: Ignoring data file './test/#sql-alter-24ccd8-35.ibd' with space ID 1442. Another data file called ./cool_down/t1.ibd exists with the same space ID.
2023-02-21 9:06:07 0 [Note] InnoDB: Starting a batch to recover 412 pages from redo log.
2023-02-21 9:06:10 0 [Note] InnoDB: Ignoring data file './test/#sql-alter-24ccd8-35.ibd' with space ID 1442. Another data file called ./cool_down/t1.ibd exists with the same space ID.
2023-02-21 9:06:10 0 [Note] InnoDB: Ignoring data file './test/#sql-alter-24ccd8-35.ibd' with space ID 1497. Another data file called ./cool_down/t2.ibd exists with the same space ID.
2023-02-21 9:06:10 0 [Note] InnoDB: Ignoring data file './test/#sql-alter-24ccd8-35.ibd' with space ID 1497. Another data file called ./cool_down/t2.ibd exists with the same space ID.
2023-02-21 9:06:11 0 [Note] InnoDB: Ignoring data file './test/#sql-alter-24ccd8-35.ibd' with space ID 1497. Another data file called ./cool_down/t2.ibd exists with the same space ID.
2023-02-21 9:06:11 0 [Note] InnoDB: Ignoring data file './test/#sql-alter-24ccd8-35.ibd' with space ID 1538. Another data file called ./test/t2.ibd exists with the same space ID.
2023-02-21 9:06:11 0 [Note] InnoDB: Ignoring data file './test/#sql-alter-24ccd8-35.ibd' with space ID 1538. Another data file called ./test/t2.ibd exists with
the same space ID.
2023-02-21 9:06:11 0 [Note] InnoDB: Ignoring data file './test/#sql-alter-24ccd8-35.ibd' with space ID 1538. Another data file called ./test/t2.ibd exists with the same space ID.
2023-02-21 9:06:12 0 [Note] InnoDB: Ignoring data file './test/#sql-alter-24ccd8-35.ibd' with space ID 1588. Another data file called ./test/t1.ibd exists with the same space ID.
2023-02-21 9:06:12 0 [Note] InnoDB: Ignoring data file './test/#sql-alter-24ccd8-35.ibd' with space ID 1588. Another data file called ./test/t1.ibd exists with the same space ID.
2023-02-21 9:06:12 0 [Note] InnoDB: Ignoring data file './test/#sql-alter-24ccd8-35.ibd' with space ID 1588. Another data file called ./test/t1.ibd exists with the same space ID.
2023-02-21 9:06:12 0 [Note] InnoDB: Starting a batch to recover 688 pages from redo log.
2023-02-21 9:06:13 0 [Note] InnoDB: 1 transaction(s) which must be rolled back or cleaned up in total 1 row operations to undo
2023-02-21 9:06:13 0 [Note] InnoDB: Trx id counter is 9832
2023-02-21 9:06:13 0 [Note] InnoDB: Starting final batch to recover 90 pages from redo log.
230221 9:56:01 [ERROR] mysqld got signal 6 ;

As you can see from the last two lines, InnoDB was hung for about 50 minutes before the environment issued kill -ABRT. In the resulting core dump, we would have 2 entries in recv_sys.pages:

#12 0x0000558fc708eebd in recv_sys_t::apply (this=0x558fc8eb6380 <recv_sys>, last_batch=last_batch@entry=true) at /data/Server/bb-11.1-MDEV-30545A/storage/innobase/log/log0recv.cc:3486

3486	            my_cond_wait(&cond, &mutex.m_mutex);

(gdb) p *this

$6 = {mutex = {m_mutex = {global = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 2, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}},

        __size = '\000' <repeats 16 times>, "\002", '\000' <repeats 22 times>, __align = 0}, mutex = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 1, __kind = 0, __spins = 0, __elision = 0,

          __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 12 times>, "\001", '\000' <repeats 26 times>, __align = 0},

      file = 0x558fc81d3ae0 "/data/Server/bb-11.1-MDEV-30545A/storage/innobase/log/log0recv.cc", name = 0x558fc81d51e1 "mutex", line = 3458, count = 0, create_flags = 0, active_flags = 0, id = 15,

      thread = 139641080355264, locked_mutex = 0x6120000037d8, used_mutex = 0x612000003858, prev = 0x0, next = 0x0}, m_psi = 0x0}, cond = {__data = {__wseq = {__value64 = 2, __value32 = {__low = 2,

          __high = 0}}, __g1_start = {__value64 = 0, __value32 = {__low = 0, __high = 0}}, __g_refs = {2, 0}, __g_size = {0, 0}, __g1_orig_size = 0, __wrefs = 8, __g_signals = {0, 0}},

    __size = "\002", '\000' <repeats 15 times>, "\002", '\000' <repeats 19 times>, "\b\000\000\000\000\000\000\000\000\000\000", __align = 2}, apply_batch_on = true, found_corrupt_log = false,

  found_corrupt_fs = false, recovery_on = true, apply_log_recs = true, len = 100663296, offset = 35194518, lsn = 35194518, file_checkpoint = 24945758, progress_time = 1676999161,

  pages = std::map with 2 elements = {[{m_id = 6605659701248}] = {state = page_recv_t::RECV_BEING_READ, last_offset = 1, log = {head = 0x7f00acbdecb8, tail = 0x7f00ad006f68}}, [{m_id = 6605659701251}] = {

      state = page_recv_t::RECV_BEING_READ, last_offset = 1, log = {head = 0x7f00acc00460, tail = 0x7f00ad007058}}}, truncated_undo_spaces = {{lsn = 0, pages = 0} <repeats 127 times>}, dblwr = {

    pages = std::deque with 0 elements}, last_stored_lsn = 26198559, files = std::vector of length 0, capacity 0, blocks = {count = 15, start = 0x7f00acf33970, end = 0x7f00acb338a0,

    node = &buf_block_t::unzip_LRU, init = 51966}, after_apply = false}

(gdb) p pages

$7 = std::map with 2 elements = {[{m_id = 6605659701248}] = {state = page_recv_t::RECV_BEING_READ, last_offset = 1, log = {head = 0x7f00acbdecb8, tail = 0x7f00ad006f68}}, [{m_id = 6605659701251}] = {

    state = page_recv_t::RECV_BEING_READ, last_offset = 1, log = {head = 0x7f00acc00460, tail = 0x7f00ad007058}}}

Recovery is waiting for these pages to be read in. Both are for the tablespace 0x602 = 1538, which does exist in the core dump:

(gdb) p *(fil_space_t*)0x612000009f40

$128 = {<ilist_node<unflushed_spaces_tag_t>> = {next = 0x0, prev = 0x0}, <ilist_node<default_encrypt_tag_t>> = {next = 0x0, prev = 0x0}, <ilist_node<space_list_tag_t>> = {next = 0x61200000a0e0,

    prev = 0x612000009de0}, <ilist_node<named_spaces_tag_t>> = {next = 0x612000008ef0, prev = 0x6120000088f0}, hash = 0x0, max_lsn = 35194518, id = 1538, is_being_truncated = false,

  purpose = FIL_TYPE_TABLESPACE, chain = {count = 1, start = 0x60600016a720, end = 0x60600016a720, node = &fil_node_t::chain, init = 51966}, size = 640, size_in_header = 640, free_len = 2, free_limit = 256,

  recv_size = 640, n_reserved_extents = 0, committed_size = {m = std::atomic<unsigned int> = { 640 }}, n_pending = std::atomic<unsigned int> = { 0 }, latch = {writer = {

      lock = std::atomic<unsigned int> = { 0 }}, readers = std::atomic<unsigned int> = { 0 }}, latch_owner = 0, latch_count = {m = std::atomic<unsigned int> = { 0 }}, crypt_data = 0x0,

  is_in_unflushed_spaces = false, is_in_default_encrypt = false, is_corrupted = {<std::__atomic_flag_base> = {_M_i = false}, <No data fields>}, freed_range_mutex = {<std::__mutex_base> = {_M_mutex = {__data = {

          __lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>,

        __align = 0}}, <No data fields>}, freed_ranges = {ranges = std::set with 0 elements}, last_freed_lsn = 0, flags = 41}

It turns out that also the two missed pages exist in the buffer pool in the core dump:

(gdb) p *(buf_block_t *) 0x7f00ad232520

$137 = {page = {id_ = {m_id = 6605659701251}, hash = 0x0, oldest_modification_ = {m = std::atomic<unsigned long> = { 0 }}, lock = {lock = {writer = {lock = std::atomic<unsigned int> = { 0 }},

        readers = std::atomic<unsigned int> = { 0 }}, recursive = 0, writer = std::atomic<unsigned long> = { 0 }, readers_lock = {lock = std::atomic<unsigned int> = { 0 }},

      readers = std::atomic<std::unordered_multiset<unsigned long, std::hash<unsigned long>, std::equal_to<unsigned long>, std::allocator<unsigned long> > *> = { 0x60600060ff20 }},

    frame = 0x7f00ad2a0000 "\031Z", <incomplete sequence \365>, zip = {data = 0x7f00acef8000 "\031Z", <incomplete sequence \365>, m_end = 4558, m_nonempty = 1, n_blobs = 30, ssize = 4, m_start = 118,

      m_external = false, fix = {m = std::atomic<unsigned int> = { 1073741824 }}}, in_zip_hash = false, in_LRU_list = true, in_page_hash = true, in_free_list = false, list = {prev = 0x0, next = 0x0}, LRU = {

      prev = 0x7f00ad0319c0, next = 0x7f00ad031dd0}, old = 0, freed_page_clock = 0, access_time = {m_counter = std::atomic<unsigned int> = { 4137157028 }}}, in_withdraw_list = false, in_unzip_LRU_list = true,

  unzip_LRU = {prev = 0x7f00ad131d00, next = 0x7f00ad131b60}, modify_clock = 2, n_bytes = 0, n_fields = 1, n_hash_helps = 0, left_side = true, n_pointers = {m_counter = std::atomic<unsigned long> = { 0 }},

  curr_n_fields = 0, curr_n_bytes = 0, curr_left_side = 0, index = 0x0}

(gdb) p *(buf_block_t *) 0x7f00ad131b60

$138 = {page = {id_ = {m_id = 6605659701248}, hash = 0x0, oldest_modification_ = {m = std::atomic<unsigned long> = { 0 }}, lock = {lock = {writer = {lock = std::atomic<unsigned int> = { 0 }},

        readers = std::atomic<unsigned int> = { 0 }}, recursive = 0, writer = std::atomic<unsigned long> = { 0 }, readers_lock = {lock = std::atomic<unsigned int> = { 0 }},

      readers = std::atomic<std::unordered_multiset<unsigned long, std::hash<unsigned long>, std::equal_to<unsigned long>, std::allocator<unsigned long> > *> = { 0x0 }}, frame = 0x7f00ad170000 "\361\352\366Z",

    zip = {data = 0x7f00ad0b2000 "\361\352\366Z", m_end = 0, m_nonempty = 0, n_blobs = 0, ssize = 4, m_start = 0, m_external = false, fix = {m = std::atomic<unsigned int> = { 1073741824 }}},

    in_zip_hash = false, in_LRU_list = true, in_page_hash = true, in_free_list = false, list = {prev = 0x0, next = 0x0}, LRU = {prev = 0x7f00ad031dd0, next = 0x7f00ad032450}, old = 0, freed_page_clock = 0,

    access_time = {m_counter = std::atomic<unsigned int> = { 4137157027 }}}, in_withdraw_list = false, in_unzip_LRU_list = true, unzip_LRU = {prev = 0x7f00ad232520, next = 0x7f00ad0326c0}, modify_clock = 2,

  n_bytes = 0, n_fields = 1, n_hash_helps = 0, left_side = true, n_pointers = {m_counter = std::atomic<unsigned long> = { 0 }}, curr_n_fields = 0, curr_n_bytes = 0, curr_left_side = 0, index = 0x0}

Both are unlocked and have page.zip.fix=UNFIXED, that is, no page read is in progress. For some reason, the log application was never invoked on these pages. The reason may be that the pages are in ROW_FORMAT=COMPRESSED format (see page.zip.data).

data.tar.xz recovered just fine for me when I was using a large buffer pool. With innodb_buffer_pool_size=8m I can reproduce the hang, also when using rr record.

Attachments

- Sort By Name
- Sort By Date
- Ascending
- Descending
- Thumbnails
- List
- Download All

data.tar.xz
2023-03-23 08:03
3.57 MB
Marko Mäkelä

Issue Links

is caused by

MDEV-30216 Read-ahead unnecessarily allocates and frees pages when a page is in the buffer pool

Closed

Multi-batch recovery of ROW_FORMAT=COMPRESSED table hangs

Details

Description

Attachments

Attachments

Issue Links

Activity

People

Dates

Git Integration