Here is the relevant part of the stack trace output:
mariadb-10.6.9
Thread 207 (Thread 0x7f1c8c0c0700 (LWP 6705)):
#0 0x00007f301837de9d in nanosleep () from /lib64/libpthread.so.0
No symbol table info available.
#1 0x0000559f7b818b49 in sleep_for<long, std::ratio<1, 1000000> > (__rtime=<optimized out>, __rtime=<optimized out>) at /opt/rh/devtoolset-10/root/usr/include/c++/10/thread:401
__ts = {tv_sec = 0, tv_nsec = 89937}
#2 buf_page_get_low(page_id_t, unsigned long, unsigned long, buf_block_t*, unsigned long, mtr_t*, dberr_t*, bool) () at /usr/src/debug/MariaDB-/src_0/storage/innobase/buf/buf0buf.cc:2584
The debug information for mariadbd does not appear to have been installed (unlike for {{libstdc++}}, as frame 1 above shows), but in this case all we need to know is the line number. This hang is a duplicate of MDEV-27983, which affects MariaDB 10.6.6, 10.6.7, 10.6.8, and 10.6.9. Here is the code at that location:
if (UNIV_UNLIKELY(!block->page.frame)) {
  if (!block->page.lock.x_lock_try()) {
    /* The page is being read or written, or
    another thread is executing buf_zip_decompress()
    in buf_page_get_low() on it. */
    block->page.unfix();
    std::this_thread::sleep_for(
      std::chrono::microseconds(100));
    goto loop;
  }
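The retry pattern in that excerpt (buffer-fix the page, try the exclusive latch without blocking, and on failure release the fix, sleep 100 microseconds and start over) can be illustrated in isolation. The following is only a minimal sketch: {{ToyLatch}}, {{ToyPage}}, {{fix_and_latch}} and the {{max_attempts}} bound are hypothetical names invented for this illustration, not InnoDB types, and the real code loops without an upper bound.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Hypothetical stand-in for an exclusive page latch (not the InnoDB type).
struct ToyLatch {
  std::atomic<bool> held{false};
  // Non-blocking attempt, analogous in spirit to x_lock_try().
  bool x_lock_try() {
    bool expected = false;
    return held.compare_exchange_strong(expected, true);
  }
  void x_unlock() { held.store(false); }
};

// Hypothetical stand-in for a buffer page descriptor.
struct ToyPage {
  std::atomic<unsigned> fix_count{0};
  ToyLatch lock;
  void fix() { fix_count.fetch_add(1); }
  void unfix() { fix_count.fetch_sub(1); }
};

// Fix the page and try to latch it. On failure, drop the fix, sleep and retry.
// Returns the number of failed attempts before success, or -1 if
// max_attempts was exhausted (the bound is an assumption of this sketch).
int fix_and_latch(ToyPage &page, int max_attempts) {
  for (int attempt = 0; attempt < max_attempts; attempt++) {
    page.fix();
    if (page.lock.x_lock_try())
      return attempt;  // caller now holds both the fix and the latch
    page.unfix();      // never sleep while holding the fix
    std::this_thread::sleep_for(std::chrono::microseconds(100));
  }
  return -1;
}
```

The point of the sketch is that a thread on this path never holds its buffer-fix while it sleeps, so it cannot by itself prevent a latch holder from seeing the fix count drop.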
At least one thread will remain blocked, and at least one will spin in an infinite loop. Among the stack traces, several threads are waiting at buf0buf.cc:2584 (possibly on different page latches; that information is not available), at least 2 at buf0buf.cc:2536, and some at buf0buf.cc:2630. Let us check those lines as well:
/* A read-fix is released after block->page.lock
in buf_page_t::read_complete() or
buf_pool_t::corrupted_evict(), or
after buf_zip_decompress() in this function. */
2536 block->page.lock.s_lock();
state = block->page.state();
ut_ad(state < buf_page_t::READ_FIX
      || state >= buf_page_t::WRITE_FIX);
const page_id_t id{block->page.id()};
block->page.lock.s_unlock();
mysql_mutex_unlock(&buf_pool.mutex);
hash_lock.unlock();
2630 std::this_thread::sleep_for(
  std::chrono::microseconds(100));
goto wait_for_unfix;
The fix for MDEV-27983 affects the sleeping thread (line 2536). Instead of waiting for the page latch that one of the busy-waiting threads is holding, that thread will start from scratch, acquiring a lock on the buf_pool.page_hash cell that covers the desired page.
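To see why restarting from scratch can break the hang, here is a deliberately simplified, single-threaded model of two threads. It assumes that the busy-waiting thread (line 2630) holds the page latch and is waiting for other threads' buffer-fixes to go away, as the wait_for_unfix label suggests, while the thread at line 2536 holds a buffer-fix and wants the latch. The names {{Policy}} and {{simulate}} and the round-based scheduling are assumptions made for this illustration only; this is not the actual patch.

```cpp
// How thread A reacts when the latch it wants is held by thread B.
enum class Policy { BlockOnLatch, RestartFromScratch };

// Toy round-based model. Returns the number of rounds until both threads
// have finished, or -1 if they are still stuck after max_rounds.
int simulate(Policy policy, int max_rounds) {
  bool a_fixed = true;     // thread A already holds a buffer-fix
  bool latch_held = true;  // thread B already holds the page latch
  bool a_done = false, b_done = false;

  for (int round = 1; round <= max_rounds; round++) {
    if (!a_done) {
      if (!a_fixed)
        a_fixed = true;    // (re)start the lookup: buffer-fix the page
      if (!latch_held) {
        a_done = true;     // latch acquired; A uses the page and releases it
        a_fixed = false;
      } else if (policy == Policy::RestartFromScratch) {
        a_fixed = false;   // back off: release the fix and retry next round
      }
      // Policy::BlockOnLatch: keep the fix and keep waiting for the latch.
    }
    if (!b_done && !a_fixed) {
      b_done = true;       // no foreign buffer-fix left: B can proceed
      latch_held = false;  // and releases the latch
    }
    // Otherwise B "sleeps" and polls again in the next round.
    if (a_done && b_done)
      return round;
  }
  return -1;
}
```

In this model, simulate(Policy::BlockOnLatch, n) never completes for any n, because each thread waits for something the other one holds, whereas simulate(Policy::RestartFromScratch, n) completes once thread A gives up its buffer-fix.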