|
marostegui, this looks trivial. In mariadb_full_bt_all_threads.txt, I do not find any occurrence of thread_routine. Hence, this ought to be a duplicate of MDEV-28665, whose fix is included in the upcoming 10.6.9 release.
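For example, a quick check on the attachment:
grep -c thread_routine mariadb_full_bt_all_threads.txt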
Due to this bug, InnoDB would stop processing any page writes. That will eventually stop buffer pool eviction and log writes (because the log checkpoint cannot advance).
|
|
marko, what's not included in the MDEV text is the startup log for the server:
Jul 19 13:38:06 db1132 mysqld[3344701]: 2022-07-19 13:38:06 0 [Warning] mysqld: io_uring_queue_init() failed with ENOMEM: try larger memory locked limit, ulimit -l, or https://mariadb.com/kb/en/systemd/#configuring-limitmemlock under systemd
|
Jul 19 13:38:06 db1132 mysqld[3344701]: 2022-07-19 13:38:06 0 [Warning] InnoDB: liburing disabled: falling back to innodb_use_native_aio=OFF
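As the warning itself suggests, one way to raise the limit under systemd is a drop-in override along these lines (a sketch based on the linked KB page; the drop-in path and unit name are illustrative and may differ per installation):
# /etc/systemd/system/mariadb.service.d/memlock.conf
[Service]
LimitMEMLOCK=infinity
followed by systemctl daemon-reload and a restart of the service.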
|
|
|
If this hang was not actually using the io_uring interface (due to some mlock() failure on startup), as danblack told me earlier, then there should not be any thread_routine among the stacks; instead, the "simulated AIO" would be used. In the core dump, you should find out which thread might be holding the waited-for page latches, like this one:
#7 mtr_t::page_lock (this=this@entry=0x7f4cecb0fe60, block=block@entry=0x7fac2802c380, rw_latch=rw_latch@entry=1) at /root/mariadb-10.6.8/storage/innobase/mtr/mtr0mtr.cc:1240
|
fix_type = MTR_MEMO_PAGE_S_FIX
|
This block=0x7fac2802c380 (page_id=5458903433219, a.k.a. 0x4F700000003, the primary key root page of some table, possibly user_properties) is being waited for by very many threads. It is possible that this block is in the tpool queue for block writes, but it has not been processed for some reason. wlad knows those data structures better than I do, so he can suggest commands for finding blocks that are waiting for writes. I only know that those blocks that are in buf_pool.flush_list with oldest_modification_>1 are waiting to be written out; a gdb loop like the one sketched below could enumerate them.
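For illustration, a gdb loop along these lines could walk the list (a sketch only; the member names follow the buf_page_t dump shown later in this thread and must be verified against the exact build):
set $bp = buf_pool.flush_list.start
while $bp
  if $bp->oldest_modification_.m._M_i > 1
    print $bp
  end
  set $bp = $bp->list.next
end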
Could you also check the following?
print *(buf_page_t*)0x7fac2802c380
|
This only works if the core dump includes the buffer pool (which is not the default ever since MDEV-10814 was implemented).
|
|
Yes, the buffer pool wasn't included. Our hosts have almost 400 GB of buffer pool configured.
|
|
Does the core dump include write_slots (defined in storage/innobase/os/os0file.cc)? That would be the queue that tpool/aio_simulated.cc should have been processing. Even though you can’t dereference IORequest::bpage, the pointer value should be there if it was in the queue.
I realized that many of the blocked threads are actually waiting for a shared lock. A write would not block them, but a read (in case the block had been evicted from the buffer pool previously) would, because the block would be exclusively latched until the read has completed. So, you should check read_slots as well.
Reads can be synchronous or asynchronous (triggered by buf_read_page_low()). I do not see anything like buf_read_page among the stack traces. It could be that an asynchronous read was scheduled, but for some reason not processed by the simulated AIO code path.
|
|
Is this what you meant? Let me know if there are more specific things I can check for you; I would be happy to run them:
(gdb) p write_slots
|
$4 = std::unique_ptr<io_slots> = {get() = 0x564d23f2efd0}
|
(gdb) p read_slots
|
$5 = std::unique_ptr<io_slots> = {get() = 0x564d23f06e80}
|
|
|
wlad, can you please help marostegui to extract the data?
|
|
print *(io_slots*)0x564d23f2efd0
|
might be a step forward.
If I remember correctly, the contents form a ring buffer, so even if we find a matching buffer pool block address there, it might belong to a removed entry.
|
|
gdb.txt
See attached.
Thanks Marko!
|
|
marko, io_slots are "unused" slots, cached IO control blocks, which are borrowed for the duration of an async IO.
We do not have a special registry for "used" slots, those that represent in-flight IO. The premise was that OS-level AIO works, and that we would not need to debug Linux kernel bugs. If that premise has changed, then specifically for Linux we could also maintain some kind of doubly linked list of in-flight IOs, as sketched below. I would rather not generalize that to OSes where IO does not have such glaring bugs.
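For illustration, such a registry might look like the following (a hypothetical C++ sketch, not existing tpool code; inflight_io and inflight_list are invented names):

#include <mutex>

struct inflight_io {
  inflight_io *prev = nullptr, *next = nullptr;
  // ... file handle, offset, buffer page pointer, etc.
};

// Intrusive doubly linked list that every submitted IO would join,
// so that a debugger could enumerate all outstanding requests.
struct inflight_list {
  std::mutex m;
  inflight_io *head = nullptr;

  void insert(inflight_io *io) {  // on submission
    std::lock_guard<std::mutex> g(m);
    io->prev = nullptr;
    io->next = head;
    if (head) head->prev = io;
    head = io;
  }

  void remove(inflight_io *io) {  // from the completion callback
    std::lock_guard<std::mutex> g(m);
    if (io->prev) io->prev->next = io->next; else head = io->next;
    if (io->next) io->next->prev = io->prev;
  }
};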
|
|
marostegui, I forgot that the m_userdata of each individual entry needs to be explicitly converted to IORequest. But let me try to find the block address 0x7fac2802c380 in that format:
(gdb) p (char[8])0x7fac2802c380
|
$1 = "\200\303\002(\254\177\000"
|
There is no occurrence of that byte string (the little-endian encoding of 0x7fac2802c380) in gdb.txt, but the data otherwise looks rather reasonable to me.
Somewhere else, cvicentiu wrote that the file handle of the table enwiki.user_properties was 133. I do not find that in the dump, but there is m_fh = 125.
Can you also dump the read_slots? It could be more interesting.
print *(io_slots*)0x564d23f06e80
|
|
|
Yeah, what cvicentiu mentioned was:
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
|
--Type <RET> for more, q to quit, c to continue without paging--c
|
Core was generated by `/opt/wmf-mariadb106/bin/mysqld'.
|
#0 0x00007fac896ec3ff in poll () from /lib/x86_64-linux-gnu/libc.so.6
|
[Current thread is 1 (Thread 0x7fac893631c0 (LWP 3344701))]
|
(gdb) thread 11
|
[Switching to thread 11 (Thread 0x7fac808e3700 (LWP 3344713))]
|
#0 0x00007fac896f19b9 in syscall () from /lib/x86_64-linux-gnu/libc.so.6
|
(gdb) frame 6
|
#6 btr_cur_search_to_nth_level_func (index=index@entry=0x7f4c5406d500, level=level@entry=0, tuple=tuple@entry=0x7f49a9189ab0, mode=mode@entry=PAGE_CUR_LE, latch_mode=<optimized out>, latch_mode@entry=33, cursor=cursor@entry=0x7fac808e05f0, ahi_latch=<optimized out>,
|
mtr=<optimized out>, autoinc=<optimized out>) at /root/mariadb-10.6.8/storage/innobase/btr/btr0cur.cc:2229
|
2229 /root/mariadb-10.6.8/storage/innobase/btr/btr0cur.cc: No such file or directory.
|
(gdb) p index->table->space->chain.start->handle
|
$1 = {m_file = 133, m_psi = 0x7fac8302afc0}
|
That was at: https://phabricator.wikimedia.org/P32031#134291
See attached the output for the requested command:
print *(io_slots*)0x564d23f06e80
|
io_slots.txt
|
|
In io_slots.txt there are several occurrences of 0x7fac (little-endian: \254\177), but no match for 0x7fac2802c380. Also, there is no m_fh=133 there. I do not have any other ideas for how we could find more information about that problematic block. It may still be a bug in the simulated AIO code, but I am not familiar with that code.
|
|
Theoretically, this could be the same bug as MDEV-27983, but without having access to the buffer pool contents, we will never know. Next time such a hang occurs, it would be a good idea to attach gdb to the server process and extract as much data as possible.
|
|
So we kept the server in the state it ended up in after the hang, and this is what we ran:
Is there something else more specific you'd like us to get when this happens again? Or would this be enough to start with: https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/#getting-full-backtraces-for-all-threads-from-a-running-mysqld-process
The "good" thing about all this is that the mysqld process doesn't seem to be crashing.
|
|
The io_slots printout says that all IOs in question (in this case read IOs) have finished. I derive that from
"m_cache = std::vector of length 1024, capacity 1024"
Explanation: borrowing an aiocb from the io_slots cache ends up in std::vector::pop_back(), which decreases the vector length. If the vector length == the vector capacity, then nothing is borrowed. Not only that, the callback function was also called, because the tpool::aiocb is returned to the cache within the IO callback (see the sketch below).
So, there is no read AIO pending.
marko, FYI
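To illustrate that reasoning, here is a minimal C++ sketch of such a borrow/return cache (hypothetical; the real code lives in tpool, and aiocb_stub/slot_cache are invented names):

#include <cstddef>
#include <mutex>
#include <vector>

struct aiocb_stub {};  // stand-in for tpool::aiocb

class slot_cache {
  std::mutex m;
  std::vector<aiocb_stub> blocks;  // owns all control blocks
  std::vector<aiocb_stub*> cache;  // holds the currently idle ones
public:
  explicit slot_cache(std::size_t n) : blocks(n) {
    cache.reserve(n);  // assume the capacity stays exactly n in this sketch
    for (auto &b : blocks) cache.push_back(&b);
  }
  aiocb_stub *borrow() {  // called when an async IO is submitted
    std::lock_guard<std::mutex> g(m);
    if (cache.empty()) return nullptr;
    aiocb_stub *cb = cache.back();
    cache.pop_back();  // length drops below capacity
    return cb;
  }
  void release(aiocb_stub *cb) {  // called from the IO completion callback
    std::lock_guard<std::mutex> g(m);
    cache.push_back(cb);  // length climbs back toward capacity
  }
  bool all_idle() {  // length == capacity <=> no IO in flight
    std::lock_guard<std::mutex> g(m);
    return cache.size() == cache.capacity();
  }
};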
|
|
m_fh=125 is page.ibd (link to all fds), which per the PROCESSLIST doesn't show up in an active table query (nor in the more complete queries in INNODB STATUS).
SHOW ENGINE INNODB STATUS lists Pending reads 0, so at least buf_page_t::read_complete was started.
|
|
marostegui, based on what I currently know, this could be a duplicate of MDEV-27983. Coincidentally, sstamm posted there some new GDB output on the day after this bug was filed.
|
|
Previously, we established that all the background IOs have finished, and all post-IO callbacks have finished, too.
Now there are 2 threads that are hanging: one foreground (hanging in btr_cur_search_to_nth_level_func, deep down in srw_mutex_impl<true>::wait), and one background (purge_coordinator_task, deep down in buf_page_get_low, reading the page).
Why they are hanging is a mystery to me, but I think the author of srw_mutex_impl<true>::wait can perhaps find out the holder of the mutex, and perhaps what that holder does, and eventually find the place where the holder left without unlocking the synchronization primitive.
|
|
wlad, according to the information that my previous comment referred to, the number of threads waiting for each blocked page latch matches the number of waiters that the lock word knows about. In other words, it appears that each of those page latches is being held in exclusive mode. My primary suspect is that a page read request was not fully processed.
|
|
marko, we've had two more occurrences of this, and the hosts haven't been touched yet. I will be talking to danblack anyway to see if we can do some troubleshooting on the side, but is there anything specific you'd like me to run/capture on those two hosts that could help you?
Thanks
|
|
marostegui, if the use of the InnoDB adaptive hash index is enabled, then MDEV-29336 might explain this.
|
|
Thanks for the fast answer. In our case, it is disabled:
root@db1132.eqiad.wmnet[(none)]> show global variables like 'innodb_adaptive_hash_index';
|
+----------------------------+-------+
|
| Variable_name | Value |
|
+----------------------------+-------+
|
| innodb_adaptive_hash_index | OFF |
|
+----------------------------+-------+
|
1 row in set (0.002 sec)
|
|
|
Per my chat with Daniel, I am attaching more traces here: gdb_db1132_24th_aug.txt
Commands run:
set print static-members off
|
set print frame-arguments all
|
thread apply all backtrace full
|
thread apply all -s -- frame apply level 2 -s print/x *this
|
p *read_slots
|
p *write_slots
|
As p *read_slots and p *write_slots didn't work, I included the output of p read_slots and p write_slots instead.
|
|
The read_slots and write_slots are wlad’s creation, and he should be able to help you dump their contents. I don’t think that the following at the end of gdb_db1132_24th_aug.txt is very helpful:
Could not find operator*.
|
Could not find operator*.
|
$46 = {_M_t = {<std::__uniq_ptr_impl<io_slots, std::default_delete<io_slots> >> = {_M_t = {<std::_Tuple_impl<0, io_slots*, std::default_delete<io_slots> >> = {<std::_Tuple_impl<1, std::default_delete<io_slots> >> = {<std::_Head_base<1, std::default_delete<io_slots>, true>> = {<std::default_delete<io_slots>> = {<No data fields>}, <No data fields>}, <No data fields>}, <std::_Head_base<0, io_slots*, false>> = {_M_head_impl = 0x55f7467b5e80}, <No data fields>}, <No data fields>}}, <No data fields>}}
|
$47 = {_M_t = {<std::__uniq_ptr_impl<io_slots, std::default_delete<io_slots> >> = {_M_t = {<std::_Tuple_impl<0, io_slots*, std::default_delete<io_slots> >> = {<std::_Tuple_impl<1, std::default_delete<io_slots> >> = {<std::_Head_base<1, std::default_delete<io_slots>, true>> = {<std::default_delete<io_slots>> = {<No data fields>}, <No data fields>}, <No data fields>}, <std::_Head_base<0, io_slots*, false>> = {_M_head_impl = 0x55f7467e9fd0}, <No data fields>}, <No data fields>}}, <No data fields>}}
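(The raw pointer values are still visible in that output, though, so the approach used earlier in this thread should also work here, e.g. print *(io_slots*)0x55f7467b5e80 and print *(io_slots*)0x55f7467e9fd0, taking the _M_head_impl values from $46 and $47.)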
|
I can identify 3 InnoDB latches in the output:
sed -ne 's/^\(#2 .*this=\)/\1/p' < thread_apply_all_backtrace |sort|uniq -c
|
The input file was the prefix of gdb_db1132_24th_aug.txt that contains the first output of the command thread apply all backtrace. The output was:
1 #2 srw_mutex_impl<true>::wait_and_lock (this=this@entry=0x7eff4d04bd10) at /root/mariadb-10.6.8/storage/innobase/sync/srw_lock.cc:326
|
39 #2 srw_mutex_impl<true>::wait_and_lock (this=this@entry=0x7f22741195f8) at /root/mariadb-10.6.8/storage/innobase/sync/srw_lock.cc:326
|
1 #2 ssux_lock_impl<true>::wr_wait (this=0x7f22741195f8, lk=<optimized out>) at /root/mariadb-10.6.8/storage/innobase/sync/srw_lock.cc:378
|
1 #2 ssux_lock_impl<true>::wr_wait (this=this@entry=0x7f1d1408a538, lk=<optimized out>) at /root/mariadb-10.6.8/storage/innobase/sync/srw_lock.cc:378
|
I omitted the output related to std::condition_variable, which supposedly is for tpool::thread_pool_generic::wait_for_tasks.
The address 0x7f22741195f8 occurs in 39+1 stack frames. Let us check each of the InnoDB latches:
Thread 53 (Thread 0x7f00148c1700 (LWP 4008608) "mysqld"):
|
#2 srw_mutex_impl<true>::wait_and_lock (this=this@entry=0x7eff4d04bd10) at /root/mariadb-10.6.8/storage/innobase/sync/srw_lock.cc:326
|
$22 = {lock = {<std::__atomic_base<unsigned int>> = {_M_i = 0x80000002}, <No data fields>}}
|
The page latch is exclusively latched, and one waiter (thread 53) exists. This thread is waiting for a shared latch on page id 4131762166058 = 0x3c20037592a. There is no sign of the thread that would hold the exclusive latch. It would have helped to dump the entire contents of the page latch, including the writer member, which would identify the owning thread of the exclusive lock.
Thread 65 (Thread 0x7eff7b767700 (LWP 4008909) "mysqld"):
|
#2 ssux_lock_impl<true>::wr_wait (this=this@entry=0x7f1d1408a538, lk=<optimized out>) at /root/mariadb-10.6.8/storage/innobase/sync/srw_lock.cc:378
|
$17 = {writer = {lock = {<std::__atomic_base<unsigned int>> = {_M_i = 0x80000001}, <No data fields>}}, readers = {<std::__atomic_base<unsigned int>> = {_M_i = 0x80000001}, <No data fields>}}
|
Here we have a little more information about the lock object. The output includes the first component sux_lock::lock but not the subsequent fields. The blocked thread is attempting to lock page identifier 4131760041487 = 0x3c20016ee0f exclusively, but some other thread is apparently holding it in shared mode. In the attachment, I did not find any references to this block in other threads.
Last, let us check the most popular latch, for page identifier 4131758538755 = 0x3c200000003. That is the clustered index root page of tablespace 0x3c2 (the same tablespace that was also the subject of the other page latch conflicts). The most latch-contents information is available in this output:
Thread 12 (Thread 0x7f5f586ac700 (LWP 2484735) "mysqld"):
|
#2 ssux_lock_impl<true>::wr_wait (this=0x7f22741195f8, lk=<optimized out>) at /root/mariadb-10.6.8/storage/innobase/sync/srw_lock.cc:378
|
$45 = {writer = {lock = {<std::__atomic_base<unsigned int>> = {_M_i = 0x80000028}, <No data fields>}}, readers = {<std::__atomic_base<unsigned int>> = {_M_i = 0x80000002}, <No data fields>}}
|
The first component is 0x28 (40), meaning that one thread is holding an exclusive latch (or waiting for readers to exit), and 39 threads are waiting for the exclusive latch to be released. Coincidentally, there are 39 threads waiting for a shared latch, and Thread 12 is waiting for an exclusive latch to be granted. It is a mystery which thread is holding the exclusive latch that is blocking everyone else.
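As a worked example, here is the decoding of the three lock words under the interpretation above (a sketch; HOLDER is bit 31, and the low 31 bits count the holder plus the waiting threads):

#include <cstdint>
#include <cstdio>

int main() {
  constexpr uint32_t HOLDER = 1U << 31;  // exclusive latch held or being acquired
  const uint32_t words[] = {0x80000002, 0x80000001, 0x80000028};
  for (uint32_t lk : words) {
    uint32_t n = lk & ~HOLDER;  // holder + waiters
    std::printf("lock=%#010x exclusive=%d threads=%u (waiters=%u)\n",
                lk, (lk & HOLDER) != 0, n, n ? n - 1 : 0);
  }
  return 0;
}

For 0x80000028 this prints threads=40: one holder plus the 39 waiters seen in the stack traces.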
The following commands should have answered the question: which threads are holding the exclusive latches?
print/x *(block_lock*)0x7eff4d04bd10
|
print/x *(block_lock*)0x7f1d1408a538
|
print/x *(block_lock*)0x7f22741195f8
|
|
|
This looks very much like MDEV-27983.
|
|
Thanks Marko for the extensive review; here is the output of the requested commands:
(gdb) print/x *(block_lock*)0x7eff4d04bd10
|
$1 = {lock = {writer = {lock = {<std::__atomic_base<unsigned int>> = {static _S_alignment = 0x4,
|
_M_i = 0x80000002}, <No data fields>}, static HOLDER = 0x80000000}, readers = {<std::__atomic_base<unsigned int>> = {
|
static _S_alignment = 0x4, _M_i = 0x80000000}, <No data fields>}, static WRITER = 0x80000000}, recursive = 0x1,
|
writer = {<std::__atomic_base<unsigned long>> = {static _S_alignment = 0x8, _M_i = 0x7f0014ae7700}, <No data fields>},
|
static FOR_IO = 0xffffffffffffffff, static RECURSIVE_X = 0x1, static RECURSIVE_U = 0x10000, static RECURSIVE_MAX = <optimized out>}
|
|
(gdb) print/x *(block_lock*)0x7f1d1408a538
|
$2 = {lock = {writer = {lock = {<std::__atomic_base<unsigned int>> = {static _S_alignment = 0x4,
|
_M_i = 0x80000001}, <No data fields>}, static HOLDER = 0x80000000}, readers = {<std::__atomic_base<unsigned int>> = {
|
static _S_alignment = 0x4, _M_i = 0x80000001}, <No data fields>}, static WRITER = 0x80000000}, recursive = 0x0,
|
writer = {<std::__atomic_base<unsigned long>> = {static _S_alignment = 0x8, _M_i = 0x0}, <No data fields>},
|
static FOR_IO = 0xffffffffffffffff, static RECURSIVE_X = 0x1, static RECURSIVE_U = 0x10000, static RECURSIVE_MAX = <optimized out>}
|
|
(gdb) print/x *(block_lock*)0x7f22741195f8
|
$3 = {lock = {writer = {lock = {<std::__atomic_base<unsigned int>> = {static _S_alignment = 0x4,
|
_M_i = 0x80000028}, <No data fields>}, static HOLDER = 0x80000000}, readers = {<std::__atomic_base<unsigned int>> = {
|
static _S_alignment = 0x4, _M_i = 0x80000002}, <No data fields>}, static WRITER = 0x80000000}, recursive = 0x0,
|
writer = {<std::__atomic_base<unsigned long>> = {static _S_alignment = 0x8, _M_i = 0x0}, <No data fields>},
|
static FOR_IO = 0xffffffffffffffff, static RECURSIVE_X = 0x1, static RECURSIVE_U = 0x10000, static RECURSIVE_MAX = <optimized out>}
|
|
|
marostegui, the only non-nullptr writer thread identifier that we got is 0x7f0014ae7700. For future reference, you could have tried to identify the thread with the following:
thread find 0x7f0014ae7700
|
In gdb_db1132_24th_aug.txt, I found a match for this:
|
mariadb-10.6.8
|
Thread 72 (Thread 0x7f0014ae7700 (LWP 4009073) "mysqld"):
|
#0 0x00007f5faf4b1c61 in clock_nanosleep () from target:/lib/x86_64-linux-gnu/libc.so.6
|
No symbol table info available.
|
#1 0x00007f5faf4b7443 in nanosleep () from target:/lib/x86_64-linux-gnu/libc.so.6
|
No symbol table info available.
|
#2 0x000055f744758bbb in std::this_thread::sleep_for<long, std::ratio<1l, 1000000l> > (__rtime=<optimized out>, __rtime=<optimized out>) at /usr/include/c++/10/thread:401
|
__s = {__r = <optimized out>}
|
__ns = {__r = <optimized out>}
|
__ts = {tv_sec = 0, tv_nsec = 94044}
|
__s = {__r = <optimized out>}
|
__ns = {__r = <optimized out>}
|
__ts = {tv_sec = <optimized out>, tv_nsec = <optimized out>}
|
#3 buf_page_get_low (page_id={m_id = 4131762166058}, zip_size=<optimized out>, rw_latch=<optimized out>, guess=<optimized out>, mode=10, mtr=<optimized out>, err=<optimized out>, allow_ibuf_merge=<optimized out>) at /root/mariadb-10.6.8/storage/innobase/buf/buf0buf.cc:2812
|
This is trying to access page 0x3c20037592a (again the same tablespace 0x3c2) while apparently holding the page latch. Let us see what is blocking it:
if (UNIV_UNLIKELY(!block->page.frame)) {
|
if (!block->page.lock.x_lock_try()) {
|
/* The page is being read or written, or
|
another thread is executing buf_zip_decompress()
|
in buf_page_get_low() on it. */
|
block->page.unfix();
|
std::this_thread::sleep_for(
|
std::chrono::microseconds(100));
|
goto loop;
|
}
|
|
buf_block_t *new_block = buf_LRU_get_free_block(false);
|
buf_block_init_low(new_block);
|
|
wait_for_unfix:
|
mysql_mutex_lock(&buf_pool.mutex);
|
page_hash_latch& hash_lock=buf_pool.page_hash.lock_get(chain);
|
|
/* It does not make sense to use
|
transactional_lock_guard here, because buf_relocate()
|
would likely make a memory transaction too large. */
|
hash_lock.lock();
|
|
/* block->page.lock implies !block->page.can_relocate() */
|
ut_ad(&block->page == buf_pool.page_hash.get(page_id, chain));
|
|
/* Wait for any other threads to release their buffer-fix
|
on the compressed-only block descriptor.
|
FIXME: Never fix() before acquiring the lock.
|
Only in buf_page_get_gen(), buf_page_get_low(), buf_page_free()
|
we are violating that principle. */
|
state = block->page.state();
|
|
switch (state) {
|
case buf_page_t::UNFIXED + 1:
|
case buf_page_t::IBUF_EXIST + 1:
|
case buf_page_t::REINIT + 1:
|
break;
|
default:
|
ut_ad(state < buf_page_t::READ_FIX);
|
|
if (state < buf_page_t::UNFIXED + 1) {
|
ut_ad(state > buf_page_t::FREED);
|
ut_ad(mode == BUF_GET_POSSIBLY_FREED
|
|| mode == BUF_PEEK_IF_IN_POOL);
|
block->page.unfix();
|
block->page.lock.x_unlock();
|
hash_lock.unlock();
|
buf_LRU_block_free_non_file_page(new_block);
|
mysql_mutex_unlock(&buf_pool.mutex);
|
return nullptr;
|
}
|
|
mysql_mutex_unlock(&buf_pool.mutex);
|
hash_lock.unlock();
|
std::this_thread::sleep_for(
|
std::chrono::microseconds(100));
|
goto wait_for_unfix;
|
}
|
The block pointers are unavailable in the output, but I found that Thread 53 is waiting on the same page identifier:
|
mariadb-10.6.8
|
#6 sux_lock<ssux_lock_impl<true> >::s_lock (this=<optimized out>) at /root/mariadb-10.6.8/storage/innobase/include/sux_lock.h:362
|
No locals.
|
#7 buf_page_get_low (page_id={m_id = 4131762166058}, zip_size=<optimized out>, rw_latch=<optimized out>, guess=<optimized out>, mode=10, mtr=<optimized out>, err=<optimized out>, allow_ibuf_merge=<optimized out>) at /root/mariadb-10.6.8/storage/innobase/buf/buf0buf.cc:2698
|
Thread 53 is blocked waiting for a shared latch on the page, because Thread 72 is holding the exclusive page latch.
It seems possible to me that Thread 72 is waiting for Thread 53 to release its buffer-fix on the block. It would also be helpful to know the value of block->page.zip.fix.
I am guessing here, but it seems that print &((buf_page_t*)0)->lock would return 0x18, the offset of the lock member within buf_page_t. If that is the case (0x7eff4d04bd10 - 0x18 = 0x7eff4d04bcf8), we'd get the entire block descriptor shown by the following:
print *(buf_block_t*)0x7eff4d04bcf8
|
This code is related to ROW_FORMAT=COMPRESSED pages. Can you confirm that the table uses ROW_FORMAT=COMPRESSED?
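For instance, the row format can be checked via information_schema:
SELECT table_name, row_format FROM information_schema.tables WHERE table_schema='enwiki';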
|
|
At least for MDEV-27983 I can confirm that we are using ROW_FORMAT=COMPRESSED
|
|
Thanks Marko.
For what it's worth:
(gdb) print *(buf_block_t*)0x7eff4d04bcf8
|
$2 = {page = {id_ = {m_id = 4131762166058}, hash = 0x0, oldest_modification_ = {m = {<std::__atomic_base<unsigned long>> = {static _S_alignment = 8, _M_i = 0}, <No data fields>}}, static NOT_USED = 0, static MEMORY = 1, static REMOVE_HASH = 2, static FREED = 3, static UNFIXED = 536870912, static IBUF_EXIST = 1073741824, static REINIT = 1610612736, static READ_FIX = 2147483648, static WRITE_FIX = 2684354560, static WRITE_FIX_IBUF = 3221225472, static WRITE_FIX_REINIT = 3758096384, static LRU_MASK = 3758096384, lock = {lock = {writer = {lock = {<std::__atomic_base<unsigned int>> = {static _S_alignment = 4, _M_i = 2147483650}, <No data fields>}, static HOLDER = 2147483648}, readers = {<std::__atomic_base<unsigned int>> = {static _S_alignment = 4, _M_i = 2147483648}, <No data fields>}, static WRITER = 2147483648}, recursive = 1, writer = {<std::__atomic_base<unsigned long>> = {static _S_alignment = 8, _M_i = 139638323705600}, <No data fields>}, static FOR_IO = 18446744073709551615, static RECURSIVE_X = 1, static RECURSIVE_U = 65536, static RECURSIVE_MAX = <optimized out>}, frame = 0x0, zip = {data = 0x7f2bc3594000 "4\213p/", m_end = 0, m_nonempty = 0, n_blobs = 0, ssize = 4, fix = {m = {<std::__atomic_base<unsigned int>> = {static _S_alignment = 4, _M_i = 536870914}, <No data fields>}}}, list = {prev = 0x0, next = 0x0}, LRU = {prev = 0x7efa50be5fc8, next = 0x7f5f4c0c7880}, old = 1, freed_page_clock = 0, access_time = {m_counter = {<std::__atomic_base<unsigned int>> = {static _S_alignment = 4, _M_i = 0}, <No data fields>}}}, unzip_LRU = {prev = 0x95, next = 0x0}, modify_clock = 0, n_bytes = 136, n_fields = 0, n_hash_helps = 0, left_side = false, curr_n_fields = 204, curr_n_bytes = 3842, curr_left_side = 0, index = 0x7efda684f868}
|
I cannot yet confirm whether we use ROW_FORMAT=COMPRESSED, as I would need to "unstall" the host, and hence we'd lose the ability to get traces from it. I am pretty sure we use COMPRESSED on that host, but not 100% sure until I restart it.
|
|
Looks like the host responded after a few minutes, and we do use Compressed for the majority of tables on that DB.
root@db1132.eqiad.wmnet[information_schema]> select table_name,row_format from tables where table_schema='enwiki';
|
+------------------------------+------------+
|
| table_name | row_format |
|
+------------------------------+------------+
|
| ores_model | Compressed |
|
| ores_classification | Compressed |
|
| babel | Compressed |
|
| bv2017_edits | Compressed |
|
| securepoll_strike | Compressed |
|
| page_assessments_projects | Compressed |
|
| oldimage | Compressed |
|
| mathoid | Compressed |
|
| page | Compressed |
|
| slots | Compressed |
|
| linter | Compressed |
|
| archive | Compressed |
|
| change_tag_def | Compressed |
|
| recentchanges | Compressed |
|
| templatelinks | Compressed |
|
| content_models | Compressed |
|
| job | Compressed |
|
| ip_changes | Compressed |
|
| logging | Compressed |
|
| discussiontools_subscription | Dynamic |
|
| revision | Compressed |
|
| actor | Compressed |
|
| comment | Compressed |
|
| global_block_whitelist | Compressed |
|
| externallinks | Compressed |
|
| linktarget | Dynamic |
|
| mcdc2021_edits | Dynamic |
|
| page_assessments | Compressed |
|
| image | Compressed |
|
| ipblocks | Compressed |
|
| watchlist | Compressed |
|
| watchlist_expiry | Compressed |
|
| sites | Compressed |
|
| ucoc_edits | Dynamic |
|
| content | Compressed |
|
| wikilove_log | Compressed |
|
| bv2022_edits | Dynamic |
|
| securepoll_votes | Compressed |
|
| site_identifiers | Compressed |
|
| searchindex | Compressed |
|
| flaggedrevs | Compressed |
|
| user_properties | Compressed |
|
| geo_tags | Compressed |
|
| __wmf_checksums | Compressed |
|
| abuse_filter_log | Compressed |
|
| category | Compressed |
|
| pagelinks | Compressed |
|
| optin_survey_old | Compressed |
|
| abuse_filter | Compressed |
|
| abuse_filter_action | Compressed |
|
| abuse_filter_history | Compressed |
|
| arbcom1_vote | Compressed |
|
| filearchive | Compressed |
|
| bv2009_edits | Compressed |
|
| bv2011_edits | Compressed |
|
| categorylinks | Compressed |
|
| imagelinks | Compressed |
|
| site_stats | Compressed |
|
| cu_log | Compressed |
|
| cur | Compressed |
|
| cu_changes | Compressed |
|
| page_restrictions | Compressed |
|
| langlinks | Compressed |
|
| uploadstash | Compressed |
|
| exarchive | Compressed |
|
| exrevision | Compressed |
|
| user_newtalk | Compressed |
|
| flaggedpage_config | Compressed |
|
| flaggedpage_pending | Compressed |
|
| flaggedpages | Compressed |
|
| flaggedrevs_promote | Compressed |
|
| page_props | Compressed |
|
| flaggedrevs_statistics | Compressed |
|
| flaggedrevs_tracking | Compressed |
|
| flaggedtemplates | Compressed |
|
| change_tag | Compressed |
|
| hidden | Compressed |
|
| interwiki | Compressed |
|
| l10n_cache | Compressed |
|
| securepoll_cookie_match | Compressed |
|
| bv2021_edits | Dynamic |
|
| log_search | Compressed |
|
| revision_comment_temp | Compressed |
|
| module_deps | Compressed |
|
| moodbar_feedback | Compressed |
|
| moodbar_feedback_response | Compressed |
|
| user | Compressed |
|
| querycache_info | Compressed |
|
| pagetriage_page | Compressed |
|
| pagetriage_page_tags | Compressed |
|
| pagetriage_tags | Compressed |
|
| pif_edits | Compressed |
|
| objectcache | Compressed |
|
| slot_roles | Compressed |
|
| protected_titles | Compressed |
|
| querycache | Compressed |
|
| bv2015_edits | Compressed |
|
| querycachetwo | Compressed |
|
| redirect | Compressed |
|
| securepoll_entity | Compressed |
|
| securepoll_lists | Compressed |
|
| securepoll_msgs | Compressed |
|
| securepoll_properties | Compressed |
|
| securepoll_questions | Compressed |
|
| securepoll_voters | Compressed |
|
| spoofuser | Compressed |
|
| text | Compressed |
|
| transcache | Compressed |
|
| updatelog | Compressed |
|
| user_former_groups | Compressed |
|
| user_groups | Compressed |
|
| vote_log | Compressed |
|
| iwlinks | Compressed |
|
| bv2013_edits | Compressed |
|
| securepoll_options | Compressed |
|
| wbc_entity_usage | Compressed |
|
| betafeatures_user_counts | Compressed |
|
| pagetriage_log | Compressed |
|
| transcode | Compressed |
|
| securepoll_elections | Compressed |
|
| ipblocks_restrictions | Compressed |
|
+------------------------------+------------+
|
|
|
The field block->page.zip.fix contains the value 536870914, that is, 0x20000002 or buf_page_t::UNFIXED + 2. One of the buffer-fix references would be held by Thread 53 and the other by Thread 72, which are blocking each other.
The reason for the hang is clear now. There is no ongoing I/O. What apparently happened is that two threads requested the same ROW_FORMAT=COMPRESSED block at the same time, while the block only exists in compressed form in the buffer pool (buf_page_t::frame contains nullptr).
This hang might have been introduced by MDEV-27058.
|
|
Thread 53 is waiting here:
if (state > buf_page_t::READ_FIX && state < buf_page_t::WRITE_FIX) {
|
if (mode == BUF_PEEK_IF_IN_POOL) {
|
ignore_block:
|
ut_ad(mode == BUF_GET_POSSIBLY_FREED
|
|| mode == BUF_PEEK_IF_IN_POOL);
|
block->unfix();
|
if (err) {
|
*err = DB_CORRUPTION;
|
}
|
return nullptr;
|
}
|
|
/* A read-fix is released after block->page.lock
|
in buf_page_t::read_complete() or
|
buf_pool_t::corrupted_evict(), or
|
after buf_zip_decompress() in this function. */
|
block->page.lock.s_lock();
|
When it started waiting, the block was read-fixed, but that is no longer the case, because we observed state == buf_page_t::UNFIXED + 2. After the read was completed, Thread 72 grabbed an exclusive page latch.
This is a little tricky, because the buffer-fix is protecting the block from relocation. After the ROW_FORMAT=COMPRESSED block was read into the buffer pool, its block descriptor must be relocated, because the block descriptors for uncompressed page frames will be preallocated. A possible fix might be that Thread 53 would invoke s_lock_try(). If that fails, it should release the buffer-fix and look up the block from buf_pool.page_hash again. In this way, Thread 72 should be able to proceed, and Thread 53 would not attempt to access a freed block descriptor.
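A sketch of that idea, shaped after the buf_page_get_low() code quoted above (illustrative only, not an actual patch):

/* Illustrative sketch, not the actual fix: avoid blocking in s_lock()
while holding a buffer-fix on a compressed-only block. */
if (!block->page.lock.s_lock_try()) {
        /* Another thread (here: Thread 72) holds the exclusive latch
        and may be waiting for our buffer-fix to be released; back off. */
        block->unfix();
        std::this_thread::sleep_for(
                std::chrono::microseconds(100));
        goto loop;      /* restart the buf_pool.page_hash lookup */
}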
|
|
This hang was introduced in MDEV-27058.
|
|
This is an exact duplicate of the case that was reported in MDEV-27983.
|