Every ~2-7 days, around midnight, one of our SQL servers experiences an issue where it deadlocks almost completely. The log, as expected, just stops abruptly with no indication of what's wrong.
I can still connect with the 'root' account over a unix socket while this happens. The active queries (SHOW PROCESSLIST) seem unrelated to the hang.
It is not the queries that are deadlocking; the program itself is. No queries will process or complete, as the program internally waits endlessly on mutexes.
I researched possible causes; the most common appears to be calling unsafe functions in signal handlers.
I'm not too well versed in gdb. I don't know how to reproduce the problem (that's our entire issue); it can and does occur periodically. The situation in [info threads] of gdb looks a bit like this:
About 300 threads stuck at `0x7f7bf8fa86c0 (LWP 684969) "mariadbd" syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38`
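For reference, a minimal sketch of how a listing like the one above can be captured from the running server (assuming gdb is installed and can attach to the mariadbd process; the pidof lookup assumes a single instance):
# attach to the hung server; it stays paused while gdb is attached
gdb -p "$(pidof mariadbd)"
(gdb) info threads            # one line per thread, showing where each is blocked
(gdb) thread apply all bt     # full backtraces for all threads
(gdb) detach
(gdb) quit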
Marko Mäkelä
added a comment - The threads that are labeled iou-wrk- should be worker threads created by the Linux kernel for io_uring. Most of the syscall frames are probably futex system calls for InnoDB page latches. To analyze the hang or performance bottleneck, we would need complete stack traces, as explained in https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mariadbd/.
npr
added a comment (edited) - I've added a partially redacted backtrace that will hopefully be of help.
I did it the following way:
Used gcore to dump the program's memory core (~8.7GB).
Then ran gdb /usr/sbin/mariadbd core.752873 -ex 'thread apply all bt -frame-arguments all full' -ex quit |& tee backtrace.log
Then searched the file for PII (IP addresses, site names, customer names, customer numbers) and scrambled/redacted it.
I also have detailed logs... but those are gigabytes in size, so I don't think I can reasonably clean those.
I can tell the problem happened somewhere between 17:15 and 17:17 CET today.
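A minimal sketch of the kind of scrubbing pass meant above (the patterns and replacement strings are illustrative assumptions only, not the exact ones used):
# mask IPv4 addresses and a hypothetical customer-number pattern before sharing
sed -E -i.bak \
    -e 's/[0-9]{1,3}(\.[0-9]{1,3}){3}/REDACTED_IP/g' \
    -e 's/cust_[0-9]+/REDACTED_CUSTOMER/g' \
    backtrace.log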
npr
added a comment - I checked this variable, and it looks like it could be the case.
The web management panel has set it to an unrealistically low value of 128MB on a server with 32GB of physical memory allocated to it by the hypervisor. I've upped it to 8GB.
Is there a good way of stress-testing / triggering the bug? I think the reason for the inconsistency is divisibility. My hypothesis is, if these two bugs are the same (could be; our version is 10.11.6, which is lower than the 'fixed' version 10.11.9 that gives a useful error instead of a hang), that tasks running every 7, 10, 12, etc. days have 'some' but not 'all' overlap. When enough weekly/monthly/etc. tasks overlap, and cron calls the databases multiple times for WordPress tasks, enough locks are created to fill up 128MB. This then jams the database.
Allocating more memory would be a very simple solution. (Although of course a useful 'innodb: out of buffer pool memory for necessary locks' error would be most welcome once enough of the buffer pool is used up by locks, so others in the future won't have to find this obscure bug report.)
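A minimal sketch of checking and raising the setting at runtime (assuming the variable in question is innodb_buffer_pool_size, that the mariadb client is available, and that the Debian include directory shown is the right place to persist it on this system):
# current size in bytes
mariadb -e "SHOW GLOBAL VARIABLES LIKE 'innodb_buffer_pool_size';"
# raise to 8GB without a restart (the buffer pool is resizable online in 10.11)
mariadb -e "SET GLOBAL innodb_buffer_pool_size = 8 * 1024 * 1024 * 1024;"
# persist across restarts (path is an assumption, adjust to your setup)
printf '[mysqld]\ninnodb_buffer_pool_size = 8G\n' | sudo tee /etc/mysql/mariadb.conf.d/99-bufferpool.cnf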
Debarun Banerjee
added a comment - I checked the attached stack trace. We have a total of 144 threads.
1. 20 of them are stuck at buf_LRU_get_free_block(), indicating the buffer pool is running out of free pages.
#3 0x00007f569bca7558 in __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x558809b97ec0 <buf_pool>, cond=0x558809b9c278 <buf_pool+17336>) at ./nptl/pthread_cond_wait.c:503
#4 ___pthread_cond_wait (cond=0x558809b9c278 <buf_pool+17336>, mutex=0x558809b97ec0 <buf_pool>) at ./nptl/pthread_cond_wait.c:618
#5 0x00005588091a169f in buf_LRU_get_free_block (have_mutex=have_mutex@entry=false) at ./storage/innobase/buf/buf0lru.cc:460
#6 0x0000558808aaafc7 in buf_page_init_for_read (mode=mode@entry=132, page_id=page_id@entry={m_id = 7838315315296}, zip_size=zip_size@entry=0, unzip=unzip@entry=false) at ./storage/innobase/buf/buf0rea.cc:121
#7 0x00005588091a1908 in buf_read_page_low (space=0x55880aea08a8, sync=sync@entry=true, mode=mode@entry=132, page_id={m_id = 7838315315296}, zip_size=zip_size@entry=0, unzip=unzip@entry=false) at ./storage/innobase/buf/buf0rea.cc:295
2. The page cleaner thread, which is supposed to generate free pages, is stuck in a flush, waiting for a double-write flush batch to finish.
Thread 5 (Thread 0x7f5679fff6c0 (LWP 752878)):
#0 __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x55880a4faf08 <buf_dblwr+104>) at ./nptl/futex-internal.c:57
#1 __futex_abstimed_wait_common (futex_word=futex_word@entry=0x55880a4faf08 <buf_dblwr+104>, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0, cancel=cancel@entry=true) at ./nptl/futex-internal.c:87
#2 0x00007f569bca4efb in __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x55880a4faf08 <buf_dblwr+104>, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0) at ./nptl/futex-internal.c:139
#3 0x00007f569bca7558 in __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x55880a4faeb0 <buf_dblwr+16>, cond=0x55880a4faee0 <buf_dblwr+64>) at ./nptl/pthread_cond_wait.c:503
#4 ___pthread_cond_wait (cond=0x55880a4faee0 <buf_dblwr+64>, mutex=0x55880a4faeb0 <buf_dblwr+16>) at ./nptl/pthread_cond_wait.c:618
#5 0x00005588091962bc in buf_dblwr_t::flush_buffered_writes (this=this@entry=0x55880a4faea0 <buf_dblwr>, size=size@entry=64) at ./storage/innobase/buf/buf0dblwr.cc:582
#6 0x000055880919676e in buf_dblwr_t::flush_buffered_writes (this=0x55880a4faea0 <buf_dblwr>) at ./storage/innobase/buf/buf0dblwr.cc:724
#7 0x000055880919e07a in buf_flush_page_cleaner () at ./storage/innobase/buf/buf0flu.cc:2444
The rest of the threads are mostly blocked on S and X page latches, with a few waiting on row locks and a few on the rseg latch. The ones waiting for free pages are in the middle of a mini-transaction and are expected to block the other threads accessing the pages they have already latched. The key issue seems to be the page cleaner hang here. There don't seem to be any active IO threads.
Now from the page cleaner stack, we can infer:
1. It is not MDEV-34166, where the page cleaner keeps looping, unable to free any pages.
2. It is not MDEV-34265 / MDEV-34442, where the page cleaner hangs in a furious flush loop.
3. It is very similar to MDEV-33797, reported on 10.11.7.
The symptom certainly points to issues in the page flushing and I/O area. We have fixed one critical bug in the I/O area, MDEV-33669, which could result in such a symptom, but it is possible that this is a different issue altogether. This issue is filed against 10.11.8, and MDEV-33669 is already fixed in that version. However, I got some doubts while going through the logs and line numbers. I found that the ./storage/innobase/buf/buf0buf.cc line numbers in the stack do not match the 10.11.8 source. Moving backwards, the line numbers look to match 10.11.6 instead.
#9 0x0000558808aa9924 in buf_page_get_low (page_id={m_id = 7838315315296}, zip_size=0, rw_latch=1, guess=0x0, mode=10, mtr=0x7f567021b2b0, err=<optimized out>, allow_ibuf_merge=<optimized out>) at ./storage/innobase/buf/buf0buf.cc:2508
#10 0x00005588091928b8 in buf_page_get_gen (page_id=<optimized out>, zip_size=<optimized out>, rw_latch=<optimized out>, guess=<optimized out>, mode=<optimized out>, mtr=<optimized out>, err=<optimized out>, allow_ibuf_merge=<optimized out>) at ./storage/innobase/buf/buf0buf.cc:2918
10.11.6
commit fecd78b83785d5ae96f2c6ff340375be803cd299 (HEAD -> 10.11, tag: mariadb-10.11.6)
2906 buf_page_get_gen(
2907 const page_id_t page_id,
2908 ulint zip_size,
2909 ulint rw_latch,
2910 buf_block_t* guess,
2911 ulint mode,
2912 mtr_t* mtr,
2913 dberr_t* err,
2914 bool allow_ibuf_merge)
2915 {
2916 buf_block_t *block= recv_sys.recover(page_id);
2917 if (UNIV_LIKELY(!block))
2918 return buf_page_get_low(page_id, zip_size, rw_latch,
2919 guess, mode, mtr, err, allow_ibuf_merge);
Hi npr, can you please help me with the following information?
1. Please confirm whether the issue can be repeated on 10.11.8.
2. Please provide the server error log and the hang stack trace for the same run. The log has the git commit ID for the server that is running, which would make it easier to map the line numbers.
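A minimal sketch of extracting that commit ID from the error log (the log path is the usual Debian default and is only an assumption; recent MariaDB servers print a 'source revision' line in their startup banner):
# the startup banner contains the git revision the binary was built from
grep -i 'source revision' /var/log/mysql/error.log | tail -n 1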
Sergei Golubchik
added a comment - npr, this issue is closed after waiting for feedback for one month. If you'd still like to add a comment, please don't hesitate to do so and we'll reopen it.
npr
added a comment (edited) - I've just had it happen again (I don't know at all what's causing the problem, and it's quite sporadic; my changes might have made it rarer, but it clearly hasn't disappeared entirely).
I located the thread where innobase/buf/ functions are in use:
(gdb) thread 5
[Switching to thread 5 (Thread 0x7f4cda75a6c0 (LWP 1294))]
#0 __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x564cfe1b1f0c <buf_dblwr+108>) at ./nptl/futex-internal.c:57
57 ./nptl/futex-internal.c: No such file or directory.
(gdb) bt
#0 __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x564cfe1b1f0c <buf_dblwr+108>) at ./nptl/futex-internal.c:57
#1 __futex_abstimed_wait_common (futex_word=futex_word@entry=0x564cfe1b1f0c <buf_dblwr+108>, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0,
cancel=cancel@entry=true) at ./nptl/futex-internal.c:87
#2 0x00007f4efa4a4efb in __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x564cfe1b1f0c <buf_dblwr+108>, expected=expected@entry=0, clockid=clockid@entry=0,
abstime=abstime@entry=0x0, private=private@entry=0) at ./nptl/futex-internal.c:139
#3 0x00007f4efa4a7558 in __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x564cfe1b1eb0 <buf_dblwr+16>, cond=0x564cfe1b1ee0 <buf_dblwr+64>) at ./nptl/pthread_cond_wait.c:503
#4 ___pthread_cond_wait (cond=0x564cfe1b1ee0 <buf_dblwr+64>, mutex=0x564cfe1b1eb0 <buf_dblwr+16>) at ./nptl/pthread_cond_wait.c:618
#5 0x0000564cfce4d2bc in buf_dblwr_t::flush_buffered_writes (this=this@entry=0x564cfe1b1ea0 <buf_dblwr>, size=size@entry=64) at ./storage/innobase/buf/buf0dblwr.cc:582
#6 0x0000564cfce4d983 in buf_dblwr_t::add_to_batch (this=0x564cfe1b1ea0 <buf_dblwr>, request=..., size=16384) at ./storage/innobase/buf/buf0dblwr.cc:777
#7 0x0000564cfce52d86 in buf_page_t::flush (this=this@entry=0x7f4ed0018920, evict=evict@entry=false, space=space@entry=0x564cff2a9fe8) at ./storage/innobase/buf/buf0flu.cc:885
#8 0x0000564cfce532af in buf_flush_try_neighbors (space=space@entry=0x564cff2a9fe8, page_id=page_id@entry=..., bpage=0x0, bpage@entry=0x7f4ed0018880, contiguous=<optimized out>,
evict=evict@entry=false, n_flushed=n_flushed@entry=375, n_to_flush=<optimized out>) at ./storage/innobase/buf/buf0flu.cc:1104
#9 0x0000564cfce53e0d in buf_do_flush_list_batch (lsn=<optimized out>, max_n=<optimized out>) at ./storage/innobase/buf/buf0flu.cc:1500
#10 buf_flush_list_holding_mutex (lsn=<optimized out>, max_n=<optimized out>) at ./storage/innobase/buf/buf0flu.cc:1573
#11 buf_flush_list_holding_mutex (max_n=<optimized out>, lsn=<optimized out>) at ./storage/innobase/buf/buf0flu.cc:1554
#12 0x0000564cfce54eaf in buf_flush_page_cleaner () at ./storage/innobase/buf/buf0flu.cc:2580
#13 0x00007f4efa2d44a3 in std::execute_native_thread_routine (__p=0x564cfef7a770) at ../../../../../src/libstdc++-v3/src/c++11/thread.cc:82
#14 0x00007f4efa4a8134 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#15 0x00007f4efa5287dc in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
I can't install a wholly different version of MariaDB on this server. It's a production server with a proprietary management shell, so I'm loath to mess up the packages. Not knowing at all how to trigger the hang, I can't install a different version on another server to find out that way. I have no idea what's triggering it. I'm updating the bug to indicate it affects version 10.11.6 (I know that for sure, at least).
I haven't managed to dump/restore things properly yet, but I do have a query which might be related to the crash.
While the error log shows no signs of trouble, I did find that, when using gdb to increase max_connections in order to log in as root after the crash, the first query in SHOW PROCESSLIST is preceded by a 100kB+ one-liner full of binary data in the query log, making that a prime suspect. I've attached it to the report as 'keyQuery.txt'. Of course, the logging may have mangled things, so that might be why I'm having difficulty reproducing it.
(You'll need a WordPress/Wordfence database to get it to work. As it's just an insert, any should do. I'll try to reproduce it with the original on a different server.)
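For reference, a minimal sketch of the gdb trick mentioned above for freeing up a root connection once max_connections is exhausted (this assumes the mariadbd debug symbols are installed so the global max_connections resolves; it is a last resort on an already-hung production server):
# attach non-interactively, bump the connection limit, and detach again
gdb -p "$(pidof mariadbd)" -batch \
    -ex 'set variable max_connections=1000' \
    -ex 'detach'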
Marko Mäkelä
added a comment - Last Friday, I was able to reproduce a hang by running the test innodb.recovery_memory very many times. This failure is related to buffer block allocation during early server startup, and the fix may be unrelated to the originally reported failure, but I thought it would be better to document it somewhere. The interesting thread is this one.
#5 0x00006500c181ee3d in safe_cond_timedwait (cond=0x6500c257e210 <buf_pool+17616>, mp=0x6500c2579d40 <buf_pool>, abstime=0x7fffbe20b170,
file=0x6500c1bc1cd8 <error: Cannot access memory at address 0x6500c1bc1cd8>, line=460) at /home/marko/11.2/mysys/thr_mutex.c:543
#6 0x00006500c16fdca5 in buf_LRU_get_free_block (have_mutex=have_mutex@entry=false) at /home/marko/11.2/storage/innobase/buf/buf0lru.cc:460
#7 0x00006500c0dece67 in buf_page_init_for_read (mode=mode@entry=132, page_id=..., page_id@entry=..., zip_size=zip_size@entry=0, unzip=<error reading variable: Cannot access memory at address 0x7fffbe20b800>)
at /home/marko/11.2/storage/innobase/buf/buf0rea.cc:121
#8 0x00006500c16fe2be in buf_read_page_low (space=<error reading variable: Cannot access memory at address 0x7fffbe20b818>, sync=sync@entry=true, mode=mode@entry=132, page_id=..., zip_size=0,
unzip=unzip@entry=true) at /home/marko/11.2/storage/innobase/buf/buf0rea.cc:296
#9 0x00006500c16fed8c in buf_read_page (page_id=...) at /home/marko/11.2/storage/innobase/buf/buf0rea.cc:466
#10 0x00006500c0debd54 in buf_page_get_low (page_id=..., zip_size=0, rw_latch=RW_S_LATCH, guess=0x0, mode=10, mtr=0x7fffbe20bcc0, err=0x7fffbe20bb7c, allow_ibuf_merge=false)
at /home/marko/11.2/storage/innobase/buf/buf0buf.cc:3100
#11 0x00006500c16e5561 in buf_page_get_gen (page_id=..., zip_size=<optimized out>, rw_latch=<optimized out>, guess=<optimized out>, mode=<optimized out>, mtr=<optimized out>, err=<optimized out>,
allow_ibuf_merge=<optimized out>) at /home/marko/11.2/storage/innobase/buf/buf0buf.cc:3359
#12 0x00006500c16930ea in trx_rseg_mem_restore (mtr=0x7fffbe20bcc0, rseg=0x6500c2579380 <trx_sys+48192>) at /home/marko/11.2/storage/innobase/trx/trx0rseg.cc:466
#13 trx_rseg_array_init () at /home/marko/11.2/storage/innobase/trx/trx0rseg.cc:668
#14 0x00006500c169bdc4 in trx_lists_init_at_db_start () at /home/marko/11.2/storage/innobase/trx/trx0trx.cc:722
The buf_flush_page_cleaner() thread was in pthread_cond_timedwait(). Unfortunately, the server process was killed by the test environment while I was trying to debug it.
This hang only occurred in one out of about 100 runs.
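A minimal sketch of that kind of repeated run from a MariaDB build tree (the counts and the use of --parallel are assumptions; mysql-test-run supports --repeat and --parallel):
# repeat one test many times, several workers in parallel, to hunt for a rare hang
cd mysql-test
./mysql-test-run.pl --parallel=4 --repeat=100 innodb.recovery_memory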
Marko Mäkelä
added a comment (edited) - npr, the wait in buf_dblwr_t::flush_buffered_writes() reminds me of MDEV-33820 and MDEV-34641. Is it possible that the system time could have been adjusted? Based on the presence of iou-wrk threads, the io_uring interface is being used. There is a suspicion that this subsystem is not immune to the time moving backwards. The Linux kernel version might be relevant.
npr
added a comment - I checked, and while the server is running the standard Debian time sync, it should not be exhibiting this problem: vmware-tools timesync is disabled, and as far as I can tell the poll interval is rather large, so multiple time adjustments within the time it takes to run a query should be impossible:
> vmware-toolbox-cmd timesync status
< Disabled
> timedatectl status | grep "Poll interval"
< Poll interval: 34min 8s (min: 32s; max 34min 8s)
> uname -a
< Linux s05 6.1.0-17-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.69-1 (2023-12-30) x86_64 GNU/Linux
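If it helps, a minimal sketch of checking whether systemd-timesyncd ever stepped the clock around the time of an incident (the unit name assumes the standard Debian timesyncd setup; the time window is just an example):
# any logged clock adjustments near the hang would show up here
journalctl -u systemd-timesyncd --since '-12h' --no-pager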
npr
added a comment - Looking into things a bit further, the key thread you identified is hanging on a condition that never becomes true.
I've read that if a pthread condition variable is 'deleted', undefined behaviour could result. The debugger reports the following when dumping this variable:
(gdb) print *cond
$5 = {__data = {__wseq = {__value64 = 8709, __value32 = {__low = 8709, __high = 0}}, __g1_start = {__value64 = 8705, __value32 = {__low = 8705, __high = 0}}, __g_refs = {0, 2},
__g_size = {0, 0}, __g1_orig_size = 4, __wrefs = 8, __g_signals = {0, 0}},
__size = "\005\"\000\000\000\000\000\000\001\"\000\000\000\000\000\000\000\000\000\000\002", '\000' <repeats 11 times>, "\004\000\000\000\b\000\000\000\000\000\000\000\000\000\000",
__align = 8709}
Marko Mäkelä
added a comment (edited) - npr, the condition variable buf_dblwr.cond is allocated statically. The issue could be that a pending write request is lost. In that case, write_slots->m_cache.m_pos should not be 0. What is it in your case?
Back in the Linux kernel 5.11 or thereabouts, we ran into a problem where io_uring would sometimes lose write requests. I wouldn't expect the 6.1 series to be affected by that.
npr
added a comment (edited) -
(gdb) print write_slots->m_cache.m_pos
$6 = 125
(gdb) thread 5
[Switching to thread 5 (Thread 0x7f4cda75a6c0 (LWP 1294))]
#0 __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x564cfe1b1f0c <buf_dblwr+108>) at ./nptl/futex-internal.c:57
57 ./nptl/futex-internal.c: No such file or directory.
(gdb) print write_slots->m_cache.m_pos
$7 = 125
Looks like it is indeed nonzero.
For completeness, this machine is using a bog-standard, well-tested filesystem: ext4 with 4kB blocks. There's no drive encryption, though there is VMware virtualization:
root@s05:~ # blkid
/dev/sda5: UUID="a28215bb-dc64-468e-9f04-b4c96f1ea552" TYPE="swap" PARTUUID="26c5a724-05"
/dev/sda1: UUID="191e1da6-0127-4b30-b6c5-38e2a874e64d" BLOCK_SIZE="4096" TYPE="ext4" PARTUUID="26c5a724-01"
Marko Mäkelä
added a comment - npr, if you are using MariaDB Server 10.11.6, I think that this report should be a duplicate of MDEV-33669 or MDEV-32861, which were fixed in 10.11.8 and 10.11.7. I would recommend upgrading to 10.11.9 while you are at it.
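A minimal sketch of what that upgrade could look like on a plain Debian system (assuming the server comes from the mariadb-server apt package and is not locked down by the management panel; take a backup first):
# upgrade the already-installed server package, then restart the service
sudo apt-get update
sudo apt-get install --only-upgrade mariadb-server
sudo systemctl restart mariadb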