Every ~2-7 days, around midnight, one of our SQL servers experiences an issue where it deadlocks almost completely. The log, as expected, just stops abruptly with no indication of what's wrong.
I can still connect with the 'root' account over a unix socket while this happens. The active queries (SHOW PROCESSLIST) seem unrelated to the hang.
It is not the queries that are deadlocking; the program itself is. No queries will process or complete, as the program internally waits endlessly on mutexes.
I researched possible causes; the most common appears to be calling unsafe functions in signal handlers.
I'm not too well versed in gdb. I don't know how to reproduce the problem (that's our entire issue); it can and does occur periodically. The situation in [info threads] of gdb looks a bit like this:
About 300 threads stuck at `0x7f7bf8fa86c0 (LWP 684969) "mariadbd" syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38`
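For reference, a minimal sketch of how a listing like the one above can be captured from the running server (assuming gdb is installed and can attach to the mariadbd process; the pidof lookup assumes a single instance):
# attach to the hung server; it stays paused while gdb is attached
gdb -p "$(pidof mariadbd)"
(gdb) info threads            # one line per thread, showing where each is blocked
(gdb) thread apply all bt     # full backtraces for all threads
(gdb) detach
(gdb) quit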
Marko Mäkelä
added a comment - The threads that are labeled iou-wrk- should be worker threads created by the Linux kernel for io_uring. Most of the syscall frames are probably futex system calls for InnoDB page latches. To analyze the hang or performance bottleneck, we would need complete stack traces, as explained in https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mariadbd/.
npr
added a comment (edited) - I've added a partially redacted backtrace that will hopefully be of help.
I did it the following way:
Used gcore to dump the program's memory core (~8.7GB).
Then ran gdb /usr/sbin/mariadbd core.752873 -ex 'thread apply all bt -frame-arguments all full' -ex quit |& tee backtrace.log
Then searched the file for PII (IP addresses, site names, customer names, customer numbers) and scrambled/redacted it.
I also have detailed logs... but those are gigabytes in size, so I don't think I can reasonably clean those.
I can tell the problem happened somewhere between 17:15 and 17:17 CET today.
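A minimal sketch of the kind of scrubbing pass meant above (the patterns and replacement strings are illustrative assumptions only, not the exact ones used):
# mask IPv4 addresses and a hypothetical customer-number pattern before sharing
sed -E -i.bak \
    -e 's/[0-9]{1,3}(\.[0-9]{1,3}){3}/REDACTED_IP/g' \
    -e 's/cust_[0-9]+/REDACTED_CUSTOMER/g' \
    backtrace.log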
npr
added a comment - I checked this variable, and it looks like it could be the case.
The web management panel has set it to an unrealistically low value of 128MB on a server with 32GB of physical memory allocated to it by the hypervisor. I've upped it to 8GB.
Is there a good way of stress-testing / triggering the bug? I think the reason for the inconsistency is divisibility. My hypothesis is, if these two bugs are the same (could be; our version is 10.11.6, which is lower than the 'fixed' version 10.11.9 that gives a useful error instead of a hang), that tasks running every 7, 10, 12, etc. days have 'some' but not 'all' overlap. When enough weekly/monthly/etc. tasks overlap, and cron calls the databases multiple times for WordPress tasks, enough locks are created to fill up 128MB. This then jams the database.
Allocating more memory would be a very simple solution. (Although of course a useful 'innodb: out of buffer pool memory for necessary locks' error would be most welcome once enough of the buffer pool is used up by locks, so others in the future won't have to find this obscure bug report.)
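A minimal sketch of checking and raising the setting at runtime (assuming the variable in question is innodb_buffer_pool_size, that the mariadb client is available, and that the Debian include directory shown is the right place to persist it on this system):
# current size in bytes
mariadb -e "SHOW GLOBAL VARIABLES LIKE 'innodb_buffer_pool_size';"
# raise to 8GB without a restart (the buffer pool is resizable online in 10.11)
mariadb -e "SET GLOBAL innodb_buffer_pool_size = 8 * 1024 * 1024 * 1024;"
# persist across restarts (path is an assumption, adjust to your setup)
printf '[mysqld]\ninnodb_buffer_pool_size = 8G\n' | sudo tee /etc/mysql/mariadb.conf.d/99-bufferpool.cnf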
Debarun Banerjee
added a comment - I checked the attached stack trace. We have a total of 144 threads.
1. 20 of them are stuck at buf_LRU_get_free_block(), indicating the buffer pool is running out of free pages.
#3 0x00007f569bca7558 in __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x558809b97ec0 <buf_pool>, cond=0x558809b9c278 <buf_pool+17336>) at ./nptl/pthread_cond_wait.c:503
#4 ___pthread_cond_wait (cond=0x558809b9c278 <buf_pool+17336>, mutex=0x558809b97ec0 <buf_pool>) at ./nptl/pthread_cond_wait.c:618
#5 0x00005588091a169f in buf_LRU_get_free_block (have_mutex=have_mutex@entry=false) at ./storage/innobase/buf/buf0lru.cc:460
#6 0x0000558808aaafc7 in buf_page_init_for_read (mode=mode@entry=132, page_id=page_id@entry={m_id = 7838315315296}, zip_size=zip_size@entry=0, unzip=unzip@entry=false) at ./storage/innobase/buf/buf0rea.cc:121
#7 0x00005588091a1908 in buf_read_page_low (space=0x55880aea08a8, sync=sync@entry=true, mode=mode@entry=132, page_id={m_id = 7838315315296}, zip_size=zip_size@entry=0, unzip=unzip@entry=false) at ./storage/innobase/buf/buf0rea.cc:295
2. The page cleaner thread, which is supposed to generate free pages, is stuck in a flush, waiting for a double-write flush batch to finish.
Thread 5 (Thread 0x7f5679fff6c0 (LWP 752878)):
#0 __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x55880a4faf08 <buf_dblwr+104>) at ./nptl/futex-internal.c:57
#1 __futex_abstimed_wait_common (futex_word=futex_word@entry=0x55880a4faf08 <buf_dblwr+104>, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0, cancel=cancel@entry=true) at ./nptl/futex-internal.c:87
#2 0x00007f569bca4efb in __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x55880a4faf08 <buf_dblwr+104>, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0) at ./nptl/futex-internal.c:139
#3 0x00007f569bca7558 in __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x55880a4faeb0 <buf_dblwr+16>, cond=0x55880a4faee0 <buf_dblwr+64>) at ./nptl/pthread_cond_wait.c:503
#4 ___pthread_cond_wait (cond=0x55880a4faee0 <buf_dblwr+64>, mutex=0x55880a4faeb0 <buf_dblwr+16>) at ./nptl/pthread_cond_wait.c:618
#5 0x00005588091962bc in buf_dblwr_t::flush_buffered_writes (this=this@entry=0x55880a4faea0 <buf_dblwr>, size=size@entry=64) at ./storage/innobase/buf/buf0dblwr.cc:582
#6 0x000055880919676e in buf_dblwr_t::flush_buffered_writes (this=0x55880a4faea0 <buf_dblwr>) at ./storage/innobase/buf/buf0dblwr.cc:724
#7 0x000055880919e07a in buf_flush_page_cleaner () at ./storage/innobase/buf/buf0flu.cc:2444
The rest of the threads are mostly blocked on S and X page latches, with a few waiting on row locks and a few on the rseg latch. The ones waiting for free pages are in the middle of a mini-transaction and are expected to block the other threads accessing the pages they have already latched. The key issue seems to be the page cleaner hang here. There don't seem to be any active IO threads.
Now from the page cleaner stack, we can infer:
1. It is not MDEV-34166, where the page cleaner keeps looping, unable to free any pages.
2. It is not MDEV-34265 / MDEV-34442, where the page cleaner hangs in a furious flush loop.
3. It is very similar to MDEV-33797, reported on 10.11.7.
The symptom certainly points to issues in the page flushing and I/O area. We have fixed one critical bug in the I/O area, MDEV-33669, which could result in such a symptom, but it is possible that this is a different issue altogether. This issue is filed against 10.11.8, and MDEV-33669 is already fixed in that version. However, I got some doubts while going through the logs and line numbers. I found that the ./storage/innobase/buf/buf0buf.cc line numbers in the stack do not match the 10.11.8 source. Moving backwards, the line numbers look to match 10.11.6 instead.
#9 0x0000558808aa9924 in buf_page_get_low (page_id={m_id = 7838315315296}, zip_size=0, rw_latch=1, guess=0x0, mode=10, mtr=0x7f567021b2b0, err=<optimized out>, allow_ibuf_merge=<optimized out>) at ./storage/innobase/buf/buf0buf.cc:2508
#10 0x00005588091928b8 in buf_page_get_gen (page_id=<optimized out>, zip_size=<optimized out>, rw_latch=<optimized out>, guess=<optimized out>, mode=<optimized out>, mtr=<optimized out>, err=<optimized out>, allow_ibuf_merge=<optimized out>) at ./storage/innobase/buf/buf0buf.cc:2918
10.11.6
commit fecd78b83785d5ae96f2c6ff340375be803cd299 (HEAD -> 10.11, tag: mariadb-10.11.6)
2906 buf_page_get_gen(
2907 const page_id_t page_id,
2908 ulint zip_size,
2909 ulint rw_latch,
2910 buf_block_t* guess,
2911 ulint mode,
2912 mtr_t* mtr,
2913 dberr_t* err,
2914 bool allow_ibuf_merge)
2915 {
2916 buf_block_t *block= recv_sys.recover(page_id);
2917 if (UNIV_LIKELY(!block))
2918 return buf_page_get_low(page_id, zip_size, rw_latch,
2919 guess, mode, mtr, err, allow_ibuf_merge);
Hi npr, can you please help me with the following information?
1. Please confirm whether the issue can be repeated on 10.11.8.
2. Please provide the server error log and the hang stack trace for the same run. The log has the git commit ID for the server that is running, which would make it easier to map the line numbers.
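A minimal sketch of extracting that commit ID from the error log (the log path is the usual Debian default and is only an assumption; recent MariaDB servers print a 'source revision' line in their startup banner):
# the startup banner contains the git revision the binary was built from
grep -i 'source revision' /var/log/mysql/error.log | tail -n 1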
Sergei Golubchik
added a comment - npr, this issue is closed after waiting for feedback for one month. If you'd still like to add a comment, please don't hesitate to do so and we'll reopen it.
npr
added a comment (edited) - I've just had it happen again (I don't know at all what's causing the problem, and it's quite sporadic; my changes might have made it rarer, but it clearly hasn't disappeared entirely).
I located the thread where innobase/buf/ functions are in use:
(gdb) thread 5
[Switching to thread 5 (Thread 0x7f4cda75a6c0 (LWP 1294))]
#0 __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x564cfe1b1f0c <buf_dblwr+108>) at ./nptl/futex-internal.c:57
57 ./nptl/futex-internal.c: No such file or directory.
(gdb) bt
#0 __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x564cfe1b1f0c <buf_dblwr+108>) at ./nptl/futex-internal.c:57
#1 __futex_abstimed_wait_common (futex_word=futex_word@entry=0x564cfe1b1f0c <buf_dblwr+108>, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0,
cancel=cancel@entry=true) at ./nptl/futex-internal.c:87
#2 0x00007f4efa4a4efb in __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x564cfe1b1f0c <buf_dblwr+108>, expected=expected@entry=0, clockid=clockid@entry=0,
abstime=abstime@entry=0x0, private=private@entry=0) at ./nptl/futex-internal.c:139
#3 0x00007f4efa4a7558 in __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x564cfe1b1eb0 <buf_dblwr+16>, cond=0x564cfe1b1ee0 <buf_dblwr+64>) at ./nptl/pthread_cond_wait.c:503
#4 ___pthread_cond_wait (cond=0x564cfe1b1ee0 <buf_dblwr+64>, mutex=0x564cfe1b1eb0 <buf_dblwr+16>) at ./nptl/pthread_cond_wait.c:618
#5 0x0000564cfce4d2bc in buf_dblwr_t::flush_buffered_writes (this=this@entry=0x564cfe1b1ea0 <buf_dblwr>, size=size@entry=64) at ./storage/innobase/buf/buf0dblwr.cc:582
#6 0x0000564cfce4d983 in buf_dblwr_t::add_to_batch (this=0x564cfe1b1ea0 <buf_dblwr>, request=..., size=16384) at ./storage/innobase/buf/buf0dblwr.cc:777
#7 0x0000564cfce52d86 in buf_page_t::flush (this=this@entry=0x7f4ed0018920, evict=evict@entry=false, space=space@entry=0x564cff2a9fe8) at ./storage/innobase/buf/buf0flu.cc:885
#8 0x0000564cfce532af in buf_flush_try_neighbors (space=space@entry=0x564cff2a9fe8, page_id=page_id@entry=..., bpage=0x0, bpage@entry=0x7f4ed0018880, contiguous=<optimized out>,
evict=evict@entry=false, n_flushed=n_flushed@entry=375, n_to_flush=<optimized out>) at ./storage/innobase/buf/buf0flu.cc:1104
#9 0x0000564cfce53e0d in buf_do_flush_list_batch (lsn=<optimized out>, max_n=<optimized out>) at ./storage/innobase/buf/buf0flu.cc:1500
#10 buf_flush_list_holding_mutex (lsn=<optimized out>, max_n=<optimized out>) at ./storage/innobase/buf/buf0flu.cc:1573
#11 buf_flush_list_holding_mutex (max_n=<optimized out>, lsn=<optimized out>) at ./storage/innobase/buf/buf0flu.cc:1554
#12 0x0000564cfce54eaf in buf_flush_page_cleaner () at ./storage/innobase/buf/buf0flu.cc:2580
#13 0x00007f4efa2d44a3 in std::execute_native_thread_routine (__p=0x564cfef7a770) at ../../../../../src/libstdc++-v3/src/c++11/thread.cc:82
#14 0x00007f4efa4a8134 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#15 0x00007f4efa5287dc in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
I can't install a wholly different version of MariaDB on this server. It's a production server with a proprietary management shell, so I'm loath to mess up the packages. Not knowing at all how to trigger the hang, I can't install a different version on another server to find out that way. I have no idea what's triggering it. I'm updating the bug to indicate it affects version 10.11.6 (I know that for sure, at least).
I haven't managed to dump/restore things properly yet, but I do have a query which might be related to the crash.
While the error log shows no signs of trouble, I did find that, when using gdb to increase max_connections in order to log in as root after the crash, the first query in SHOW PROCESSLIST is preceded by a 100kB+ one-liner full of binary data in the query log, making that a prime suspect. I've attached it to the report as 'keyQuery.txt'. Of course, the logging may have mangled things, so that might be why I'm having difficulty reproducing it.
(You'll need a WordPress/Wordfence database to get it to work. As it's just an insert, any should do. I'll try to reproduce it with the original on a different server.)
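For reference, a minimal sketch of the gdb trick mentioned above for freeing up a root connection once max_connections is exhausted (this assumes the mariadbd debug symbols are installed so the global max_connections resolves; it is a last resort on an already-hung production server):
# attach non-interactively, bump the connection limit, and detach again
gdb -p "$(pidof mariadbd)" -batch \
    -ex 'set variable max_connections=1000' \
    -ex 'detach'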
Marko Mäkelä
added a comment - Last Friday, I was able to reproduce a hang by running the test innodb.recovery_memory very many times. This failure is related to buffer block allocation during early server startup, and the fix may be unrelated to the originally reported failure, but I thought it would be better to document it somewhere. The interesting thread is this one.
#5 0x00006500c181ee3d in safe_cond_timedwait (cond=0x6500c257e210 <buf_pool+17616>, mp=0x6500c2579d40 <buf_pool>, abstime=0x7fffbe20b170,
file=0x6500c1bc1cd8 <error: Cannot access memory at address 0x6500c1bc1cd8>, line=460) at /home/marko/11.2/mysys/thr_mutex.c:543
#6 0x00006500c16fdca5 in buf_LRU_get_free_block (have_mutex=have_mutex@entry=false) at /home/marko/11.2/storage/innobase/buf/buf0lru.cc:460
#7 0x00006500c0dece67 in buf_page_init_for_read (mode=mode@entry=132, page_id=..., page_id@entry=..., zip_size=zip_size@entry=0, unzip=<error reading variable: Cannot access memory at address 0x7fffbe20b800>)
at /home/marko/11.2/storage/innobase/buf/buf0rea.cc:121
#8 0x00006500c16fe2be in buf_read_page_low (space=<error reading variable: Cannot access memory at address 0x7fffbe20b818>, sync=sync@entry=true, mode=mode@entry=132, page_id=..., zip_size=0,
unzip=unzip@entry=true) at /home/marko/11.2/storage/innobase/buf/buf0rea.cc:296
#9 0x00006500c16fed8c in buf_read_page (page_id=...) at /home/marko/11.2/storage/innobase/buf/buf0rea.cc:466
#10 0x00006500c0debd54 in buf_page_get_low (page_id=..., zip_size=0, rw_latch=RW_S_LATCH, guess=0x0, mode=10, mtr=0x7fffbe20bcc0, err=0x7fffbe20bb7c, allow_ibuf_merge=false)
at /home/marko/11.2/storage/innobase/buf/buf0buf.cc:3100
#11 0x00006500c16e5561 in buf_page_get_gen (page_id=..., zip_size=<optimized out>, rw_latch=<optimized out>, guess=<optimized out>, mode=<optimized out>, mtr=<optimized out>, err=<optimized out>,
allow_ibuf_merge=<optimized out>) at /home/marko/11.2/storage/innobase/buf/buf0buf.cc:3359
#12 0x00006500c16930ea in trx_rseg_mem_restore (mtr=0x7fffbe20bcc0, rseg=0x6500c2579380 <trx_sys+48192>) at /home/marko/11.2/storage/innobase/trx/trx0rseg.cc:466
#13 trx_rseg_array_init () at /home/marko/11.2/storage/innobase/trx/trx0rseg.cc:668
#14 0x00006500c169bdc4 in trx_lists_init_at_db_start () at /home/marko/11.2/storage/innobase/trx/trx0trx.cc:722
The buf_flush_page_cleaner() thread was in pthread_cond_timedwait(). Unfortunately, the server process was killed by the test environment while I was trying to debug it.
This hang only occurred in one out of about 100 runs.
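A minimal sketch of that kind of repeated run from a MariaDB build tree (the counts and the use of --parallel are assumptions; mysql-test-run supports --repeat and --parallel):
# repeat one test many times, several workers in parallel, to hunt for a rare hang
cd mysql-test
./mysql-test-run.pl --parallel=4 --repeat=100 innodb.recovery_memory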
Marko Mäkelä
added a comment (edited) - npr, the wait in buf_dblwr_t::flush_buffered_writes() reminds me of MDEV-33820 and MDEV-34641. Is it possible that the system time could have been adjusted? Based on the presence of iou-wrk threads, the io_uring interface is being used. There is a suspicion that this subsystem is not immune to the time moving backwards. The Linux kernel version might be relevant.
npr
added a comment - I checked, and while the server is running the standard Debian time sync, it should not be exhibiting this problem: vmware-tools timesync is disabled, and as far as I can tell the poll interval is rather large, so multiple time adjustments within the time it takes to run a query should be impossible:
> vmware-toolbox-cmd timesync status
< Disabled
> timedatectl status | grep "Poll interval"
< Poll interval: 34min 8s (min: 32s; max 34min 8s)
> uname -a
< Linux s05 6.1.0-17-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.69-1 (2023-12-30) x86_64 GNU/Linux
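If it helps, a minimal sketch of checking whether systemd-timesyncd ever stepped the clock around the time of an incident (the unit name assumes the standard Debian timesyncd setup; the time window is just an example):
# any logged clock adjustments near the hang would show up here
journalctl -u systemd-timesyncd --since '-12h' --no-pager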
npr
added a comment - Looking into things a bit further, the key thread you identified is hanging on a condition that never becomes true.
I've read that if a pthread condition variable is 'deleted', undefined behaviour could result. The debugger reports the following when dumping this variable:
(gdb) print *cond
$5 = {__data = {__wseq = {__value64 = 8709, __value32 = {__low = 8709, __high = 0}}, __g1_start = {__value64 = 8705, __value32 = {__low = 8705, __high = 0}}, __g_refs = {0, 2},
__g_size = {0, 0}, __g1_orig_size = 4, __wrefs = 8, __g_signals = {0, 0}},
__size = "\005\"\000\000\000\000\000\000\001\"\000\000\000\000\000\000\000\000\000\000\002", '\000' <repeats 11 times>, "\004\000\000\000\b\000\000\000\000\000\000\000\000\000\000",
__align = 8709}
Marko Mäkelä
added a comment (edited) - npr, the condition variable buf_dblwr.cond is allocated statically. The issue could be that a pending write request is lost. In that case, write_slots->m_cache.m_pos should not be 0. What is it in your case?
Back in the Linux kernel 5.11 or thereabouts, we ran into a problem where io_uring would sometimes lose write requests. I wouldn't expect the 6.1 series to be affected by that.
npr
added a comment (edited) -
(gdb) print write_slots->m_cache.m_pos
$6 = 125
(gdb) thread 5
[Switching to thread 5 (Thread 0x7f4cda75a6c0 (LWP 1294))]
#0 __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x564cfe1b1f0c <buf_dblwr+108>) at ./nptl/futex-internal.c:57
57 ./nptl/futex-internal.c: No such file or directory.
(gdb) print write_slots->m_cache.m_pos
$7 = 125
Looks like it is indeed nonzero.
For completeness, this machine is using a bog-standard, well-tested filesystem: ext4 with 4kB blocks. There's no drive encryption, though there is VMware virtualization:
root@s05:~ # blkid
/dev/sda5: UUID="a28215bb-dc64-468e-9f04-b4c96f1ea552" TYPE="swap" PARTUUID="26c5a724-05"
/dev/sda1: UUID="191e1da6-0127-4b30-b6c5-38e2a874e64d" BLOCK_SIZE="4096" TYPE="ext4" PARTUUID="26c5a724-01"
Marko Mäkelä
added a comment - npr, if you are using MariaDB Server 10.11.6, I think that this report should be a duplicate of MDEV-33669 or MDEV-32861, which were fixed in 10.11.8 and 10.11.7. I would recommend upgrading to 10.11.9 while you are at it.
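A minimal sketch of what that upgrade could look like on a plain Debian system (assuming the server comes from the mariadb-server apt package and is not locked down by the management panel; take a backup first):
# upgrade the already-installed server package, then restart the service
sudo apt-get update
sudo apt-get install --only-upgrade mariadb-server
sudo systemctl restart mariadb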