MariaDB Server / MDEV-32371

Deadlock between buf_page_get_zip() and buf_pool_t::corrupted_evict() on InnoDB ROW_FORMAT=COMPRESSED table corruption

Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 10.11.5, 10.6, 10.7(EOL), 10.8(EOL), 10.9(EOL), 10.10(EOL), 10.11, 11.0(EOL), 11.1(EOL), 11.2(EOL), 11.3(EOL)
    • Fix Version/s: 10.6.17, 10.11.7, 11.0.5, 11.1.4, 11.2.3, 11.3.2
    • Environment: Debian 11 Bullseye; MariaDB from the MariaDB repositories.

    Description

      I've had two different servers this week hit an issue with a semaphore wait lasting forever. This bug report isn't about what caused those wait-forever situations: it's about how the server handled them.

      If I'm reading the code in storage/innobase/srv/srv0srv.cc correctly, there is supposed to be a warning at 1/4 of the innodb_fatal_semaphore_wait_threshold value, at 1/2 of that value, and at 3/4 of that value before killing the server once the threshold value is reached.
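
      To spell out the schedule I would expect with my 600-second threshold, here is just the 1/4, 1/2, 3/4 arithmetic described above (an illustration only, not the actual code in storage/innobase/srv/srv0srv.cc):

        // Expected warning schedule for innodb_fatal_semaphore_wait_threshold.
        // This only illustrates the 1/4, 1/2, 3/4 arithmetic described above;
        // it is not the server implementation.
        #include <cstdio>

        int main()
        {
          const unsigned threshold = 600;  // seconds; my current setting
          for (unsigned quarter = 1; quarter <= 3; quarter++)
            std::printf("expect a long-wait warning at %u seconds\n",
                        threshold * quarter / 4);
          std::printf("expect the fatal abort at %u seconds\n", threshold);
          return 0;
        }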

      Examining the syslog, however, shows this doesn't seem to be working. One of my servers gave no warnings at all. It waited the full 600 seconds and then gave the fatal error as it crashed itself. The "Long wait" message never appeared at all.

      The other server did give the "Long wait" warning, starting at 150 seconds, which is the correct time to start (1/4 of 600). I would expect a warning at 300 seconds, then 450 seconds. However, it instead warned again at 159 seconds, and then every 10 seconds after that until 289 seconds, at which time I killed the server myself rather than waiting the full 600.

      I would like to change my innodb_fatal_semaphore_wait_threshold setting to a dramatically lower number. I'll be able to tell whether it's safe to do so by making incremental changes and observing the presence or absence of "Long wait" warnings. However, if these warnings aren't behaving the way they're supposed to, that won't work.

      Apologies if I'm misunderstanding what the behavior is supposed to be. It doesn't seem like zero warnings in some cases and warnings every 10 seconds in other cases would be the intended behavior.

      Attachments

        Issue Links

          Activity

            xan@biblionix.com Xan Charbonnet created issue -

            The "long semaphore wait" diagnostics and sync_array (MDEV-11972) used to cover all InnoDB rw-locks and mutexes until MariaDB Server 10.4. In 10.5, some mutexes were converted to normal ones. In 10.6, MDEV-21452, MDEV-24167, and MDEV-24142 finally got rid of the sync_array. I thought that most waits involve the data dictionary latch at some point, and therefore I implemented a watchdog on dict_sys.latch only. It looks like this watchdog finally kicked in also in your case, but with more delay.

            Can you please provide stack traces of all threads during the hangs? That is the only reliable way to identify and to fix hangs. A known hang that affects the currently latest releases is MDEV-32049.

            marko Marko Mäkelä added a comment
            marko Marko Mäkelä made changes -
            Field Original Value New Value
            Status Open [ 1 ] Needs Feedback [ 10501 ]

            Marko,

            I'm not sure I understand your explanation of what ended up causing my weird pattern of "long wait" warnings, but I certainly appreciate your help!

            I'm afraid the only stack trace I have is the one where no warnings were given. For the other one, the one where warnings were appearing every 10 seconds, I killed the server manually rather than letting it time out and die. I've shrunk the timeout from 600 seconds to 152 seconds, so maybe something will be happening more frequently going forward.

            Here's what went into the syslog:

            2023-10-04 1:31:50 0 [ERROR] [FATAL] InnoDB: innodb_fatal_semaphore_wait_threshold was exceeded for dict_sys.latch. Please refer to https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/
            231004 1:31:50 [ERROR] mysqld got signal 6 ;
            This could be because you hit a bug. It is also possible that this binary
            or one of the libraries it was linked against is corrupt, improperly built,
            or misconfigured. This error can also be caused by malfunctioning hardware.
            To report this bug, see https://mariadb.com/kb/en/reporting-bugs
            We will try our best to scrape up some info that will hopefully help
            diagnose the problem, but since we have already crashed,
            something is definitely wrong and this may fail.
            Server version: 10.11.5-MariaDB-1:10.11.5+maria~deb11-log source revision: 7875294b6b74b53dd3aaa723e6cc103d2bb47b2c
            key_buffer_size=134217728
            read_buffer_size=131072
            max_used_connections=188
            max_threads=505
            thread_count=197
            It is possible that mysqld could use up to
            key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1243339 K bytes of memory
            Hope that's ok; if not, decrease some variables in the equation.
            Thread pointer: 0x0
            Attempting backtrace. You can use the following information to find out
            where mysqld died. If you see no messages after this, something went
            terribly wrong...
            stack_bottom = 0x0 thread_stack 0x49000
            2023-10-04 1:31:51 0 [Warning] WSREP: last inactive check more than PT1.5S ago (PT1.68234S), skipping check
            /usr/sbin/mariadbd(my_print_stacktrace+0x2e)[0x55bb081c7eee]
            /usr/sbin/mariadbd(handle_fatal_signal+0x485)[0x55bb07ca5365]
            sigaction.c:0(__restore_rt)[0x7f553d5f1140]
            2023-10-04 1:31:58 0 [Warning] WSREP: last inactive check more than PT1.5S ago (PT1.59027S), skipping check
            linux/raise.c:51(__GI_raise)[0x7f553cd53ce1]
            stdlib/abort.c:81(__GI_abort)[0x7f553cd3d537]
            2023-10-04 1:32:02 0 [Warning] WSREP: last inactive check more than PT1.5S ago (PT1.56904S), skipping check
            /usr/sbin/mariadbd(+0x69d8c7)[0x55bb078d38c7]
            /usr/sbin/mariadbd(+0x694151)[0x55bb078ca151]
            /usr/sbin/mariadbd(_ZN5tpool19thread_pool_generic13timer_generic7executeEPv+0x38)[0x55bb0815fc18]
            /usr/sbin/mariadbd(_ZN5tpool4task7executeEv+0x32)[0x55bb081609c2]
            /usr/sbin/mariadbd(_ZN5tpool19thread_pool_generic11worker_mainEPNS_11worker_dataE+0x4f)[0x55bb0815e73f]
            2023-10-04 1:32:05 0 [Warning] WSREP: last inactive check more than PT1.5S ago (PT1.51115S), skipping check
            /lib/x86_64-linux-gnu/libstdc++.so.6(+0xceed0)[0x7f553d101ed0]
            2023-10-04 1:32:09 0 [Warning] WSREP: last inactive check more than PT1.5S ago (PT1.8615S), skipping check
            nptl/pthread_create.c:478(start_thread)[0x7f553d5e5ea7]
            2023-10-04 1:32:12 0 [Warning] WSREP: last inactive check more than PT1.5S ago (PT1.52984S), skipping check
            x86_64/clone.S:97(_GI__clone)[0x7f553ce16a2f]
            The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/ contains
            information that should help you find out what is causing the crash.
            Writing a core file...
            Working directory at /var/lib/mysql
            Resource Limits:
            Limit Soft Limit Hard Limit Units
            Max cpu time unlimited unlimited seconds
            Max file size unlimited unlimited bytes
            Max data size unlimited unlimited bytes
            Max stack size 8388608 unlimited bytes
            Max core file size 0 unlimited bytes
            Max resident set unlimited unlimited bytes
            Max processes 514437 514437 processes
            Max open files 1000000 1000000 files
            Max locked memory 2097152 2097152 bytes
            Max address space unlimited unlimited bytes
            Max file locks unlimited unlimited locks
            Max pending signals 514437 514437 signals
            Max msgqueue size 819200 819200 bytes
            Max nice priority 0 0
            Max realtime priority 0 0
            Max realtime timeout unlimited unlimited us
            Core pattern: core
            Kernel version: Linux version 6.1.0-0.deb11.11-amd64 (debian-kernel@lists.debian.org) (gcc-10 (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2) #1 SMP PREEMPT_DYNAMIC Debian 6.1.38-4~bpo11+1 (2023-08-08)

            xan@biblionix.com Xan Charbonnet added a comment

            xan@biblionix.com, the stack trace that was written to the log is basically useless, because it apparently is from the thread that intentionally kills the server after noticing a hang. (The InnoDB function names in the stack trace were not even resolved by the built-in stack trace generator; this is a known problem.)

            To analyze the root cause of any hang, we really need the stack traces of all threads of the process.

            In https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mariadbd/#enabling-core-dumps you should be able to find how to enable core dumps and how to extract stack traces from a core dump.

            marko Marko Mäkelä added a comment

            Marko,

            I enabled core dumps and now have a stack trace for two crashes which have occurred over the past few days. There is definitely a lot more information in these stack traces.

            In fact there's so much information that I'm not sure I can guarantee it's appropriate for public display. Is there a more private way for me to get these stack traces to you?

            xan@biblionix.com Xan Charbonnet added a comment

            xan@biblionix.com, that is great news. You can send the stack traces to me by email. I just sent an email to you.

            marko Marko Mäkelä added a comment

            Thank you, I got the stack traces of two hung processes. I will need to analyze them some time later, once I am done with some urgent tasks.

            This looks like a new type of hang. I see many threads invoking buf_pool_t::corrupted_evict(), trying to remove a corrupted page from the buffer pool. We did test this logic with some debug instrumentation during the development of MDEV-13542, but such a test might not be part of our regular stress test battery.

            marko Marko Mäkelä added a comment
            marko Marko Mäkelä made changes -
            Assignee Marko Mäkelä [ marko ]
            Status Needs Feedback [ 10501 ] Open [ 1 ]

            In one of the provided sets of stack traces, I found that a thread is waiting in buf_page_get_zip() here:

                  if (discard_attempted || !bpage->frame)
                  {
                    /* Even when we are holding a hash_lock, it should be
                    acceptable to wait for a page S-latch here, because
                    buf_page_t::read_complete() will not wait for buf_pool.mutex,
                    and because S-latch would not conflict with a U-latch
                    that would be protecting buf_page_t::write_complete(). */
                    bpage->lock.s_lock();
                    hash_lock.unlock_shared();
                    break;
                  }
            

            The comment may have been truthful before MDEV-13542. Here, we are holding the buf_pool.page_hash latch for page 0x19b1a:0x17880 while several other threads are actually waiting for buf_pool.mutex in the read completion callback so that they could throw out corrupted pages. Possibly, one of the pages is exactly this page that is being requested here.
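
            If so, this is a classic lock-order inversion between the page_hash latch and the page latch. A minimal stand-alone illustration of the two conflicting acquisition orders, using plain std::mutex stand-ins rather than the actual InnoDB latches:

            // Stand-ins for the two latches involved; not the InnoDB types.
            #include <mutex>

            std::mutex page_hash_latch;  // stand-in for the buf_pool.page_hash latch
            std::mutex page_latch;       // stand-in for bpage->lock

            void reader_path()           // order used by buf_page_get_zip()
            {
              std::lock_guard<std::mutex> h(page_hash_latch);
              std::lock_guard<std::mutex> p(page_latch);  // may block while h is held
            }

            void evict_path()            // order seen in the corrupted-page eviction
            {
              // In the real code the page latch is already held from the failed read;
              // it is acquired here only to keep the illustration self-contained.
              std::lock_guard<std::mutex> p(page_latch);
              std::lock_guard<std::mutex> h(page_hash_latch);  // may block while p is held
            }

            int main()
            {
              // Called sequentially so that this example terminates; run the two
              // paths concurrently and each can end up waiting for the latch that
              // the other one holds.
              reader_path();
              evict_path();
              return 0;
            }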

            There is also one thread apparently in a busy loop, looking for a page of a ROW_FORMAT=COMPRESSED table:

            mariadb-10.11.5

            #3  buf_page_get_low (page_id={m_id = 105604655874051}, zip_size=8192, rw_latch=1, guess=<optimized out>, mode=10, mtr=0x7f44cddd9e90, err=0x7f44cddd93f0, allow_ibuf_merge=false) at ./storage/innobase/buf/buf0buf.cc:2611
            

            I checked another set of stack traces and found the same pattern. In the most recent one, trace_2023-10-31_21-13-25, we have Thread 76 waiting for buf_pool.mutex:

            mariadb-10.11.5

            Thread 76 (Thread 0x7fc87dae5700 (LWP 3435174)):
            #2  0x00005638ead3f85d in inline_mysql_mutex_lock (src_file=0x5638eb8d8228 "./storage/innobase/buf/buf0lru.cc", src_line=1231, that=0x5638ebf68b00 <buf_pool>) at ./include/mysql/psi/mysql_thread.h:752
            No locals.
            #3  buf_pool_t::corrupted_evict (this=0x5638ebf68b00 <buf_pool>, bpage=0x5639908a14e8, state=2147483648) at ./storage/innobase/buf/buf0lru.cc:1231
                    id = {m_id = <optimized out>}
                    chain = @0x5638f1ceca18: {first = 0x56396e8cc258}
                    hash_lock = @0x5638f1ceca00: {lk = {writer = {lock = {<std::__atomic_base<unsigned int>> = {static _S_alignment = 4, _M_i = 0}, <No data fields>}, static HOLDER = 2147483648}, readers = {<std::__atomic_base<unsigned int>> = {static _S_alignment = 4, _M_i = 0}, <No data fields>}, static WRITER = 2147483648}}
                    unfix = <optimized out>
                    s = <optimized out>
            #4  0x00005638eb5271d5 in buf_page_t::read_complete (this=0x5639908a14e8, node=@0x5638f0b3e1d8: {space = 0x5638f0b3d8d8, name = 0x5638f0b1f4b8 "./atoz/c_log#P#p13.ibd", handle = {m_file = 262, m_psi = 0x0}, on_ssd = 1, punch_hole = 0, atomic_write = 0, is_raw_disk = 0, deferred = 0, size = 198656, init_size = 0, max_size = 4294967295, being_extended = {m = {_M_base = {static _S_alignment = 1, _M_i = false}}}, chain = {prev = 0x0, next = 0x0}, block_size = 4096}) at ./storage/innobase/buf/buf0buf.cc:3627
                    database_corrupted = <optimized out>
                    expected_id = {m_id = 452504869516550}
                    read_frame = <optimized out>
                    err = DB_FAIL
                    recovery = <optimized out>
                    ibuf_may_exist = <optimized out>
            

            The page ID (452504869516550 == 0x19b8d0001b506) is almost identical to what we have in another thread, but it does not seem to share the buf_pool.page_hash latch with it (0x5638f1ceca00 is not 0x5638f1cec900):

            mariadb-10.11.5

            Thread 58 (Thread 0x7fc87f573700 (LWP 3738762)):
            #6  sux_lock<ssux_lock_impl<true> >::s_lock (this=0x563973caed20) at ./storage/innobase/include/sux_lock.h:362
            No locals.
            #7  buf_page_get_zip (page_id={m_id = 452504869516544}, zip_size=zip_size@entry=8192) at ./storage/innobase/buf/buf0buf.cc:2233
                    discard_attempted = <optimized out>
                    chain = @0x5638f1cec928: {first = 0x563973caed08}
                    hash_lock = @0x5638f1cec900: {lk = {writer = {lock = {<std::__atomic_base<unsigned int>> = {static _S_alignment = 4, _M_i = 2147483649}, <No data fields>}, static HOLDER = 2147483648}, readers = {<std::__atomic_base<unsigned int>> = {static _S_alignment = 4, _M_i = 2147483649}, <No data fields>}, static WRITER = 2147483648}}
                    bpage = 0x563973caed08
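
            For reference, page_id_t::m_id packs the tablespace identifier into the upper 32 bits and the page number into the lower 32 bits, so the two identifiers above decode like this (a stand-alone illustration, not InnoDB code):

            #include <cstdint>
            #include <cstdio>

            // Split an InnoDB page identifier into tablespace id and page number.
            static void decode(uint64_t m_id)
            {
              std::printf("m_id=%llu -> space=0x%x page=0x%x\n",
                          (unsigned long long) m_id,
                          unsigned(m_id >> 32), unsigned(m_id & 0xffffffff));
            }

            int main()
            {
              decode(452504869516550ULL);  // Thread 76: space 0x19b8d, page 0x1b506
              decode(452504869516544ULL);  // Thread 58: space 0x19b8d, page 0x1b500
              return 0;
            }

            That is, two nearby pages of the same tablespace, which is why the identifiers look almost identical.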
            

            By searching for 0x5638f1cec900 I found a thread that seems to be a culprit in this deadlock:

            mariadb-10.11.5

            Thread 139 (Thread 0x7fc8762d6700 (LWP 3435180)):
            #4  page_hash_latch::lock (this=0x5638f1cec900) at ./storage/innobase/include/buf0types.h:214
            No locals.
            #5  buf_pool_t::corrupted_evict (this=0x5638ebf68b00 <buf_pool>, bpage=0x563969c1c648, state=2147483648) at ./storage/innobase/buf/buf0lru.cc:1232
                    id = {m_id = <optimized out>}
                    chain = @0x5638f1cec920: {first = 0x563969c1c648}
                    hash_lock = @0x5638f1cec900: {lk = {writer = {lock = {<std::__atomic_base<unsigned int>> = {static _S_alignment = 4, _M_i = 2147483649}, <No data fields>}, static HOLDER = 2147483648}, readers = {<std::__atomic_base<unsigned int>> = {static _S_alignment = 4, _M_i = 2147483649}, <No data fields>}, static WRITER = 2147483648}}
                    unfix = <optimized out>
                    s = <optimized out>
            #6  0x00005638eb5271d5 in buf_page_t::read_complete (this=0x563969c1c648, node=@0x5638f0b3e1d8: {space = 0x5638f0b3d8d8, name = 0x5638f0b1f4b8 "./atoz/c_log#P#p13.ibd", handle = {m_file = 262, m_psi = 0x0}, on_ssd = 1, punch_hole = 0, atomic_write = 0, is_raw_disk = 0, deferred = 0, size = 198656, init_size = 0, max_size = 4294967295, being_extended = {m = {_M_base = {static _S_alignment = 1, _M_i = false}}}, chain = {prev = 0x0, next = 0x0}, block_size = 4096}) at ./storage/innobase/buf/buf0buf.cc:3627
                    database_corrupted = <optimized out>
                    expected_id = {m_id = 452504869516545}
                    read_frame = <optimized out>
                    err = DB_FAIL
                    recovery = <optimized out>
                    ibuf_may_exist = <optimized out>
            

            Bingo: this thread is actually holding buf_pool.mutex (blocking a lot of threads) and is waiting for the page hash latch that Thread 58 holds, while itself holding the page latch that Thread 58 is waiting for:

              mysql_mutex_lock(&mutex);
              hash_lock.lock();
             
              ut_ad(!bpage->oldest_modification());
              bpage->set_corrupt_id();
              auto unfix= state - buf_page_t::FREED;
              auto s= bpage->zip.fix.fetch_sub(unfix) - unfix;
              bpage->lock.x_unlock(true);
            

            The function buf_page_get_zip() needs to be revised so that it will not wait for the page latch while holding the buf_pool.page_hash latch. This function fetches BLOB pages of ROW_FORMAT=COMPRESSED tables.
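
            A rough sketch of the general pattern such a revision could follow: buffer-fix the page, release the hash latch, only then wait for the page latch, and re-validate the page afterwards. The types and helpers below are illustrative stand-ins, not the actual patch:

            // Outline of a deadlock-free buf_page_get_zip()-like lookup:
            // never block on the page latch while the page_hash latch is held.
            #include <atomic>
            #include <shared_mutex>

            struct page_stub
            {
              std::shared_mutex latch;             // stand-in for bpage->lock
              std::atomic<unsigned> fix_count{0};  // stand-in for the buffer-fix counter
              std::atomic<bool> still_valid{true}; // stand-in for "not freed or evicted"
            };

            static std::shared_mutex hash_latch;   // stand-in for buf_pool.page_hash latch
            static page_stub the_page;             // stand-in for the hash chain lookup
            static page_stub *lookup_locked() { return &the_page; }

            page_stub *get_zip_like()
            {
              hash_latch.lock_shared();
              page_stub *page= lookup_locked();
              page->fix_count.fetch_add(1);        // pin: the page cannot be freed under us
              hash_latch.unlock_shared();          // drop the hash latch BEFORE blocking
              page->lock.lock_shared();            // only now wait for the page latch
              if (!page->still_valid.load())       // re-check: corrupted_evict() may have
              {                                    // removed the page while we waited
                page->lock.unlock_shared();
                page->fix_count.fetch_sub(1);
                return nullptr;                    // the caller would retry the lookup
              }
              return page;                         // returned buffer-fixed and S-latched
            }

            int main()
            {
              if (page_stub *p= get_zip_like())
              {
                p->lock.unlock_shared();
                p->fix_count.fetch_sub(1);
              }
              return 0;
            }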

            marko Marko Mäkelä added a comment
            marko Marko Mäkelä made changes -
            Summary innodb_fatal_semaphore_wait_threshold warnings not happening Deadlock between buf_page_get_zip() and buf_pool_t::corrupted_evict() on InnoDB ROW_FORMAT=COMPRESSED table corruption
            marko Marko Mäkelä made changes -
            Fix Version/s 10.6 [ 24028 ]
            Fix Version/s 10.11 [ 27614 ]
            Fix Version/s 11.0 [ 28320 ]
            Fix Version/s 11.1 [ 28549 ]
            Fix Version/s 11.2 [ 28603 ]
            Fix Version/s 11.3 [ 28565 ]
            Affects Version/s 10.6 [ 24028 ]
            Affects Version/s 10.7 [ 24805 ]
            Affects Version/s 10.8 [ 26121 ]
            Affects Version/s 10.9 [ 26905 ]
            Affects Version/s 10.10 [ 27530 ]
            Affects Version/s 10.11 [ 27614 ]
            Affects Version/s 11.0 [ 28320 ]
            Affects Version/s 11.1 [ 28549 ]
            Affects Version/s 11.2 [ 28603 ]
            Affects Version/s 11.3 [ 28565 ]
            Labels hang
            Priority Major [ 3 ] Critical [ 2 ]
            marko Marko Mäkelä made changes -
            Status Open [ 1 ] Confirmed [ 10101 ]
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            Assignee Marko Mäkelä [ marko ] Thirunarayanan Balathandayuthapani [ thiru ]
            Status Confirmed [ 10101 ] In Review [ 10002 ]

            origin/10.6-MDEV-32371 963a88be2e89100260e09e46e90e2b3271444899 2023-11-23T12:11:39+02:00
            performed well in RQG testing.

            mleich Matthias Leich added a comment
            thiru Thirunarayanan Balathandayuthapani made changes -
            Assignee Thirunarayanan Balathandayuthapani [ thiru ] Marko Mäkelä [ marko ]
            Status In Review [ 10002 ] Stalled [ 10000 ]
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            issue.field.resolutiondate 2023-11-30 10:01:43.0 2023-11-30 10:01:43.1
            marko Marko Mäkelä made changes -
            Fix Version/s 10.6.17 [ 29518 ]
            Fix Version/s 10.11.7 [ 29519 ]
            Fix Version/s 11.0.5 [ 29520 ]
            Fix Version/s 11.1.4 [ 29024 ]
            Fix Version/s 11.2.3 [ 29521 ]
            Fix Version/s 11.3.2 [ 29522 ]
            Fix Version/s 10.6 [ 24028 ]
            Fix Version/s 10.11 [ 27614 ]
            Fix Version/s 11.0 [ 28320 ]
            Fix Version/s 11.1 [ 28549 ]
            Fix Version/s 11.3 [ 28565 ]
            Fix Version/s 11.2 [ 28603 ]
            Resolution Fixed [ 1 ]
            Status Stalled [ 10000 ] Closed [ 6 ]

            xan@biblionix.com, in case you are interested, at https://ci.mariadb.org/41154/ you can find packages of the current 10.11 development branch, which include a fix for this bug.

            marko Marko Mäkelä added a comment

            Marko, thank you very much!

            xan@biblionix.com Xan Charbonnet added a comment

            @marko, I ended up recompiling the Debian package after applying your modification. I have not seen a hang since. Thanks!

            xan@biblionix.com Xan Charbonnet added a comment

            I think we got caught by this on Debian Bookworm during backups of databases with compressed pages - thanks for the fix. We installed the version from testing to get it.

            GreenReaper Laurence Parry added a comment

            People

              Assignee: marko Marko Mäkelä
              Reporter: xan@biblionix.com Xan Charbonnet
              Votes: 0
              Watchers: 5

