MariaDB Server / MDEV-32371

Deadlock between buf_page_get_zip() and buf_pool_t::corrupted_evict() on InnoDB ROW_FORMAT=COMPRESSED table corruption

Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 10.11.5, 10.6, 10.7(EOL), 10.8(EOL), 10.9(EOL), 10.10(EOL), 10.11, 11.0(EOL), 11.1(EOL), 11.2(EOL), 11.3(EOL)
    • Fix Version/s: 10.6.17, 10.11.7, 11.0.5, 11.1.4, 11.2.3, 11.3.2
    • Environment: Debian 11 Bullseye; MariaDB from the MariaDB repositories.

    Description

      I've had two different servers this week hit an issue with a semaphore wait lasting forever. This bug report isn't about what caused those wait-forever situations: it's about how the server handled them.

      If I'm reading the code in storage/innobase/srv/srv0srv.cc correctly, there is supposed to be a warning at 1/4 of the innodb_fatal_semaphore_wait_threshold value, at 1/2 of that value, and at 3/4 of that value before killing the server once the threshold value is reached.
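
      To spell out the schedule I would expect with my 600-second threshold, here is just the 1/4, 1/2, 3/4 arithmetic described above (an illustration only, not the actual code in storage/innobase/srv/srv0srv.cc):

        // Expected warning schedule for innodb_fatal_semaphore_wait_threshold.
        // This only illustrates the 1/4, 1/2, 3/4 arithmetic described above;
        // it is not the server implementation.
        #include <cstdio>

        int main()
        {
          const unsigned threshold = 600;  // seconds; my current setting
          for (unsigned quarter = 1; quarter <= 3; quarter++)
            std::printf("expect a long-wait warning at %u seconds\n",
                        threshold * quarter / 4);
          std::printf("expect the fatal abort at %u seconds\n", threshold);
          return 0;
        }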

      Examining the syslog, however, shows this doesn't seem to be working. One of my servers gave no warnings at all. It waited the full 600 seconds and then gave the fatal error as it crashed itself. The "Long wait" message never appeared at all.

      The other server did give the "Long wait" warning, starting at 150 seconds, which is the correct time to start (1/4 of 600). I would expect a warning at 300 seconds, then 450 seconds. However, it instead warned again at 159 seconds, and then every 10 seconds after that until 289 seconds, at which time I killed the server myself rather than waiting the full 600.

      I would like to change my innodb_fatal_semaphore_wait_threshold setting to a dramatically lower number. I'll be able to tell whether it's safe to do so by making incremental changes and observing the presence or absence of "Long wait" warnings. However, if these warnings aren't behaving the way they're supposed to, that won't work.

      Apologies if I'm misunderstanding what the behavior is supposed to be. It doesn't seem like zero warnings in some cases and warnings every 10 seconds in other cases would be the intended behavior.

      Attachments

        Issue Links

          Activity

            xan@biblionix.com Xan Charbonnet created issue -

            The "long semaphore wait" diagnostics and sync_array (MDEV-11972) used to cover all InnoDB rw-locks and mutexes until MariaDB Server 10.4. In 10.5, some mutexes were converted to normal ones. In 10.6, MDEV-21452, MDEV-24167, and MDEV-24142 finally got rid of the sync_array. I thought that most waits involve the data dictionary latch at some point, and therefore I implemented a watchdog on dict_sys.latch only. It looks like this watchdog finally kicked in also in your case, but with more delay.

            Can you please provide stack traces of all threads during the hangs? That is the only reliable way to identify and to fix hangs. A known hang that affects the currently latest releases is MDEV-32049.

            marko Marko Mäkelä added a comment
            marko Marko Mäkelä made changes -
            Field Original Value New Value
            Status Open [ 1 ] Needs Feedback [ 10501 ]

            Marko,

            I'm not sure I understand your explanation of what ended up causing my weird pattern of "long wait" warnings, but I certainly appreciate your help!

            I'm afraid the only stack trace I have is the one where no warnings were given. For the other one, the one where warnings were appearing every 10 seconds, I killed the server manually rather than letting it time out and die. I've shrunk the timeout from 600 seconds to 152 seconds, so maybe something will be happening more frequently going forward.

            Here's what went into the syslog:

            2023-10-04 1:31:50 0 [ERROR] [FATAL] InnoDB: innodb_fatal_semaphore_wait_threshold was exceeded for dict_sys.latch. Please refer to https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/
            231004 1:31:50 [ERROR] mysqld got signal 6 ;
            This could be because you hit a bug. It is also possible that this binary
            or one of the libraries it was linked against is corrupt, improperly built,
            or misconfigured. This error can also be caused by malfunctioning hardware.
            To report this bug, see https://mariadb.com/kb/en/reporting-bugs
            We will try our best to scrape up some info that will hopefully help
            diagnose the problem, but since we have already crashed,
            something is definitely wrong and this may fail.
            Server version: 10.11.5-MariaDB-1:10.11.5+maria~deb11-log source revision: 7875294b6b74b53dd3aaa723e6cc103d2bb47b2c
            key_buffer_size=134217728
            read_buffer_size=131072
            max_used_connections=188
            max_threads=505
            thread_count=197
            It is possible that mysqld could use up to
            key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1243339 K bytes of memory
            Hope that's ok; if not, decrease some variables in the equation.
            Thread pointer: 0x0
            Attempting backtrace. You can use the following information to find out
            where mysqld died. If you see no messages after this, something went
            terribly wrong...
            stack_bottom = 0x0 thread_stack 0x49000
            2023-10-04 1:31:51 0 [Warning] WSREP: last inactive check more than PT1.5S ago (PT1.68234S), skipping check
            /usr/sbin/mariadbd(my_print_stacktrace+0x2e)[0x55bb081c7eee]
            /usr/sbin/mariadbd(handle_fatal_signal+0x485)[0x55bb07ca5365]
            sigaction.c:0(__restore_rt)[0x7f553d5f1140]
            2023-10-04 1:31:58 0 [Warning] WSREP: last inactive check more than PT1.5S ago (PT1.59027S), skipping check
            linux/raise.c:51(__GI_raise)[0x7f553cd53ce1]
            stdlib/abort.c:81(__GI_abort)[0x7f553cd3d537]
            2023-10-04 1:32:02 0 [Warning] WSREP: last inactive check more than PT1.5S ago (PT1.56904S), skipping check
            /usr/sbin/mariadbd(+0x69d8c7)[0x55bb078d38c7]
            /usr/sbin/mariadbd(+0x694151)[0x55bb078ca151]
            /usr/sbin/mariadbd(_ZN5tpool19thread_pool_generic13timer_generic7executeEPv+0x38)[0x55bb0815fc18]
            /usr/sbin/mariadbd(_ZN5tpool4task7executeEv+0x32)[0x55bb081609c2]
            /usr/sbin/mariadbd(_ZN5tpool19thread_pool_generic11worker_mainEPNS_11worker_dataE+0x4f)[0x55bb0815e73f]
            2023-10-04 1:32:05 0 [Warning] WSREP: last inactive check more than PT1.5S ago (PT1.51115S), skipping check
            /lib/x86_64-linux-gnu/libstdc++.so.6(+0xceed0)[0x7f553d101ed0]
            2023-10-04 1:32:09 0 [Warning] WSREP: last inactive check more than PT1.5S ago (PT1.8615S), skipping check
            nptl/pthread_create.c:478(start_thread)[0x7f553d5e5ea7]
            2023-10-04 1:32:12 0 [Warning] WSREP: last inactive check more than PT1.5S ago (PT1.52984S), skipping check
            x86_64/clone.S:97(_GI__clone)[0x7f553ce16a2f]
            The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/ contains
            information that should help you find out what is causing the crash.
            Writing a core file...
            Working directory at /var/lib/mysql
            Resource Limits:
            Limit Soft Limit Hard Limit Units
            Max cpu time unlimited unlimited seconds
            Max file size unlimited unlimited bytes
            Max data size unlimited unlimited bytes
            Max stack size 8388608 unlimited bytes
            Max core file size 0 unlimited bytes
            Max resident set unlimited unlimited bytes
            Max processes 514437 514437 processes
            Max open files 1000000 1000000 files
            Max locked memory 2097152 2097152 bytes
            Max address space unlimited unlimited bytes
            Max file locks unlimited unlimited locks
            Max pending signals 514437 514437 signals
            Max msgqueue size 819200 819200 bytes
            Max nice priority 0 0
            Max realtime priority 0 0
            Max realtime timeout unlimited unlimited us
            Core pattern: core
            Kernel version: Linux version 6.1.0-0.deb11.11-amd64 (debian-kernel@lists.debian.org) (gcc-10 (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2) #1 SMP PREEMPT_DYNAMIC Debian 6.1.38-4~bpo11+1 (2023-08-08)

            xan@biblionix.com Xan Charbonnet added a comment

            xan@biblionix.com, the stack trace that was written to the log is basically useless, because it apparently is from the thread that intentionally kills the server after noticing a hang. (The InnoDB function names in the stack trace were not even resolved by the built-in stack trace generator; this is a known problem.)

            To analyze the root cause of any hang, we really need the stack traces of all threads of the process.

            In https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mariadbd/#enabling-core-dumps you should be able to find how to enable core dumps and how to extract stack traces from a core dump.

            marko Marko Mäkelä added a comment

            Marko,

            I enabled core dumps and now have a stack trace for two crashes which have occurred over the past few days. There is definitely a lot more information in these stack traces.

            In fact there's so much information that I'm not sure I can guarantee it's appropriate for public display. Is there a more private way for me to get these stack traces to you?

            xan@biblionix.com Xan Charbonnet added a comment

            xan@biblionix.com, that is great news. You can send the stack traces to me by email. I just sent an email to you.

            marko Marko Mäkelä added a comment

            Thank you, I got the stack traces of two hung processes. I will need to analyze them some time later, once I am done with some urgent tasks.

            This looks like a new type of hang. I see many threads invoking buf_pool_t::corrupted_evict(), trying to remove a corrupted page from the buffer pool. We did test this logic with some debug instrumentation during the development of MDEV-13542, but such a test might not be part of our regular stress test battery.

            marko Marko Mäkelä added a comment
            marko Marko Mäkelä made changes -
            Assignee Marko Mäkelä [ marko ]
            Status Needs Feedback [ 10501 ] Open [ 1 ]

            In one of the provided sets of stack traces, I found that a thread is waiting in buf_page_get_zip() here:

                  if (discard_attempted || !bpage->frame)
                  {
                    /* Even when we are holding a hash_lock, it should be
                    acceptable to wait for a page S-latch here, because
                    buf_page_t::read_complete() will not wait for buf_pool.mutex,
                    and because S-latch would not conflict with a U-latch
                    that would be protecting buf_page_t::write_complete(). */
                    bpage->lock.s_lock();
                    hash_lock.unlock_shared();
                    break;
                  }
            

            The comment may have been truthful before MDEV-13542. Here, we are holding the buf_pool.page_hash latch for page 0x19b1a:0x17880 while several other threads are actually waiting for buf_pool.mutex in the read completion callback so that they could throw out corrupted pages. Possibly, one of the pages is exactly this page that is being requested here.
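
            If so, this is a classic lock-order inversion between the page_hash latch and the page latch. A minimal stand-alone illustration of the two conflicting acquisition orders, using plain std::mutex stand-ins rather than the actual InnoDB latches:

            // Stand-ins for the two latches involved; not the InnoDB types.
            #include <mutex>

            std::mutex page_hash_latch;  // stand-in for the buf_pool.page_hash latch
            std::mutex page_latch;       // stand-in for bpage->lock

            void reader_path()           // order used by buf_page_get_zip()
            {
              std::lock_guard<std::mutex> h(page_hash_latch);
              std::lock_guard<std::mutex> p(page_latch);  // may block while h is held
            }

            void evict_path()            // order seen in the corrupted-page eviction
            {
              // In the real code the page latch is already held from the failed read;
              // it is acquired here only to keep the illustration self-contained.
              std::lock_guard<std::mutex> p(page_latch);
              std::lock_guard<std::mutex> h(page_hash_latch);  // may block while p is held
            }

            int main()
            {
              // Called sequentially so that this example terminates; run the two
              // paths concurrently and each can end up waiting for the latch that
              // the other one holds.
              reader_path();
              evict_path();
              return 0;
            }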

            There is also one thread apparently in a busy loop, looking for a page of a ROW_FORMAT=COMPRESSED table:

            mariadb-10.11.5

            #3  buf_page_get_low (page_id={m_id = 105604655874051}, zip_size=8192, rw_latch=1, guess=<optimized out>, mode=10, mtr=0x7f44cddd9e90, err=0x7f44cddd93f0, allow_ibuf_merge=false) at ./storage/innobase/buf/buf0buf.cc:2611
            

            I checked another set of stack traces and found the same pattern. In the most recent one, trace_2023-10-31_21-13-25, we have Thread 76 waiting for buf_pool.mutex:

            mariadb-10.11.5

            Thread 76 (Thread 0x7fc87dae5700 (LWP 3435174)):
            #2  0x00005638ead3f85d in inline_mysql_mutex_lock (src_file=0x5638eb8d8228 "./storage/innobase/buf/buf0lru.cc", src_line=1231, that=0x5638ebf68b00 <buf_pool>) at ./include/mysql/psi/mysql_thread.h:752
            No locals.
            #3  buf_pool_t::corrupted_evict (this=0x5638ebf68b00 <buf_pool>, bpage=0x5639908a14e8, state=2147483648) at ./storage/innobase/buf/buf0lru.cc:1231
                    id = {m_id = <optimized out>}
                    chain = @0x5638f1ceca18: {first = 0x56396e8cc258}
                    hash_lock = @0x5638f1ceca00: {lk = {writer = {lock = {<std::__atomic_base<unsigned int>> = {static _S_alignment = 4, _M_i = 0}, <No data fields>}, static HOLDER = 2147483648}, readers = {<std::__atomic_base<unsigned int>> = {static _S_alignment = 4, _M_i = 0}, <No data fields>}, static WRITER = 2147483648}}
                    unfix = <optimized out>
                    s = <optimized out>
            #4  0x00005638eb5271d5 in buf_page_t::read_complete (this=0x5639908a14e8, node=@0x5638f0b3e1d8: {space = 0x5638f0b3d8d8, name = 0x5638f0b1f4b8 "./atoz/c_log#P#p13.ibd", handle = {m_file = 262, m_psi = 0x0}, on_ssd = 1, punch_hole = 0, atomic_write = 0, is_raw_disk = 0, deferred = 0, size = 198656, init_size = 0, max_size = 4294967295, being_extended = {m = {_M_base = {static _S_alignment = 1, _M_i = false}}}, chain = {prev = 0x0, next = 0x0}, block_size = 4096}) at ./storage/innobase/buf/buf0buf.cc:3627
                    database_corrupted = <optimized out>
                    expected_id = {m_id = 452504869516550}
                    read_frame = <optimized out>
                    err = DB_FAIL
                    recovery = <optimized out>
                    ibuf_may_exist = <optimized out>
            

            The page ID (452504869516550 == 0x19b8d0001b506) is almost identical to what we have in another thread, but it does not seem to share the buf_pool.page_hash latch with it (0x5638f1ceca00 is not 0x5638f1cec900):

            mariadb-10.11.5

            Thread 58 (Thread 0x7fc87f573700 (LWP 3738762)):
            #6  sux_lock<ssux_lock_impl<true> >::s_lock (this=0x563973caed20) at ./storage/innobase/include/sux_lock.h:362
            No locals.
            #7  buf_page_get_zip (page_id={m_id = 452504869516544}, zip_size=zip_size@entry=8192) at ./storage/innobase/buf/buf0buf.cc:2233
                    discard_attempted = <optimized out>
                    chain = @0x5638f1cec928: {first = 0x563973caed08}
                    hash_lock = @0x5638f1cec900: {lk = {writer = {lock = {<std::__atomic_base<unsigned int>> = {static _S_alignment = 4, _M_i = 2147483649}, <No data fields>}, static HOLDER = 2147483648}, readers = {<std::__atomic_base<unsigned int>> = {static _S_alignment = 4, _M_i = 2147483649}, <No data fields>}, static WRITER = 2147483648}}
                    bpage = 0x563973caed08
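
            For reference, page_id_t::m_id packs the tablespace identifier into the upper 32 bits and the page number into the lower 32 bits, so the two identifiers above decode like this (a stand-alone illustration, not InnoDB code):

            #include <cstdint>
            #include <cstdio>

            // Split an InnoDB page identifier into tablespace id and page number.
            static void decode(uint64_t m_id)
            {
              std::printf("m_id=%llu -> space=0x%x page=0x%x\n",
                          (unsigned long long) m_id,
                          unsigned(m_id >> 32), unsigned(m_id & 0xffffffff));
            }

            int main()
            {
              decode(452504869516550ULL);  // Thread 76: space 0x19b8d, page 0x1b506
              decode(452504869516544ULL);  // Thread 58: space 0x19b8d, page 0x1b500
              return 0;
            }

            That is, two nearby pages of the same tablespace, which is why the identifiers look almost identical.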
            

            By searching for 0x5638f1cec900 I found a thread that seems to be a culprit in this deadlock:

            mariadb-10.11.5

            Thread 139 (Thread 0x7fc8762d6700 (LWP 3435180)):
            #4  page_hash_latch::lock (this=0x5638f1cec900) at ./storage/innobase/include/buf0types.h:214
            No locals.
            #5  buf_pool_t::corrupted_evict (this=0x5638ebf68b00 <buf_pool>, bpage=0x563969c1c648, state=2147483648) at ./storage/innobase/buf/buf0lru.cc:1232
                    id = {m_id = <optimized out>}
                    chain = @0x5638f1cec920: {first = 0x563969c1c648}
                    hash_lock = @0x5638f1cec900: {lk = {writer = {lock = {<std::__atomic_base<unsigned int>> = {static _S_alignment = 4, _M_i = 2147483649}, <No data fields>}, static HOLDER = 2147483648}, readers = {<std::__atomic_base<unsigned int>> = {static _S_alignment = 4, _M_i = 2147483649}, <No data fields>}, static WRITER = 2147483648}}
                    unfix = <optimized out>
                    s = <optimized out>
            #6  0x00005638eb5271d5 in buf_page_t::read_complete (this=0x563969c1c648, node=@0x5638f0b3e1d8: {space = 0x5638f0b3d8d8, name = 0x5638f0b1f4b8 "./atoz/c_log#P#p13.ibd", handle = {m_file = 262, m_psi = 0x0}, on_ssd = 1, punch_hole = 0, atomic_write = 0, is_raw_disk = 0, deferred = 0, size = 198656, init_size = 0, max_size = 4294967295, being_extended = {m = {_M_base = {static _S_alignment = 1, _M_i = false}}}, chain = {prev = 0x0, next = 0x0}, block_size = 4096}) at ./storage/innobase/buf/buf0buf.cc:3627
                    database_corrupted = <optimized out>
                    expected_id = {m_id = 452504869516545}
                    read_frame = <optimized out>
                    err = DB_FAIL
                    recovery = <optimized out>
                    ibuf_may_exist = <optimized out>
            

            Bingo: this thread is actually holding buf_pool.mutex (blocking a lot of threads) and is waiting for the page hash latch that Thread 58 holds, while itself holding the page latch that Thread 58 is waiting for:

              mysql_mutex_lock(&mutex);
              hash_lock.lock();
             
              ut_ad(!bpage->oldest_modification());
              bpage->set_corrupt_id();
              auto unfix= state - buf_page_t::FREED;
              auto s= bpage->zip.fix.fetch_sub(unfix) - unfix;
              bpage->lock.x_unlock(true);
            

            The function buf_page_get_zip() needs to be revised so that it will not wait for the page latch while holding the buf_pool.page_hash latch. This function fetches BLOB pages of ROW_FORMAT=COMPRESSED tables.
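
            A rough sketch of the general pattern such a revision could follow: buffer-fix the page, release the hash latch, only then wait for the page latch, and re-validate the page afterwards. The types and helpers below are illustrative stand-ins, not the actual patch:

            // Outline of a deadlock-free buf_page_get_zip()-like lookup:
            // never block on the page latch while the page_hash latch is held.
            #include <atomic>
            #include <shared_mutex>

            struct page_stub
            {
              std::shared_mutex latch;             // stand-in for bpage->lock
              std::atomic<unsigned> fix_count{0};  // stand-in for the buffer-fix counter
              std::atomic<bool> still_valid{true}; // stand-in for "not freed or evicted"
            };

            static std::shared_mutex hash_latch;   // stand-in for buf_pool.page_hash latch
            static page_stub the_page;             // stand-in for the hash chain lookup
            static page_stub *lookup_locked() { return &the_page; }

            page_stub *get_zip_like()
            {
              hash_latch.lock_shared();
              page_stub *page= lookup_locked();
              page->fix_count.fetch_add(1);        // pin: the page cannot be freed under us
              hash_latch.unlock_shared();          // drop the hash latch BEFORE blocking
              page->lock.lock_shared();            // only now wait for the page latch
              if (!page->still_valid.load())       // re-check: corrupted_evict() may have
              {                                    // removed the page while we waited
                page->lock.unlock_shared();
                page->fix_count.fetch_sub(1);
                return nullptr;                    // the caller would retry the lookup
              }
              return page;                         // returned buffer-fixed and S-latched
            }

            int main()
            {
              if (page_stub *p= get_zip_like())
              {
                p->lock.unlock_shared();
                p->fix_count.fetch_sub(1);
              }
              return 0;
            }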

            marko Marko Mäkelä added a comment
            marko Marko Mäkelä made changes -
            Summary innodb_fatal_semaphore_wait_threshold warnings not happening Deadlock between buf_page_get_zip() and buf_pool_t::corrupted_evict() on InnoDB ROW_FORMAT=COMPRESSED table corruption
            marko Marko Mäkelä made changes -
            Fix Version/s 10.6 [ 24028 ]
            Fix Version/s 10.11 [ 27614 ]
            Fix Version/s 11.0 [ 28320 ]
            Fix Version/s 11.1 [ 28549 ]
            Fix Version/s 11.2 [ 28603 ]
            Fix Version/s 11.3 [ 28565 ]
            Affects Version/s 10.6 [ 24028 ]
            Affects Version/s 10.7 [ 24805 ]
            Affects Version/s 10.8 [ 26121 ]
            Affects Version/s 10.9 [ 26905 ]
            Affects Version/s 10.10 [ 27530 ]
            Affects Version/s 10.11 [ 27614 ]
            Affects Version/s 11.0 [ 28320 ]
            Affects Version/s 11.1 [ 28549 ]
            Affects Version/s 11.2 [ 28603 ]
            Affects Version/s 11.3 [ 28565 ]
            Labels hang
            Priority Major [ 3 ] Critical [ 2 ]
            marko Marko Mäkelä made changes -
            Status Open [ 1 ] Confirmed [ 10101 ]
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            Assignee Marko Mäkelä [ marko ] Thirunarayanan Balathandayuthapani [ thiru ]
            Status Confirmed [ 10101 ] In Review [ 10002 ]

            origin/10.6-MDEV-32371 963a88be2e89100260e09e46e90e2b3271444899 2023-11-23T12:11:39+02:00
            performed well in RQG testing.

            mleich Matthias Leich added a comment
            thiru Thirunarayanan Balathandayuthapani made changes -
            Assignee Thirunarayanan Balathandayuthapani [ thiru ] Marko Mäkelä [ marko ]
            Status In Review [ 10002 ] Stalled [ 10000 ]
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            issue.field.resolutiondate 2023-11-30 10:01:43.0 2023-11-30 10:01:43.1
            marko Marko Mäkelä made changes -
            Fix Version/s 10.6.17 [ 29518 ]
            Fix Version/s 10.11.7 [ 29519 ]
            Fix Version/s 11.0.5 [ 29520 ]
            Fix Version/s 11.1.4 [ 29024 ]
            Fix Version/s 11.2.3 [ 29521 ]
            Fix Version/s 11.3.2 [ 29522 ]
            Fix Version/s 10.6 [ 24028 ]
            Fix Version/s 10.11 [ 27614 ]
            Fix Version/s 11.0 [ 28320 ]
            Fix Version/s 11.1 [ 28549 ]
            Fix Version/s 11.3 [ 28565 ]
            Fix Version/s 11.2 [ 28603 ]
            Resolution Fixed [ 1 ]
            Status Stalled [ 10000 ] Closed [ 6 ]

            xan@biblionix.com, in case you are interested, at https://ci.mariadb.org/41154/ you can find packages of the current 10.11 development branch, which include a fix for this bug.

            marko Marko Mäkelä added a comment

            Marko, thank you very much!

            xan@biblionix.com Xan Charbonnet added a comment

            @marko, I ended up recompiling the Debian package after applying your modification. I have not seen a hang since. Thanks!

            xan@biblionix.com Xan Charbonnet added a comment

            I think we got caught by this on Debian Bookworm during backups of databases with compressed pages - thanks for the fix. We installed the version from testing to get it.

            GreenReaper Laurence Parry added a comment

            People

              Assignee: marko Marko Mäkelä
              Reporter: xan@biblionix.com Xan Charbonnet
              Votes: 0
              Watchers: 5

