[MDEV-30791] killed query shows in processlist forever Created: 2023-03-06  Updated: 2023-03-07  Resolved: 2023-03-07

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.6.9
Fix Version/s: 10.6.10, 10.7.6, 10.8.5, 10.9.3, 10.10.2

Type: Bug Priority: Major
Reporter: Allen Lee (Inactive) Assignee: Marko Mäkelä
Resolution: Duplicate Votes: 1
Labels: None

Issue Links:
Duplicate
duplicates MDEV-27983 InnoDB hangs on multiple concurrent r... Closed

 Description   

There are 2 problems:

1. A SET STATEMENT max_statement_time=XXX SELECT .. query was not terminated after it exceeded max_statement_time.

2. SHOW PROCESSLIST shows the session above as killed, but the InnoDB status output shows it as still running.



 Comments   
Comment by Marko Mäkelä [ 2023-03-07 ]

Here is the relevant part of the stack trace output:

mariadb-10.6.9

Thread 207 (Thread 0x7f1c8c0c0700 (LWP 6705)):
#0  0x00007f301837de9d in nanosleep () from /lib64/libpthread.so.0
No symbol table info available.
#1  0x0000559f7b818b49 in sleep_for<long, std::ratio<1, 1000000> > (__rtime=<optimized out>, __rtime=<optimized out>) at /opt/rh/devtoolset-10/root/usr/include/c++/10/thread:401
        __ts = {tv_sec = 0, tv_nsec = 89937}
#2  buf_page_get_low(page_id_t, unsigned long, unsigned long, buf_block_t*, unsigned long, mtr_t*, dberr_t*, bool) () at /usr/src/debug/MariaDB-/src_0/storage/innobase/buf/buf0buf.cc:2584

Even though it looks like the debug information for mariadbd was not installed (we can see that it is installed for libstdc++ in frame 1 above), in this case all we need to know is the line number. This hang is a duplicate of MDEV-27983, which affects MariaDB 10.6.6, 10.6.7, 10.6.8, and 10.6.9. Let me paste the code:

        if (UNIV_UNLIKELY(!block->page.frame)) {
                if (!block->page.lock.x_lock_try()) {
                        /* The page is being read or written, or
                        another thread is executing buf_zip_decompress()
                        in buf_page_get_low() on it. */
                        block->page.unfix();
                        std::this_thread::sleep_for(
                                std::chrono::microseconds(100));
                        goto loop;
                }
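
To make the shape of this retry loop easier to see, here is a minimal standalone sketch of the same try-latch, sleep, retry pattern. It is not InnoDB code: ToyLatch and acquire_with_retry are hypothetical names, the latch is a plain atomic flag rather than block->page.lock, and the attempt cap exists only so that the sketch terminates. The quoted loop has no such cap, so it retries for as long as the latch holder does not release.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Toy stand-in for a page latch (hypothetical; not block->page.lock).
struct ToyLatch {
    std::atomic<bool> held{false};

    // Non-blocking exclusive acquisition: true if we got the latch.
    bool x_lock_try() {
        bool expected = false;
        return held.compare_exchange_strong(expected, true);
    }

    void x_unlock() { held.store(false); }
};

// Try the latch, sleep 100 microseconds on failure, and retry.
// Returns the number of attempts used, or -1 if the cap was reached.
// The cap is an addition for this sketch only.
int acquire_with_retry(ToyLatch& latch, int max_attempts) {
    assert(max_attempts > 0);
    for (int attempt = 1; attempt <= max_attempts; ++attempt) {
        if (latch.x_lock_try()) {
            return attempt;
        }
        std::this_thread::sleep_for(std::chrono::microseconds(100));
    }
    return -1;
}
```

With a free latch the first attempt succeeds. If the holder never releases, every attempt fails, which matches a thread that is always found in nanosleep() under buf_page_get_low().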

At least one thread will remain blocked, and another will stay in an infinite loop. Among the stack traces, we have several threads waiting in buf0buf.cc:2584 (possibly on different page latches; the information is not available), at least 2 in buf0buf.cc:2536 and some in buf0buf.cc:2630. Let us check those lines as well:

                /* A read-fix is released after block->page.lock
                in buf_page_t::read_complete() or
                buf_pool_t::corrupted_evict(), or
                after buf_zip_decompress() in this function. */
2536            block->page.lock.s_lock();
                state = block->page.state();
                ut_ad(state < buf_page_t::READ_FIX
                      || state >= buf_page_t::WRITE_FIX);
                const page_id_t id{block->page.id()};
                block->page.lock.s_unlock();

                        mysql_mutex_unlock(&buf_pool.mutex);
                        hash_lock.unlock();
2630                    std::this_thread::sleep_for(
                                std::chrono::microseconds(100));
                        goto wait_for_unfix;

The fix of MDEV-27983 affects the sleeping thread (line 2536). Instead of waiting for the page latch that one of the busy-waiting threads is holding, that thread will start from scratch, acquiring a lock on the buf_pool.page_hash cell that covers the desired page.
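
The idea of restarting instead of blocking can be sketched in isolation as follows. This is a simplified model under stated assumptions, not the actual patch: ToyPage, try_latch and fix_and_latch_with_restart are hypothetical names, the real restart repeats the buf_pool.page_hash lookup (omitted here), and the restart cap is added only so that the sketch terminates. It also assumes that the latch holder is waiting for the buffer-fix count to drop, which the wait_for_unfix label suggests but which is not stated above.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Toy model of a buffer page (hypothetical; not buf_page_t).
struct ToyPage {
    std::atomic<int> fix_count{0};     // buffer-fix (pin) count
    std::atomic<bool> latched{false};  // toy exclusive page latch
};

// Non-blocking latch acquisition: true if we got the latch.
bool try_latch(ToyPage& page) {
    bool expected = false;
    return page.latched.compare_exchange_strong(expected, true);
}

// Pin the page and try the latch. On failure, drop the pin before
// sleeping and start over, so that this thread never waits for the
// latch while holding a pin that the latch holder may be waiting on.
// Returns true with one pin and the latch held, false with neither.
bool fix_and_latch_with_restart(ToyPage& page, int max_restarts) {
    assert(max_restarts > 0);
    for (int i = 0; i < max_restarts; ++i) {
        page.fix_count.fetch_add(1);
        if (try_latch(page)) {
            return true;
        }
        page.fix_count.fetch_sub(1);  // do not keep the pin while waiting
        std::this_thread::sleep_for(std::chrono::microseconds(100));
        // The real code would redo the page_hash lookup at this point.
    }
    return false;
}
```

The property to note is that a failed attempt leaves fix_count unchanged, so a latch holder that waits for the pins to go away is not blocked by the waiter.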
