MDEV-30791 (MariaDB Server)

killed query shows in processlist forever

Details

    Description

      There are two problems:

      1. A query of the form SET STATEMENT max_statement_time=XXX FOR SELECT ... did not terminate after max_statement_time had elapsed.

      2. SHOW PROCESSLIST shows that session as Killed, but SHOW ENGINE INNODB STATUS shows it as still running.

          Activity

            Here is the relevant part of the stack trace output:

            mariadb-10.6.9

            Thread 207 (Thread 0x7f1c8c0c0700 (LWP 6705)):
            #0  0x00007f301837de9d in nanosleep () from /lib64/libpthread.so.0
            No symbol table info available.
            #1  0x0000559f7b818b49 in sleep_for<long, std::ratio<1, 1000000> > (__rtime=<optimized out>, __rtime=<optimized out>) at /opt/rh/devtoolset-10/root/usr/include/c++/10/thread:401
                    __ts = {tv_sec = 0, tv_nsec = 89937}
            #2  buf_page_get_low(page_id_t, unsigned long, unsigned long, buf_block_t*, unsigned long, mtr_t*, dberr_t*, bool) () at /usr/src/debug/MariaDB-/src_0/storage/innobase/buf/buf0buf.cc:2584
            

            Even though it looks like the debug information for mariadbd was not installed (we can see that it is installed for libstdc++, in frame 1 above), in this case all we need to know is the line number. This hang is a duplicate of MDEV-27983, which affects MariaDB 10.6.6, 10.6.7, 10.6.8 and 10.6.9. Here is the code:

                    if (UNIV_UNLIKELY(!block->page.frame)) {
                            if (!block->page.lock.x_lock_try()) {
                                    /* The page is being read or written, or
                                    another thread is executing buf_zip_decompress()
                                    in buf_page_get_low() on it. */
                                    block->page.unfix();
                                    std::this_thread::sleep_for(
                                            std::chrono::microseconds(100));
                                    goto loop;
                            }
            

            At least one thread will remain blocked, and one will be stuck in an infinite loop. Among the stack traces, several threads are waiting in buf0buf.cc:2584 (possibly on different page latches; that information is not available), at least two in buf0buf.cc:2536, and some in buf0buf.cc:2630. Let us check those lines as well:

                            /* A read-fix is released after block->page.lock
                            in buf_page_t::read_complete() or
                            buf_pool_t::corrupted_evict(), or
                            after buf_zip_decompress() in this function. */
            2536            block->page.lock.s_lock();
                            state = block->page.state();
                            ut_ad(state < buf_page_t::READ_FIX
                                  || state >= buf_page_t::WRITE_FIX);
                            const page_id_t id{block->page.id()};
                            block->page.lock.s_unlock();
            

                                    mysql_mutex_unlock(&buf_pool.mutex);
                                    hash_lock.unlock();
            2630                    std::this_thread::sleep_for(
                                            std::chrono::microseconds(100));
                                    goto wait_for_unfix;
            

            The fix for MDEV-27983 affects the sleeping thread (line 2536). Instead of waiting for the page latch that one of the busy-waiting threads is holding, that thread will start from scratch, acquiring a lock on the buf_pool.page_hash cell that covers the desired page.

            Comment by Marko Mäkelä (marko).

            People

              marko Marko Mäkelä
              allen.lee@mariadb.com Allen Lee (Inactive)
              Votes: 1
              Watchers: 4
