MariaDB Server / MDEV-5551

thread pool leaves lots of killed connections

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Cannot Reproduce
    • Affects Version/s: 5.5.34
    • Fix Version/s: 5.5.38
    • Component/s: None
    • Labels:
      None
    • Environment:
      linux

      Description

      jira would not let me enter 5.5.30, so i used 5.5.34 instead.

      we have a customer that has observed problems related to the
      interaction of the thread pool in mariadb 5.5.30 and tokudb 7.1.0.
      the problem is that connections are stuck for hundreds of seconds in
      the Killed state while a big tokudb transaction commit is in progress.
      the customer claims that this problem was resolved by turning the
      thread pool OFF. have there been any fixes to the thread pool
      implementation since the 5.5.30 release?

      here is some information that may be useful.

      the processlist showed 2838 client connections of which:
      851 Killed
      483 Sleep
      1467 Connect
      1 show processlist
      1 processing a commit on a big delete from a tokudb table
      35 blocked on a row lock held by the big delete

      the Killed connections look like:
      84847752 sfi_mysql 10.0.0.60:1814 sfi Killed 93 NULL 0.000
      (only the connection address and the time differ slightly between rows)

      a gdb snapshot of the system at the time of the failure:
      291 total threads
      130 threads waiting on fil_aio_wait
      52 threads waiting on tokudb's work_on_kibbutz
      16 tokudb deserialization threads waiting
      36 mysql update threads blocked on the tokudb lock tree
      1 thread executing a big tokudb delete transaction
      40 threads in the thread pool get_event function
      1 tokudb checkpoint thread blocked waiting for the LP MO lock, which
      is held by the big txn commit thread
      that leaves 15 misc worker threads that are not doing anything interesting

      the gdb stacks for the thread pool threads are:
      Thread 78 : epoll_wait , io_poll_wait , listener , get_event ,
      worker_main , start_thread , clone
      Thread 71 : pthread_cond_timedwait@@GLIBC_2.3.2 ,
      inline_mysql_cond_timedwait , get_event , worker_main , start_thread ,
      clone
      Thread 70 : epoll_wait , io_poll_wait , listener , get_event ,
      worker_main , start_thread , clone
      Thread 69 : pthread_cond_timedwait@@GLIBC_2.3.2 ,
      inline_mysql_cond_timedwait , get_event , worker_main , start_thread ,
      clone
      Thread 68 : epoll_wait , io_poll_wait , listener , get_event ,
      worker_main , start_thread , clone
      Thread 62 : epoll_wait , io_poll_wait , listener , get_event ,
      worker_main , start_thread , clone
      Thread 57 : epoll_wait , io_poll_wait , listener , get_event ,
      worker_main , start_thread , clone
      Thread 52 : epoll_wait , io_poll_wait , listener , get_event ,
      worker_main , start_thread , clone
      Thread 51 : pthread_cond_timedwait@@GLIBC_2.3.2 ,
      inline_mysql_cond_timedwait , get_event , worker_main , start_thread ,
      clone
      Thread 49 : pthread_cond_timedwait@@GLIBC_2.3.2 ,
      inline_mysql_cond_timedwait , get_event , worker_main , start_thread ,
      clone
      Thread 45 : pthread_cond_timedwait@@GLIBC_2.3.2 ,
      inline_mysql_cond_timedwait , get_event , worker_main , start_thread ,
      clone
      Thread 44 : pthread_cond_timedwait@@GLIBC_2.3.2 ,
      inline_mysql_cond_timedwait , get_event , worker_main , start_thread ,
      clone
      Thread 43 : epoll_wait , io_poll_wait , listener , get_event ,
      worker_main , start_thread , clone
      Thread 42 : pthread_cond_timedwait@@GLIBC_2.3.2 ,
      inline_mysql_cond_timedwait , get_event , worker_main , start_thread ,
      clone
      Thread 41 : epoll_wait , io_poll_wait , listener , get_event ,
      worker_main , start_thread , clone
      Thread 39 : pthread_cond_timedwait@@GLIBC_2.3.2 ,
      inline_mysql_cond_timedwait , get_event , worker_main , start_thread ,
      clone
      Thread 37 : pthread_cond_timedwait@@GLIBC_2.3.2 ,
      inline_mysql_cond_timedwait , get_event , worker_main , start_thread ,
      clone
      Thread 36 : pthread_cond_timedwait@@GLIBC_2.3.2 ,
      inline_mysql_cond_timedwait , get_event , worker_main , start_thread ,
      clone
      Thread 35 : epoll_wait , io_poll_wait , listener , get_event ,
      worker_main , start_thread , clone
      Thread 33 : epoll_wait , io_poll_wait , listener , get_event ,
      worker_main , start_thread , clone
      Thread 32 : epoll_wait , io_poll_wait , listener , get_event ,
      worker_main , start_thread , clone
      Thread 31 : pthread_cond_timedwait@@GLIBC_2.3.2 ,
      inline_mysql_cond_timedwait , get_event , worker_main , start_thread ,
      clone
      Thread 26 : pthread_cond_timedwait@@GLIBC_2.3.2 ,
      inline_mysql_cond_timedwait , get_event , worker_main , start_thread ,
      clone
      Thread 24 : pthread_cond_timedwait@@GLIBC_2.3.2 ,
      inline_mysql_cond_timedwait , get_event , worker_main , start_thread ,
      clone
      Thread 22 : epoll_wait , io_poll_wait , listener , get_event ,
      worker_main , start_thread , clone
      Thread 21 : epoll_wait , io_poll_wait , listener , get_event ,
      worker_main , start_thread , clone
      Thread 20 : pthread_cond_timedwait@@GLIBC_2.3.2 ,
      inline_mysql_cond_timedwait , get_event , worker_main , start_thread ,
      clone
      Thread 19 : pthread_cond_timedwait@@GLIBC_2.3.2 ,
      inline_mysql_cond_timedwait , get_event , worker_main , start_thread ,
      clone
      Thread 18 : epoll_wait , io_poll_wait , listener , get_event ,
      worker_main , start_thread , clone
      Thread 17 : pthread_cond_timedwait@@GLIBC_2.3.2 ,
      inline_mysql_cond_timedwait , get_event , worker_main , start_thread ,
      clone
      Thread 16 : pthread_cond_timedwait@@GLIBC_2.3.2 ,
      inline_mysql_cond_timedwait , get_event , worker_main , start_thread ,
      clone
      Thread 13 : pthread_cond_timedwait@@GLIBC_2.3.2 ,
      inline_mysql_cond_timedwait , get_event , worker_main , start_thread ,
      clone
      Thread 12 : epoll_wait , io_poll_wait , listener , get_event ,
      worker_main , start_thread , clone
      Thread 11 : epoll_wait , io_poll_wait , listener , get_event ,
      worker_main , start_thread , clone
      Thread 8 : pthread_cond_timedwait@@GLIBC_2.3.2 ,
      inline_mysql_cond_timedwait , get_event , worker_main , start_thread ,
      clone
      Thread 7 : pthread_cond_timedwait@@GLIBC_2.3.2 ,
      inline_mysql_cond_timedwait , get_event , worker_main , start_thread ,
      clone
      Thread 5 : pthread_cond_timedwait@@GLIBC_2.3.2 ,
      inline_mysql_cond_timedwait , get_event , worker_main , start_thread ,
      clone
      Thread 4 : pthread_cond_timedwait@@GLIBC_2.3.2 ,
      inline_mysql_cond_timedwait , get_event , worker_main , start_thread ,
      clone
      Thread 3 : pthread_cond_timedwait@@GLIBC_2.3.2 ,
      inline_mysql_cond_timedwait , get_event , worker_main , start_thread ,
      clone
      Thread 2 : pthread_cond_timedwait@@GLIBC_2.3.2 ,
      inline_mysql_cond_timedwait , get_event , worker_main , start_thread ,
      clone
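
      The stack list above is easier to read when grouped by the innermost frame. The following is a minimal sketch, not part of the original report: the function name tally_top_frames and the parsing rules are assumptions based only on the "Thread N : frame , frame , ..." layout shown above.

```cpp
#include <map>
#include <sstream>
#include <string>

// Strip leading and trailing spaces and tabs.
static std::string trim(const std::string& s) {
    const auto b = s.find_first_not_of(" \t");
    if (b == std::string::npos) return "";
    const auto e = s.find_last_not_of(" \t");
    return s.substr(b, e - b + 1);
}

// Hypothetical helper: count threads by their innermost (first listed) frame.
// Assumes every record starts with "Thread <n> :" and that wrapped
// continuation lines do not start with "Thread ".
std::map<std::string, int> tally_top_frames(const std::string& dump) {
    std::map<std::string, int> counts;
    std::istringstream in(dump);
    std::string line;
    while (std::getline(in, line)) {
        const std::string t = trim(line);
        if (t.rfind("Thread ", 0) != 0) continue;  // skip continuation lines
        const auto colon = t.find(':');
        if (colon == std::string::npos) continue;
        const auto comma = t.find(',', colon);
        const auto len = (comma == std::string::npos)
                             ? std::string::npos
                             : comma - colon - 1;
        const std::string top = trim(t.substr(colon + 1, len));
        if (!top.empty()) ++counts[top];
    }
    return counts;
}
```

      Applied to the 40 stacks listed above, this grouping gives 16 threads in epoll_wait (acting as listener) and 24 threads in pthread_cond_timedwait (idle workers), so none of the thread pool threads is processing a connection event.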

      thanks

      guangpu feng gpfeng.cs@gmail.com
      Jan 16 (6 days ago)

      to me
      We encountered this problem before, and it has not happened again since we fixed it. When we started mysqld with the thread pool (ported from MariaDB 5.5.28 to our Percona Server 5.5.18), many killed sessions were still listed in "show processlist" after a month, and the backtrace from pt-pmp looked just like yours. I don't know whether we have exactly the same problem, but I can explain why we had the problem and how we solved it:

      In THD::awake, closing the socket causes sockfd to be unregistered from the epoll set. With the pool-of-threads scheduler this prevents killed connections from exiting, because epoll_wait never returns an event for that connection.

      Here is the solution: only shut down the socket, and let close_connection, which is called later, close sockfd.

      Index: /PS5518/branches/threadpool/sql/sql_class.cc
      ===================================================================
      --- /PS5518/branches/threadpool/sql/sql_class.cc	(revision 3788)
      +++ /PS5518/branches/threadpool/sql/sql_class.cc	(revision 3823)
      @@ -1746,7 +1746,8 @@
               reading the next statement.
             */
       
      -      close_active_vio();
      +      if (active_vio)
      +        vio_shutdown(active_vio, SHUT_RDWR);
           }
       #endif

      Hope it will be helpful.
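
      The difference between close and shutdown described in the comment above can be observed outside the server. The following is a minimal, Linux-only sketch under stated assumptions: it uses an AF_UNIX socketpair instead of a TCP client connection, runs in a single thread, and the names KillMode and poller_wakes_after are hypothetical. It is not the server code, only an illustration of the epoll behaviour the patch relies on.

```cpp
#include <cassert>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

enum class KillMode { Close, Shutdown };

// Returns true if an epoll instance watching the socket reports an event
// after the "kill" action, false if epoll_wait times out.
bool poller_wakes_after(KillMode mode) {
    int sv[2];
    int rc = socketpair(AF_UNIX, SOCK_STREAM, 0, sv);
    assert(rc == 0);

    int epfd = epoll_create1(0);
    assert(epfd >= 0);

    // Register one end of the pair, as a thread pool listener would
    // register a client socket.
    epoll_event ev{};
    ev.events = EPOLLIN;
    ev.data.fd = sv[0];
    rc = epoll_ctl(epfd, EPOLL_CTL_ADD, sv[0], &ev);
    assert(rc == 0);
    (void)rc;  // silence unused-variable warnings when NDEBUG is defined

    if (mode == KillMode::Close) {
        // Closing the last descriptor for the socket drops it from the
        // epoll interest list, so no event is ever delivered for it.
        close(sv[0]);
    } else {
        // Shutting down keeps the descriptor registered and makes the
        // socket readable (end of stream), which wakes the poller.
        shutdown(sv[0], SHUT_RDWR);
    }

    epoll_event out{};
    const int n = epoll_wait(epfd, &out, 1, 200);  // wait up to 200 ms

    if (mode == KillMode::Shutdown) close(sv[0]);  // the later "real" close
    close(sv[1]);
    close(epfd);
    return n == 1;
}
```

      Calling poller_wakes_after(KillMode::Shutdown) is expected to return true and poller_wakes_after(KillMode::Close) false, which matches the explanation above: after a plain close the listener is never told about the connection, so nothing runs the cleanup for the killed session.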

            People

            Assignee:
            svoj Sergey Vojtovich
            Reporter:
            prohaska7 Rich Prohaska