Details
Bug
Status: Closed
Major
Resolution: Cannot Reproduce
5.5.34
None
None
linux
Description
jira would not let me enter 5.5.30, so i used 5.5.34 instead.
we have a customer that has observed problems related to the
interaction of the thread pools in mariadb 5.5.30 and tokudb 7.1.0.
the problem is that connections are stuck for hundreds of seconds in the
Killed state while a big tokudb transaction commit is in progress. the
customer claims that this problem was resolved by turning the thread
pool OFF. have there been any fixes to the thread pool implementation
since the 5.5.30 release?
here is some information that may be useful.
the processlist showed 2838 client connections of which:
851 Killed
483 Sleep
1467 Connect
1 show processlist
1 processing a commit on a big delete from a tokudb table
35 blocked on a row lock held by the big delete
the Killed connections look like:
84847752 sfi_mysql 10.0.0.60:1814 sfi Killed 93 NULL 0.000
with the connection address and time slightly different
a gdb snapshot of the system at the time of the failure:
291 total threads
130 threads waiting on fil_aio_wait
52 threads waiting on tokudb's work_on_kibbutz
16 tokudb deserialization threads waiting
36 mysql update threads blocked on the tokudb lock tree
1 thread executing a big tokudb delete transaction
40 threads in the thread pool get_event function
1 tokudb checkpoint thread blocked waiting for the LP MO lock, which
is held by the big txn commit thread
that leaves 15 misc worker threads that are not doing anything interesting
the gdb stacks for the thread pool threads are:
Thread 78 : epoll_wait , io_poll_wait , listener , get_event ,
worker_main , start_thread , clone
Thread 71 : pthread_cond_timedwait@@GLIBC_2.3.2 ,
inline_mysql_cond_timedwait , get_event , worker_main , start_thread ,
clone
Thread 70 : epoll_wait , io_poll_wait , listener , get_event ,
worker_main , start_thread , clone
Thread 69 : pthread_cond_timedwait@@GLIBC_2.3.2 ,
inline_mysql_cond_timedwait , get_event , worker_main , start_thread ,
clone
Thread 68 : epoll_wait , io_poll_wait , listener , get_event ,
worker_main , start_thread , clone
Thread 62 : epoll_wait , io_poll_wait , listener , get_event ,
worker_main , start_thread , clone
Thread 57 : epoll_wait , io_poll_wait , listener , get_event ,
worker_main , start_thread , clone
Thread 52 : epoll_wait , io_poll_wait , listener , get_event ,
worker_main , start_thread , clone
Thread 51 : pthread_cond_timedwait@@GLIBC_2.3.2 ,
inline_mysql_cond_timedwait , get_event , worker_main , start_thread ,
clone
Thread 49 : pthread_cond_timedwait@@GLIBC_2.3.2 ,
inline_mysql_cond_timedwait , get_event , worker_main , start_thread ,
clone
Thread 45 : pthread_cond_timedwait@@GLIBC_2.3.2 ,
inline_mysql_cond_timedwait , get_event , worker_main , start_thread ,
clone
Thread 44 : pthread_cond_timedwait@@GLIBC_2.3.2 ,
inline_mysql_cond_timedwait , get_event , worker_main , start_thread ,
clone
Thread 43 : epoll_wait , io_poll_wait , listener , get_event ,
worker_main , start_thread , clone
Thread 42 : pthread_cond_timedwait@@GLIBC_2.3.2 ,
inline_mysql_cond_timedwait , get_event , worker_main , start_thread ,
clone
Thread 41 : epoll_wait , io_poll_wait , listener , get_event ,
worker_main , start_thread , clone
Thread 39 : pthread_cond_timedwait@@GLIBC_2.3.2 ,
inline_mysql_cond_timedwait , get_event , worker_main , start_thread ,
clone
Thread 37 : pthread_cond_timedwait@@GLIBC_2.3.2 ,
inline_mysql_cond_timedwait , get_event , worker_main , start_thread ,
clone
Thread 36 : pthread_cond_timedwait@@GLIBC_2.3.2 ,
inline_mysql_cond_timedwait , get_event , worker_main , start_thread ,
clone
Thread 35 : epoll_wait , io_poll_wait , listener , get_event ,
worker_main , start_thread , clone
Thread 33 : epoll_wait , io_poll_wait , listener , get_event ,
worker_main , start_thread , clone
Thread 32 : epoll_wait , io_poll_wait , listener , get_event ,
worker_main , start_thread , clone
Thread 31 : pthread_cond_timedwait@@GLIBC_2.3.2 ,
inline_mysql_cond_timedwait , get_event , worker_main , start_thread ,
clone
Thread 26 : pthread_cond_timedwait@@GLIBC_2.3.2 ,
inline_mysql_cond_timedwait , get_event , worker_main , start_thread ,
clone
Thread 24 : pthread_cond_timedwait@@GLIBC_2.3.2 ,
inline_mysql_cond_timedwait , get_event , worker_main , start_thread ,
clone
Thread 22 : epoll_wait , io_poll_wait , listener , get_event ,
worker_main , start_thread , clone
Thread 21 : epoll_wait , io_poll_wait , listener , get_event ,
worker_main , start_thread , clone
Thread 20 : pthread_cond_timedwait@@GLIBC_2.3.2 ,
inline_mysql_cond_timedwait , get_event , worker_main , start_thread ,
clone
Thread 19 : pthread_cond_timedwait@@GLIBC_2.3.2 ,
inline_mysql_cond_timedwait , get_event , worker_main , start_thread ,
clone
Thread 18 : epoll_wait , io_poll_wait , listener , get_event ,
worker_main , start_thread , clone
Thread 17 : pthread_cond_timedwait@@GLIBC_2.3.2 ,
inline_mysql_cond_timedwait , get_event , worker_main , start_thread ,
clone
Thread 16 : pthread_cond_timedwait@@GLIBC_2.3.2 ,
inline_mysql_cond_timedwait , get_event , worker_main , start_thread ,
clone
Thread 13 : pthread_cond_timedwait@@GLIBC_2.3.2 ,
inline_mysql_cond_timedwait , get_event , worker_main , start_thread ,
clone
Thread 12 : epoll_wait , io_poll_wait , listener , get_event ,
worker_main , start_thread , clone
Thread 11 : epoll_wait , io_poll_wait , listener , get_event ,
worker_main , start_thread , clone
Thread 8 : pthread_cond_timedwait@@GLIBC_2.3.2 ,
inline_mysql_cond_timedwait , get_event , worker_main , start_thread ,
clone
Thread 7 : pthread_cond_timedwait@@GLIBC_2.3.2 ,
inline_mysql_cond_timedwait , get_event , worker_main , start_thread ,
clone
Thread 5 : pthread_cond_timedwait@@GLIBC_2.3.2 ,
inline_mysql_cond_timedwait , get_event , worker_main , start_thread ,
clone
Thread 4 : pthread_cond_timedwait@@GLIBC_2.3.2 ,
inline_mysql_cond_timedwait , get_event , worker_main , start_thread ,
clone
Thread 3 : pthread_cond_timedwait@@GLIBC_2.3.2 ,
inline_mysql_cond_timedwait , get_event , worker_main , start_thread ,
clone
Thread 2 : pthread_cond_timedwait@@GLIBC_2.3.2 ,
inline_mysql_cond_timedwait , get_event , worker_main , start_thread ,
clone
thanks
guangpu feng gpfeng.cs@gmail.com
Jan 16 (6 days ago)
to me
We encountered this problem before, and it has not happened again since we fixed it. When we started mysqld with the thread pool (ported from mariadb 5.5.28 to our percona server 5.5.18), many killed sessions remained in "show processlist" after a month, and the backtrace from pt-pmp looked just like yours. I don't know whether we have exactly the same problem, but I can tell you why we had the problem and how we solved it:
in THD::awake, closing the socket causes the sockfd to be unregistered from epoll, which prevents killed connections from exiting when the pool-of-threads scheduler is used, because epoll_wait will never return for that connection.
here is the solution: just shut down the socket, and let close_connection, which is called later, close the sockfd.
Index: /PS5518/branches/threadpool/sql/sql_class.cc
===================================================================
--- /PS5518/branches/threadpool/sql/sql_class.cc (revision 3788)
+++ /PS5518/branches/threadpool/sql/sql_class.cc (revision 3823)
@@ -1746,7 +1746,8 @@
         reading the next statement.
       */
 
-      close_active_vio();
+      if (active_vio)
+        vio_shutdown(active_vio, SHUT_RDWR);
     }
 #endif
Hope it will be helpful.
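The close-versus-shutdown behaviour described in the reply can be reproduced outside the server with a small Linux-only sketch. This is not MariaDB or Percona code: the `Teardown` enum and the `events_after_teardown` function are hypothetical names made up for the illustration, a local `socketpair` stands in for a client connection, and everything runs in one thread, whereas in the server the kill comes from a different thread than the one sitting in `epoll_wait`. It only shows the kernel behaviour the fix appears to rely on: closing the last descriptor for a socket silently drops it from the epoll set, while `shutdown` leaves it registered and makes it ready.

```cpp
#include <cassert>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

// How the registered socket is torn down (hypothetical names, for illustration).
enum class Teardown { Close, Shutdown };

// Registers one end of a socketpair with a fresh epoll instance, tears that
// end down in the requested way, then returns what epoll_wait() reports:
// the number of ready events (0 means the wait timed out), or -1 on error.
int events_after_teardown(Teardown how, int timeout_ms = 100)
{
  // A negative timeout would block forever in the Close case.
  assert(timeout_ms >= 0);

  int sv[2];
  if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
    return -1;

  int ep = epoll_create1(0);
  if (ep < 0)
  {
    close(sv[0]);
    close(sv[1]);
    return -1;
  }

  epoll_event ev{};
  ev.events = EPOLLIN;
  ev.data.fd = sv[0];
  if (epoll_ctl(ep, EPOLL_CTL_ADD, sv[0], &ev) != 0)
  {
    close(sv[0]);
    close(sv[1]);
    close(ep);
    return -1;
  }

  if (how == Teardown::Close)
    close(sv[0]);               // last reference gone: the kernel removes it from the epoll set
  else
    shutdown(sv[0], SHUT_RDWR); // descriptor stays registered and becomes ready (EOF/hang-up)

  epoll_event out{};
  int n = epoll_wait(ep, &out, 1, timeout_ms);

  if (how == Teardown::Shutdown)
    close(sv[0]);               // the deferred close, analogous to close_connection running later
  close(sv[1]);
  close(ep);
  return n;
}
```

Calling `events_after_teardown(Teardown::Close)` from a test program should time out with no event, which corresponds to a killed connection that no pool thread ever picks up again; `events_after_teardown(Teardown::Shutdown)` should report one ready descriptor, giving a worker the chance to notice the kill and finish the close itself.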