[MXS-3775] Hang in RoutingWorker::execute_concurrently Created: 2021-09-20  Updated: 2022-05-05  Resolved: 2021-09-21

Status: Closed
Project: MariaDB MaxScale
Component/s: Protocol
Affects Version/s: 2.5.15, 6.1.1
Fix Version/s: 6.1.2

Type: Bug
Priority: Major
Reporter: Nilnandan Joshi
Assignee: markus makela
Resolution: Fixed
Votes: 0
Labels: None

Issue Links:
Blocks
is blocked by MXS-3886 Hang in RoutingWorker::execute_concur... Closed

Description

2021-09-19 02:08:36   info   : (21821) [readwritesplit] (split-router); Reply complete from 'server2', discarding it.
alert  : MaxScale 6.1.1 received fatal signal 6. Commit ID: 9b4abbc9bd86e6f9b6c3fc581214eb8583d959e4 System name: Linux Release string: Ubuntu 20.04.2 LTS
 
 
2021-09-19 02:08:36   alert  : MaxScale 6.1.1 received fatal signal 6. Commit ID: 9b4abbc9bd86e6f9b6c3fc581214eb8583d959e4 System name: Linux Release string: Ubuntu 20.04.2 LTS
2021-09-19 02:08:36   alert  : Statement currently being classified: none/unknown
2021-09-19 02:08:36   info   : (21821) > Autocommit: [enabled], trx is [not open], cmd: (0x03) COM_QUERY, plen: 1134, type: QUERY_TYPE_READ, stmt: select ...<query>
2021-09-19 02:08:36   info   : (21821) [readwritesplit] (split-router); Route query to slave: server2 <
2021-09-19 02:08:36   info   : (21821) [readwritesplit] (split-router); Reply complete from 'server2'
alert  :   /lib/x86_64-linux-gnu/libpthread.so.0(+0x133f4) [0x7f0defd393f4]: <binutils not installed>
  /lib/x86_64-linux-gnu/libpthread.so.0(+0x134e8) [0x7f0defd394e8]: <binutils not installed>
  /usr/lib/x86_64-linux-gnu/maxscale/libmaxscale-common.so.1.0.0(_ZN8maxscale13RoutingWorker20execute_concurrentlyERKSt8functionIFvvEE+0x62) [0x7f0df06249f2]: <binutils not installed>
  /usr/lib/x86_64-linux-gnu/maxscale/libmaxscale-common.so.1.0.0(+0x1bdfbc) [0x7f0df0634fbc]: <binutils not installed>
  /usr/lib/x86_64-linux-gnu/maxscale/libmaxscale-common.so.1.0.0(_ZN7maxbase6Worker14handle_messageERNS_12MessageQueueERKNS_19MessageQueueMessageE+0x6e) [0x7f0df06b743e]: <binutils not installed>
  /usr/lib/x86_64-linux-gnu/maxscale/libmaxscale-common.so.1.0.0(_ZN7maxbase12MessageQueue18handle_poll_eventsEPNS_6WorkerEj+0xc0) [0x7f0df06ba570]: <binutils not installed>
  /usr/lib/x86_64-linux-gnu/maxscale/libmaxscale-common.so.1.0.0(_ZN7maxbase6Worker15poll_waiteventsEv+0x233) [0x7f0df06b8823]: <binutils not installed>
  /usr/lib/x86_64-linux-gnu/maxscale/libmaxscale-common.so.1.0.0(_ZN7maxbase6Worker3runEPNS_9SemaphoreE+0x53) [0x7f0df06b8a43]: <binutils not installed>
  /usr/bin/maxscale(main+0x1a8a) [0x55eb41d52d5a]: <binutils not installed>
  /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3) [0x7f0defb400b3]: <binutils not installed>
  /usr/bin/maxscale(_start+0x2e) [0x55eb41d5367e]: <binutils not installed>
alert  : Writing core dump.



Comments
Comment by markus makela [ 2021-09-20 ]

The relevant part of the code where it hangs seems to be this:

/usr/include/c++/9/bits/std_function.h:259
  1bdfbc:       48 8b 45 b0             mov    -0x50(%rbp),%rax
  1bdfc0:       48 85 c0                test   %rax,%rax
  1bdfc3:       74 0d                   je     1bdfd2 <std::_Function_handler<void (), Server::set_gtid_list(std::vector<std::pair<unsigned int, unsigned long>, std::allocator<std::pair<unsigned int, unsigned long> > > const&)::{lambda()#1}>::_M_invoke(std::_Any_data const&)+0x342>
/usr/include/c++/9/bits/std_function.h:260

This suggests a deadlock: a RoutingWorker is waiting on the MainWorker, which is itself waiting on all the RoutingWorkers in Server::set_gtid_list.
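
The circular wait described above can be modelled outside MaxScale. The following is a minimal sketch, not MaxScale code: ToyWorker, post, call and count_stalled_calls are hypothetical names invented for illustration, and a timeout stands in for the unbounded semaphore wait so that the example terminates instead of hanging.

```cpp
#include <atomic>
#include <chrono>
#include <condition_variable>
#include <deque>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <thread>

// Hypothetical stand-in for a worker: one thread draining a task queue.
class ToyWorker
{
public:
    ToyWorker() : m_thread([this] { run(); }) {}

    ~ToyWorker()
    {
        post(nullptr);  // an empty task tells the loop to stop
        m_thread.join();
    }

    // Queue a task without waiting for it.
    void post(std::function<void()> task)
    {
        {
            std::lock_guard<std::mutex> guard(m_lock);
            m_queue.push_back(std::move(task));
        }
        m_cv.notify_one();
    }

    // Queue a task and block for up to `limit` until it has run.
    // Returns false if the target loop did not run it in time.
    bool call(std::function<void()> task, std::chrono::milliseconds limit)
    {
        auto done = std::make_shared<std::promise<void>>();
        auto finished = done->get_future();
        post([task, done] { task(); done->set_value(); });
        return finished.wait_for(limit) == std::future_status::ready;
    }

private:
    void run()
    {
        for (;;)
        {
            std::unique_lock<std::mutex> guard(m_lock);
            m_cv.wait(guard, [this] { return !m_queue.empty(); });
            auto task = std::move(m_queue.front());
            m_queue.pop_front();
            guard.unlock();
            if (!task) return;
            task();  // the loop handles nothing else until this returns
        }
    }

    std::mutex m_lock;
    std::condition_variable m_cv;
    std::deque<std::function<void()>> m_queue;
    std::thread m_thread;  // declared last: the other members must exist before run() starts
};

// Both workers make a blocking call to each other from inside their own loops.
// Returns how many of the two calls gave up after `limit`.
int count_stalled_calls(std::chrono::milliseconds limit)
{
    // Declared before the workers so that they outlive the worker threads.
    std::atomic<int> stalled{0};
    std::promise<void> main_busy, routing_busy, main_over, routing_over;
    std::shared_future<void> main_busy_f = main_busy.get_future().share();
    std::shared_future<void> routing_busy_f = routing_busy.get_future().share();
    std::shared_future<void> main_over_f = main_over.get_future().share();
    std::shared_future<void> routing_over_f = routing_over.get_future().share();

    ToyWorker main_worker;
    ToyWorker routing_worker;

    main_worker.post([&, routing_busy_f, routing_over_f] {
        main_busy.set_value();
        routing_busy_f.wait();  // both loops are now occupied
        if (!routing_worker.call([] {}, limit)) ++stalled;
        main_over.set_value();
        routing_over_f.wait();  // keep this loop blocked until the peer has given up too
    });
    routing_worker.post([&, main_busy_f, main_over_f] {
        routing_busy.set_value();
        main_busy_f.wait();
        if (!main_worker.call([] {}, limit)) ++stalled;
        routing_over.set_value();
        main_over_f.wait();
    });

    main_over_f.wait();
    routing_over_f.wait();
    return stalled.load();
}
```

In this model, count_stalled_calls(std::chrono::milliseconds(50)) returns 2: each loop is blocked inside its own task, so neither can run the task the other one is waiting for. With an unbounded wait in place of the timeout, both threads would block forever. Replacing the blocking call with a fire-and-forget post would break the cycle in this toy model; whether the actual fix took that route is not stated in this ticket.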

Comment by markus makela [ 2021-09-21 ]

Managed to reproduce this by running a modified version of the crash_on_bad_sescmd test.

Comment by Richard Stracke [ 2021-11-18 ]

It seems that 2.5.15 is also affected:

alert  : MaxScale 2.5.15 received fatal signal 6. Commit ID: 2ddd29138544e9ff3071bd9d2d253dc82291b0f9 System name: Linux Release string: NAME="CentOS Linux"
 
 
2021-11-17 05:48:08   alert  : MaxScale 2.5.15 received fatal signal 6. Commit ID: 2ddd29138544e9ff3071bd9d2d253dc82291b0f9 System name: Linux Release string: NAME="CentOS Linux"
2021-11-17 05:48:08   alert  : Statement currently being classified: none/unknown
alert  :   /lib64/libpthread.so.0(+0xdb3b): sem_wait.c:?
  /lib64/libpthread.so.0(+0xdbcf): sem_wait.c:?
  /lib64/libpthread.so.0(sem_wait+0x2b): ??:0
  /usr/lib64/maxscale/libmaxscale-common.so.1.0.0(_ZN8maxscale13RoutingWorker20execute_concurrentlyESt8functionIFvvEE+0x62): maxutils/maxbase/include/maxbase/semaphore.hh:146
  /usr/lib64/maxscale/libmaxscale-common.so.1.0.0(+0x15448a): /opt/rh/devtoolset-7/root/usr/include/c++/7/bits/std_function.h:275
  /usr/lib64/maxscale/libmaxscale-common.so.1.0.0(_ZN7maxbase6Worker14handle_messageERNS_12MessageQueueERKNS_19MessageQueueMessageE+0x159): maxutils/maxbase/src/worker.cc:490
  /usr/lib64/maxscale/libmaxscale-common.so.1.0.0(_ZN7maxbase12MessageQueue18handle_poll_eventsEPNS_6WorkerEj+0x138): maxutils/maxbase/src/messagequeue.cc:307
  /usr/lib64/maxscale/libmaxscale-common.so.1.0.0(_ZN7maxbase6Worker15poll_waiteventsEv+0x1be): maxutils/maxbase/src/worker.cc:879
  /usr/lib64/maxscale/libmaxscale-common.so.1.0.0(_ZN7maxbase6Worker3runEPNS_9SemaphoreE+0x53): maxutils/maxbase/src/worker.cc:574
  /usr/bin/maxscale(main+0x1f7f): maxutils/maxbase/include/maxbase/log.h:168
  /lib64/libc.so.6(__libc_start_main+0xf5): ??:?
  /usr/bin/maxscale(): ??:?
 
 
2021-11-17 05:48:09   alert  : 
  /lib64/libpthread.so.0(+0xdb3b): sem_wait.c:?
  /lib64/libpthread.so.0(+0xdbcf): sem_wait.c:?
  /lib64/libpthread.so.0(sem_wait+0x2b): ??:0
  /usr/lib64/maxscale/libmaxscale-common.so.1.0.0(_ZN8maxscale13RoutingWorker20execute_concurrentlyESt8functionIFvvEE+0x62): maxutils/maxbase/include/maxbase/semaphore.hh:146
  /usr/lib64/maxscale/libmaxscale-common.so.1.0.0(+0x15448a): /opt/rh/devtoolset-7/root/usr/include/c++/7/bits/std_function.h:275
  /usr/lib64/maxscale/libmaxscale-common.so.1.0.0(_ZN7maxbase6Worker14handle_messageERNS_12MessageQueueERKNS_19MessageQueueMessageE+0x159): maxutils/maxbase/src/worker.cc:490
  /usr/lib64/maxscale/libmaxscale-common.so.1.0.0(_ZN7maxbase12MessageQueue18handle_poll_eventsEPNS_6WorkerEj+0x138): maxutils/maxbase/src/messagequeue.cc:307
  /usr/lib64/maxscale/libmaxscale-common.so.1.0.0(_ZN7maxbase6Worker15poll_waiteventsEv+0x1be): maxutils/maxbase/src/worker.cc:879
  /usr/lib64/maxscale/libmaxscale-common.so.1.0.0(_ZN7maxbase6Worker3runEPNS_9SemaphoreE+0x53): maxutils/maxbase/src/worker.cc:574
  /usr/bin/maxscale(main+0x1f7f): maxutils/maxbase/include/maxbase/log.h:168
  /lib64/libc.so.6(__libc_start_main+0xf5): ??:?
  /usr/bin/maxscale(): ??:?
alert  : Writing core dump.
 
