[MXS-2657] maxscale crash in worker
Created: 2019-09-03  Updated: 2020-07-03  Resolved: 2020-07-03

Status: Closed
Project: MariaDB MaxScale
Component/s: N/A
Affects Version/s: 2.3.12, 2.4.1
Fix Version/s: N/A

Type: Bug Priority: Critical
Reporter: Kyle Joiner (Inactive) Assignee: Unassigned
Resolution: Cannot Reproduce Votes: 1
Labels: None


 Description   

Crash under heavy insert load:

2019-09-03 14:23:46 info : (1) [readwritesplit] Reply complete, last reply from sctst6
2019-09-03 14:23:50 info : (1) > Autocommit: [disabled], trx is [open], cmd: (0x01) COM_QUIT, plen: 5, type: QUERY_TYPE_SESSION_WRITE, stmt:
2019-09-03 14:23:50 info : (1) [readwritesplit] Session write, routing to all servers.
2019-09-03 14:23:50 info : (1) [readwritesplit] Route query to master: sctst6 [192.168.212.44]:3306
2019-09-03 14:23:50 info : Stopped Read-Write-Dev client session [1]
2019-09-03 14:23:51 info : (2) [readwritesplit] Servers and router connection counts:
2019-09-03 14:23:51 info : (2) [readwritesplit] current operations : 0 in [192.168.212.44]:3306 Master, Running
2019-09-03 14:23:51 info : (2) [readwritesplit] Connected to 'sctst6'
2019-09-03 14:23:51 info : (2) [readwritesplit] Selected Master: sctst6
2019-09-03 14:23:51 info : (2) Started Read-Write-Dev client session [2] for 'plinel' from ::ffff:192.168.76.95
2019-09-03 14:23:51 info : (2) Connected to 'XXXXXXX' with thread id 17863
2019-09-03 14:23:51 info : (2) > Autocommit: [disabled], trx is [open], cmd: (0x03) COM_QUERY, plen: 21, type: QUERY_TYPE_GSYSVAR_WRITE|QUERY_TYPE_BEGIN_TRX|QUERY_TYPE_DISABLE_AUTOCOMMIT, stmt: set autocommit=0
2019-09-03 14:23:51 info : (2) [readwritesplit] Session write, routing to all servers.
2019-09-03 14:23:51 info : (2) [readwritesplit] Route query to master: sctst6 [192.168.212.44]:3306
2019-09-03 14:23:51 info : (2) [readwritesplit] Reply complete, last reply from sctst6
2019-09-03 14:24:42 alert : Fatal: MaxScale 2.4.1 received fatal signal 6. Attempting backtrace.
2019-09-03 14:24:42 alert : Commit ID: b47b7c15edbbc70d9137520e3c590c4f16b3a4eb System name: Linux Release string: CentOS Linux release 7.6.1810 (Core)
2019-09-03 14:24:42 alert : /lib64/libc.so.6(epoll_wait+0x33): :?
2019-09-03 14:24:42 alert : /usr/lib64/maxscale/libmaxscale-common.so.1.0.0(_ZN7maxbase6Worker15poll_waiteventsEv+0xd0): maxutils/maxbase/src/worker.cc:795
2019-09-03 14:24:42 alert : /usr/lib64/maxscale/libmaxscale-common.so.1.0.0(_ZN7maxbase6Worker3runEPNS_9SemaphoreE+0x53): maxutils/maxbase/src/worker.cc:559
2019-09-03 14:24:42 alert : /usr/bin/maxscale(main+0x2a76): server/core/gateway.cc:2267
2019-09-03 14:24:42 alert : /lib64/libc.so.6(__libc_start_main+0xf5): ??:?
2019-09-03 14:24:42 alert : /usr/bin/maxscale(): ??:?



 Comments   
Comment by Johan Wikman [ 2019-09-18 ]

Since MaxScale was killed with signal 6 (SIGABRT) while it was sitting in epoll_wait(), a likely possibility is that the systemd watchdog killed MaxScale.

https://mariadb.com/kb/en/mariadb-maxscale-24-mariadb-maxscale-configuration-guide/#systemd-watchdog
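
For context, the systemd watchdog is a keep-alive protocol. A service started with WatchdogSec= receives WATCHDOG_USEC (and usually WATCHDOG_PID) in its environment and must send a WATCHDOG=1 datagram to the Unix socket named by NOTIFY_SOCKET more often than that interval. If a keep-alive is late, systemd sends the unit's WatchdogSignal=, which defaults to SIGABRT, i.e. signal 6 on Linux. That matches the "fatal signal 6" in the log above. Below is a minimal Python sketch of that protocol as described in the sd_notify(3) and sd_watchdog_enabled(3) man pages. It is not MaxScale's implementation, which is C++, and the function names are hypothetical.

```python
import os
import socket


def watchdog_ping_interval(env=os.environ):
    """Return the suggested keep-alive period in seconds, or None if disabled.

    systemd exports WATCHDOG_USEC (microseconds) when WatchdogSec= is set;
    sd_watchdog_enabled(3) recommends pinging at half that interval.
    """
    usec = env.get("WATCHDOG_USEC")
    if not usec:
        return None
    pid = env.get("WATCHDOG_PID")
    if pid and int(pid) != os.getpid():
        # The watchdog variables were meant for a different process.
        return None
    return int(usec) / 1_000_000 / 2


def notify_watchdog(env=os.environ):
    """Send one WATCHDOG=1 keep-alive; return False if not run under systemd."""
    path = env.get("NOTIFY_SOCKET")
    if not path:
        return False
    if path.startswith("@"):
        # A leading '@' names a Linux abstract-namespace socket.
        path = "\0" + path[1:]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(b"WATCHDOG=1", path)
    return True
```

A daemon would typically call notify_watchdog() from its main loop every watchdog_ping_interval() seconds. When the keep-alive stops arriving for longer than WatchdogSec, systemd aborts the process from outside. The fatal-signal backtrace then likely shows whatever the receiving thread was doing at that moment (here, epoll_wait()) rather than the code that fell behind.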

Comment by markus makela [ 2019-10-07 ]

If you edit /lib/systemd/system/maxscale.service and change the WatchdogSec value from 60 to something larger, does it help?
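
Editing the packaged unit in place works, but the change is lost when the maxscale package is upgraded. The sketch below makes the same change as a systemd drop-in instead. It assumes that the value in question is the unit's WatchdogSec=60s setting. The helper name and the drop-in file name watchdog.conf are hypothetical.

```python
from pathlib import Path


def write_watchdog_override(seconds, unit_dir="/etc/systemd/system"):
    """Hypothetical helper: write a drop-in that raises WatchdogSec for maxscale.service.

    Drop-ins under /etc/systemd/system are applied on top of the packaged
    unit in /lib/systemd/system and are not overwritten by package upgrades.
    """
    seconds = int(seconds)
    if seconds <= 0:
        raise ValueError("WatchdogSec must be positive; 0 disables the watchdog")
    dropin_dir = Path(unit_dir) / "maxscale.service.d"
    dropin_dir.mkdir(parents=True, exist_ok=True)
    conf = dropin_dir / "watchdog.conf"
    # Only the setting being changed is needed; the rest of the unit is inherited.
    conf.write_text("[Service]\nWatchdogSec=%ds\n" % seconds)
    return conf
```

Run it as root, for example write_watchdog_override(120). Then run systemctl daemon-reload and restart MaxScale so that the process picks up the new WATCHDOG_USEC. systemctl edit maxscale creates an equivalent drop-in interactively.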

Comment by markus makela [ 2020-03-02 ]

kjoiner, is this still relevant?

Comment by markus makela [ 2020-07-03 ]

Closing as Cannot Reproduce, as this is most likely caused by a genuinely slow server, in which case the watchdog timing values need to be adjusted. If it isn't that, we need to know whether it still happens with the latest releases.
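
One way to check whether the watchdog was actually responsible, rather than a genuine crash, is to look for systemd's own watchdog-timeout message around the time of the abort. The sketch below assumes that message has the form "maxscale.service: Watchdog timeout (limit 1min)!". The wording has varied between systemd versions (for example with or without the colon after the unit name). The pattern is therefore deliberately loose and is an assumption, not a documented format. The function name is hypothetical.

```python
import re

# Assumed shape of systemd's message when a unit misses its keep-alive.
# Tolerates an optional colon and either capitalization of "watchdog".
WATCHDOG_TIMEOUT = re.compile(r"(\S+\.service):? [Ww]atchdog timeout \(limit ([^)]*)\)")


def find_watchdog_timeouts(journal_text, unit="maxscale.service"):
    """Return the configured limit from each watchdog timeout logged for unit."""
    return [m.group(2) for m in WATCHDOG_TIMEOUT.finditer(journal_text)
            if m.group(1) == unit]
```

Feed it, for example, the output of journalctl -u maxscale for the window around 2019-09-03 14:24. A match supports the watchdog explanation. No match is weaker evidence, since the message text may differ on a given systemd release.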

Generated at Thu Feb 08 04:15:44 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.