[MXS-2672] MaxScale 2.4.2 keeps crashing Created: 2019-09-11  Updated: 2019-10-10  Resolved: 2019-10-10

Status: Closed
Project: MariaDB MaxScale
Component/s: Core
Affects Version/s: 2.4.2
Fix Version/s: 2.4.3

Type: Bug Priority: Critical
Reporter: Hau Assignee: markus makela
Resolution: Fixed Votes: 0
Labels: None
Environment:

CentOS 7.6


Sprint: MXS-SPRINT-92

 Description   

Hi,

Can anyone please help us? Our MaxScale installation keeps crashing...

Below are the logs...

2019-09-11 12:23:41   alert  : (4554) Fatal: MaxScale 2.4.2 received fatal signal 11. Attempting backtrace.
2019-09-11 12:23:41   alert  : (4554) Commit ID: aad4148d77bf2dfbaa0042bc45abda30c101cad2 System name: Linux Release string: NAME="CentOS Linux"
2019-09-11 12:23:41   alert  : (4554)   /usr/lib64/maxscale/libmaxscale-common.so.1.0.0(_Z12gwbuf_appendP5GWBUFS0_+0x10): server/core/buffer.cc:509
2019-09-11 12:23:41   alert  : (4554)   /usr/lib64/maxscale/libmariadbbackend.so(+0x4ac7): include/maxscale/dcb.hh:345
2019-09-11 12:23:41   alert  : (4554)   /usr/lib64/maxscale/libmaxscale-common.so.1.0.0(+0x9d487): server/core/dcb.cc:2701
2019-09-11 12:23:41   alert  : (4554)   /usr/lib64/maxscale/libmaxscale-common.so.1.0.0(+0x9d511): server/core/dcb.cc:2793
2019-09-11 12:23:41   alert  : (4554)   /usr/lib64/maxscale/libmaxscale-common.so.1.0.0(_ZN7maxbase6Worker15poll_waiteventsEv+0x196): maxutils/maxbase/src/worker.cc:858
2019-09-11 12:23:41   alert  : (4554)   /usr/lib64/maxscale/libmaxscale-common.so.1.0.0(_ZN7maxbase6Worker3runEPNS_9SemaphoreE+0x53): maxutils/maxbase/src/worker.cc:559
2019-09-11 12:23:41   alert  : (4554)   /usr/lib64/maxscale/libmaxscale-common.so.1.0.0(+0x1ad83f): thread48.o:?
2019-09-11 12:23:41   alert  : (4554)   /lib64/libpthread.so.0(+0x7dd5): pthread_create.c:?
2019-09-11 12:23:41   alert  : (4554)   /lib64/libc.so.6(clone+0x6d): ??:?

Below is our MaxScale config:

[maxscale]
threads=auto
 
[dbprod01]
type=server
address=xxxxxxxxxx
port=3306
protocol=MariaDBBackend
 
[dbprod03]
type=server
address=xxxxxxxxxx
port=3306
protocol=MariaDBBackend
 
[dbprod04]
type=server
address=xxxxxxxxxx
port=3306
protocol=MariaDBBackend
 
[Galera-Monitor]
type=monitor
module=galeramon
servers=dbprod01,dbprod03,dbprod04
user=monitor_user
password=xxxxxxxxxx
monitor_interval=2000
 
[Splitter-Service]
type=service
router=readwritesplit
servers=dbprod01,dbprod03,dbprod04
user=maxscale
password=xxxxxxxxxx
max_sescmd_history=1500
 
[Splitter-Listener]
type=listener
service=Splitter-Service
protocol=MariaDBClient
port=3306



 Comments   
Comment by Hau [ 2019-09-11 ]

The previous error happens every 1-2 minutes, and sometimes the error looks like this instead, even though the server status shows that galeramon has already set the Master.

Please help...

2019-09-11 15:09:15 notice : Service 'Splitter-Service' started (1/1)
2019-09-11 15:09:15 error : (1) [readwritesplit] Couldn't find suitable Master from 3 candidates.
2019-09-11 15:09:15 error : (1) Failed to create new router session for service 'Splitter-Service'. See previous errors for more details.
2019-09-11 15:09:15 error : (2) [readwritesplit] Couldn't find suitable Master from 3 candidates.
2019-09-11 15:09:15 error : (2) Failed to create new router session for service 'Splitter-Service'. See previous errors for more details.
2019-09-11 15:09:15 error : (3) [readwritesplit] Couldn't find suitable Master from 3 candidates.
2019-09-11 15:09:15 error : (3) Failed to create new router session for service 'Splitter-Service'. See previous errors for more details.
2019-09-11 15:09:16 error : (4) [readwritesplit] Couldn't find suitable Master from 3 candidates.
2019-09-11 15:09:16 error : (4) Failed to create new router session for service 'Splitter-Service'. See previous errors for more details.
2019-09-11 15:09:16 error : (5) [readwritesplit] Couldn't find suitable Master from 3 candidates.
2019-09-11 15:09:16 error : (5) Failed to create new router session for service 'Splitter-Service'. See previous errors for more details.
2019-09-11 15:09:16 error : (6) [readwritesplit] Couldn't find suitable Master from 3 candidates.
2019-09-11 15:09:16 error : (6) Failed to create new router session for service 'Splitter-Service'. See previous errors for more details.
2019-09-11 15:09:16 error : (7) [readwritesplit] Couldn't find suitable Master from 3 candidates.
2019-09-11 15:09:16 error : (7) Failed to create new router session for service 'Splitter-Service'. See previous errors for more details.
2019-09-11 15:09:16 error : (8) [readwritesplit] Couldn't find suitable Master from 3 candidates.
2019-09-11 15:09:16 error : (8) Failed to create new router session for service 'Splitter-Service'. See previous errors for more details.
2019-09-11 15:09:16 error : (9) [readwritesplit] Couldn't find suitable Master from 3 candidates.
2019-09-11 15:09:16 error : (9) Failed to create new router session for service 'Splitter-Service'. See previous errors for more details.
2019-09-11 15:09:16 error : (10) [readwritesplit] Couldn't find suitable Master from 3 candidates. (subsequent similar messages suppressed for 10000 milliseconds)
2019-09-11 15:09:16 error : (10) Failed to create new router session for service 'Splitter-Service'. See previous errors for more details. (subsequent similar messages suppressed for 10000 milliseconds)
2019-09-11 15:09:16 notice : Loaded server states from journal file: /var/lib/maxscale/Galera-Monitor/monitor.dat
2019-09-11 15:09:16 alert : (190) Fatal: MaxScale 2.4.2 received fatal signal 11. Attempting backtrace.
2019-09-11 15:09:16 alert : (190) Commit ID: aad4148d77bf2dfbaa0042bc45abda30c101cad2 System name: Linux Release string: NAME="CentOS Linux"
2019-09-11 15:09:16 alert : (190) /usr/lib64/maxscale/libmaxscale-common.so.1.0.0(_Z12gwbuf_appendP5GWBUFS0_+0x10): server/core/buffer.cc:509
2019-09-11 15:09:16 alert : (190) /usr/lib64/maxscale/libmariadbbackend.so(+0x4ac7): include/maxscale/dcb.hh:345
2019-09-11 15:09:16 alert : (190) /usr/lib64/maxscale/libmaxscale-common.so.1.0.0(+0x9d487): server/core/dcb.cc:2701
2019-09-11 15:09:16 alert : (190) /usr/lib64/maxscale/libmaxscale-common.so.1.0.0(+0x9d511): server/core/dcb.cc:2793
2019-09-11 15:09:16 alert : (190) /usr/lib64/maxscale/libmaxscale-common.so.1.0.0(_ZN7maxbase6Worker15poll_waiteventsEv+0x196): maxutils/maxbase/src/worker.cc:858
2019-09-11 15:09:16 alert : (190) /usr/lib64/maxscale/libmaxscale-common.so.1.0.0(_ZN7maxbase6Worker3runEPNS_9SemaphoreE+0x53): maxutils/maxbase/src/worker.cc:559
2019-09-11 15:09:16 alert : (190) /usr/lib64/maxscale/libmaxscale-common.so.1.0.0(+0x1ad83f): thread48.o:?
2019-09-11 15:09:16 alert : (190) /lib64/libpthread.so.0(+0x7dd5): pthread_create.c:?
2019-09-11 15:09:16 alert : (190) /lib64/libc.so.6(clone+0x6d): ??:?

Comment by markus makela [ 2019-09-12 ]

The stack trace ends in gwbuf_append, where the manipulation of the tail pointer causes the crash:

    head->tail->next = tail;
    head->tail = tail->tail;
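
For readers following along, the two statements above dereference head, head->tail and tail without any check, so a null pointer in any of those positions is enough to raise signal 11. The sketch below illustrates that failure mode with a guarded append. It is only an illustration under stated assumptions: Buf, try_append and chain_length are hypothetical names, Buf is a simplified stand-in and not the real GWBUF, and this is not the change made in the fix commit, whose contents are not described in this ticket. Null checks also cannot detect a pointer to memory that has already been freed.

```cpp
#include <cstddef>

// Hypothetical, simplified stand-in for a chained buffer (NOT the real GWBUF).
struct Buf
{
    Buf* next = nullptr;    // next link in the chain
    Buf* tail = nullptr;    // last link of the chain; only meaningful on the head
    int  id = 0;

    // A freshly created buffer is a chain of one, so it is its own tail.
    explicit Buf(int i) : tail(this), id(i) {}
};

// Appends the chain `tail` to the chain `head`.
// Returns false, leaving `head` untouched, if a pointer that the two bare
// statements would dereference turns out to be null.
bool try_append(Buf*& head, Buf* tail)
{
    if (!tail)
    {
        return true;        // nothing to append
    }
    if (!tail->tail)
    {
        return false;       // broken invariant in the appended chain
    }
    if (!head)
    {
        head = tail;        // appending to an empty chain
        return true;
    }
    if (!head->tail)
    {
        return false;       // broken invariant in the existing chain
    }

    // Same two statements as in the comment above, now known to be safe
    // with respect to null pointers.
    head->tail->next = tail;
    head->tail = tail->tail;
    return true;
}

// Counts the links in a chain by following `next`.
std::size_t chain_length(const Buf* head)
{
    std::size_t n = 0;
    for (; head; head = head->next)
    {
        ++n;
    }
    return n;
}
```

Calling try_append with an empty head adopts the appended chain as the new head, and calling it with a buffer whose tail pointer is null returns false instead of crashing.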

Comment by markus makela [ 2019-10-10 ]

Fixed by 9b0b1521095686581c84b8b8bea649225290d17a.
