[MXS-2487] MaxScale FATALs on running multiple parallel inserts on Clustrix nodes with readwritesplit router config
Created: 2019-05-14  Updated: 2020-12-08  Resolved: 2019-06-25

Status: Closed
Project: MariaDB MaxScale
Component/s: xpandmon
Affects Version/s: None
Fix Version/s: 2.4.0

Type: Bug
Priority: Major
Reporter: Rahul Joshi (Inactive)
Assignee: markus makela
Resolution: Fixed
Votes: 0
Labels: None
Environment:

MaxScale built from the develop branch (commit b4e8f79c5f985a184e1fb35a0aa34de34cc3e921)
Clustrix master node details:

[root@karma051 ~]# clx s
Cluster Name:    cl818b67edd0911482
Cluster Version: clustrix-elk-14841
Cluster Status:   OK
Cluster Size:    3 nodes - 16 CPUs per Node
Current Node:    karma051 - nid 1
 
nid |  Hostname | Status |  IP Address  | TPS |      Used      |  Total
----+-----------+--------+--------------+-----+----------------+--------
  1 |  karma051 |    OK  |  10.2.15.139 |   0 |  33.7G (4.39%) |  767.0G
  2 |  karma074 |    OK  |  10.2.12.210 |   0 |  30.8G (4.01%) |  767.0G
  3 |  karma063 |    OK  |  10.2.14.115 |   0 |  28.4G (3.70%) |  767.0G
----+-----------+--------+--------------+-----+----------------+--------
                                            0 |  92.8G (4.03%) |    2.2T


Attachments: maxscale_MXS-2487.log (text file)
Sprint: MXS-SPRINT-85

 Description   

MaxScale config file:

[maxscale]
threads=auto
log_info=1
logdir=/data/clustrix/log
 
[BS1]
type=server
address=10.2.15.139
port=3306
protocol=MariaDBBackend
 
[Cluster-Monitor]
type=monitor
module=clustrixmon
servers=BS1
user=maxscale
password=maxscale_pw
monitor_interval=10000
 
[Read-Write-Service]
type=service
router=readwritesplit
user=maxscale
password=maxscale_pw
cluster=Cluster-Monitor
transaction_replay=true
delayed_retry_timeout=100000
 
[Read-Write-Listener]
type=listener
service=Read-Write-Service
protocol=MariaDBClient
port=4006
 
###############
 
 
[BS4]
type=server
address=10.2.15.191
port=3306
protocol=MariaDBBackend
 
[Cluster-Monitor1]
type=monitor
module=clustrixmon
servers=BS4
user=maxscale
password=maxscale_pw
monitor_interval=10000
 
[Read-Write-Service1]
type=service
router=readwritesplit
user=maxscale
password=maxscale_pw
cluster=Cluster-Monitor1
transaction_replay=true
delayed_retry_timeout=100000
 
[Read-Write-Listener1]
type=listener
service=Read-Write-Service1
protocol=MariaDBClient
port=4007
 
##################
 
[MBS]
type=server
address=127.0.0.1
port=4006
protocol=MariaDBBackend
 
[MBS2]
type=server
address=127.0.0.1
port=4007
protocol=MariaDBBackend
 
[Read-Write-Service_rwsrvc]
type=service
router=readwritesplit
servers=MBS,MBS2
user=maxscale
password=maxscale_pw
#retry_on_failure=yes
#delayed_retry=yes
 
[Read-Write-Listener2]
type=listener
service=Read-Write-Service_rwsrvc
protocol=MariaDBClient
port=4008
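
The configuration above chains MaxScale to itself: servers MBS and MBS2 point at 127.0.0.1 on ports 4006 and 4007, which are the listeners of the two inner readwritesplit services. As a rough illustration only, the sketch below parses an abbreviated copy of the configuration with Python's `configparser` and resolves which inner services the outer service routes to. The helper name `loopback_chain` is hypothetical and is not part of MaxScale or its tooling.

```python
import configparser

# Abbreviated copy of the configuration above (only the keys needed here).
CONFIG = """
[MBS]
type=server
address=127.0.0.1
port=4006

[MBS2]
type=server
address=127.0.0.1
port=4007

[Read-Write-Service]
type=service
router=readwritesplit
cluster=Cluster-Monitor

[Read-Write-Service1]
type=service
router=readwritesplit
cluster=Cluster-Monitor1

[Read-Write-Service_rwsrvc]
type=service
router=readwritesplit
servers=MBS,MBS2

[Read-Write-Listener]
type=listener
service=Read-Write-Service
port=4006

[Read-Write-Listener1]
type=listener
service=Read-Write-Service1
port=4007

[Read-Write-Listener2]
type=listener
service=Read-Write-Service_rwsrvc
port=4008
"""


def loopback_chain(text):
    """Map each service with a servers= list to the local services behind them.

    Hypothetical helper for illustration; not a MaxScale API.
    """
    cfg = configparser.ConfigParser()
    cfg.read_string(text)
    # Listener port -> name of the service that listener exposes.
    listeners = {cfg[s]["port"]: cfg[s]["service"]
                 for s in cfg.sections() if cfg[s].get("type") == "listener"}
    chain = {}
    for name in cfg.sections():
        sec = cfg[name]
        if sec.get("type") != "service" or "servers" not in sec:
            continue
        targets = []
        for srv in (s.strip() for s in sec["servers"].split(",")):
            # A server on 127.0.0.1 whose port matches a listener is another
            # service inside the same MaxScale process.
            if cfg[srv]["address"] == "127.0.0.1" and cfg[srv]["port"] in listeners:
                targets.append(listeners[cfg[srv]["port"]])
        chain[name] = targets
    return chain


print(loopback_chain(CONFIG))
```

For this configuration the result shows that Read-Write-Service_rwsrvc (port 4008, the one the client connects to) routes into Read-Write-Service and Read-Write-Service1, each of which in turn routes to a Clustrix cluster.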

Created table t1 on Clustrix:

MySQL [test]> show create table t1\G
 *************************** 1. row ***************************
        Table: t1
 Create Table: CREATE TABLE `t1` (
   `col1` int(11),
   `col2` int(11)
 ) CHARACTER SET utf8 /*$ SLICES=3 */

Start MaxScale:
maxscale -d -f /etc/maxscale_trxreplay.cnf --user=root
Use maxctrl to set the master and slave servers:
[root@karma197 bin]# maxctrl set server MBS Master; maxctrl set server MBS2 Slave
Use inserter.py to write data into the table:
python inserter.py -n 50000000 -h karma197 -P 4008 -u maxscale -pmaxscale_pw -D test -t 50 t1
This tries to insert 50,000,000 rows into table t1 using 50 threads.
A few rows are inserted into the target table, and then MaxScale crashes with fatal signal 11 (segmentation fault).
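
inserter.py itself is not attached to this issue, so the following is only a hypothetical sketch of the load pattern the command describes: the requested row count is divided across a fixed number of threads, and each thread issues single-row INSERT statements in parallel. The names `split_rows`, `run_inserters`, and the `execute` callback are assumptions made for illustration. A real run would pass a function that executes the statement on a connection to port 4008.

```python
import threading


def split_rows(total_rows, num_threads):
    """Divide total_rows as evenly as possible across num_threads workers."""
    base, extra = divmod(total_rows, num_threads)
    return [base + (1 if i < extra else 0) for i in range(num_threads)]


def run_inserters(total_rows, num_threads, execute):
    """Start num_threads workers; each calls execute(sql, params) once per row.

    `execute` is a stand-in for a per-thread database cursor (assumption).
    """
    def worker(thread_id, count):
        for i in range(count):
            execute("INSERT INTO t1 (col1, col2) VALUES (%s, %s)", (thread_id, i))

    threads = [threading.Thread(target=worker, args=(tid, n))
               for tid, n in enumerate(split_rows(total_rows, num_threads))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


# Demo with a fake execute that only counts calls, so no database is needed.
lock = threading.Lock()
counter = {"rows": 0}


def fake_execute(sql, params):
    with lock:
        counter["rows"] += 1


run_inserters(1000, 50, fake_execute)
print(counter["rows"])  # 1000
```

With the parameters from the report (`-n 50000000 -t 50`), an even split like this would give each thread 1,000,000 rows, so 50 client sessions hit the port 4008 listener concurrently.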

[root@karma197 bin]# maxscale -d  -f /etc/maxscale_trxreplay.cnf --user=root
Info : MaxScale will be run in the terminal process.
Configuration file : /etc/maxscale_trxreplay.cnf
Log directory      : /data/clustrix/log
Data directory     : /var/lib/maxscale
Module directory   : /usr/lib64/maxscale
Service cache      : /var/cache/maxscale
 
Fatal: MaxScale 2.4.0 received fatal signal 11. Attempting backtrace.
Commit ID: b4e8f79c5f985a184e1fb35a0aa34de34cc3e921 System name: Linux Release string: CentOS Linux release 7.6.1810 (Core)
 
 
 
Writing core dump
Segmentation fault
[root@karma197 bin]#

MaxScale logs at alert level:

2019-05-13 07:01:15   alert  : Fatal: MaxScale 2.4.0 received fatal signal 11. Attempting backtrace.
2019-05-13 07:01:15   alert  : Commit ID: b4e8f79c5f985a184e1fb35a0aa34de34cc3e921 System name: Linux Release string: CentOS Linux release 7.6.1810 (Core)
2019-05-13 07:01:15   alert  :   maxscale(_ZN7maxbase15dump_stacktraceESt8functionIFvPKcS2_EE+0x2b) [0x40ce8b]: /root/MaxScale/maxutils/maxbase/src/stacktrace.cc:130
2019-05-13 07:01:15   alert  :   maxscale(_ZN7maxbase15dump_stacktraceEPFvPKcS1_E+0x4e) [0x40d1ee]: /usr/include/c++/4.8.2/functional:2029
2019-05-13 07:01:15   alert  :   maxscale() [0x4096f9]: ??:0
2019-05-13 07:01:15   alert  :   /lib64/libpthread.so.0(+0xf5d0) [0x7fdca89ea5d0]: sigaction.c:?
2019-05-13 07:01:15   alert  :   /usr/lib64/maxscale/libreadwritesplit.so(_ZN14RWSplitSession10routeQueryEP5GWBUF+0x3d) [0x7fdca09acc3d]: /root/MaxScale/server/modules/routing/readwritesplit/rwsplitsession.cc:135 (discriminator 2)
2019-05-13 07:01:15   alert  :   /usr/lib64/maxscale/libreadwritesplit.so(_ZN8maxscale6RouterI7RWSplit14RWSplitSessionE10routeQueryEP10mxs_routerP18mxs_router_sessionP5GWBUF+0x1e) [0x7fdca09a6b6e]: /root/MaxScale/include/maxscale/router.hh:452
2019-05-13 07:01:15   alert  :   /usr/lib64/maxscale/libmaxscale-common.so.1.0.0(+0xeff60) [0x7fdca9601f60]: /root/MaxScale/server/core/session.cc:1041
2019-05-13 07:01:15   alert  :   /usr/lib64/maxscale/libmaxscale-common.so.1.0.0(_ZN7maxbase6Worker4tickEv+0xf0) [0x7fdca9612620]: /root/MaxScale/maxutils/maxbase/include/maxbase/worker.hh:778
2019-05-13 07:01:15   alert  :   /usr/lib64/maxscale/libmaxscale-common.so.1.0.0(_ZN7maxbase11WorkerTimer6handleEPNS_6WorkerEj+0x36) [0x7fdca9610d56]: /root/MaxScale/maxutils/maxbase/src/worker.cc:256
2019-05-13 07:01:15   alert  :   /usr/lib64/maxscale/libmaxscale-common.so.1.0.0(_ZN7maxbase6Worker15poll_waiteventsEv+0x1b5) [0x7fdca9611785]: /root/MaxScale/maxutils/maxbase/src/worker.cc:848
2019-05-13 07:01:15   alert  :   /usr/lib64/maxscale/libmaxscale-common.so.1.0.0(_ZN7maxbase6Worker3runEPNS_9SemaphoreE+0x51) [0x7fdca9611981]: /root/MaxScale/maxutils/maxbase/src/worker.cc:549
2019-05-13 07:01:15   alert  :   /lib64/libstdc++.so.6(+0xb5070) [0x7fdca7a1d070]: ??:?
2019-05-13 07:01:15   alert  :   /lib64/libpthread.so.0(+0x7dd5) [0x7fdca89e2dd5]: pthread_create.c:?
2019-05-13 07:01:16   alert  :   /lib64/libc.so.6(clone+0x6d) [0x7fdca69c1ead]: ??:?
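
When reading a backtrace like the one above, the frames before the libpthread signal frame belong to the crash handler itself, and the first frame after it is where the fault occurred. Here that is `_ZN14RWSplitSession10routeQueryEP5GWBUF`, which demangles (for example with `c++filt`) to `RWSplitSession::routeQuery(GWBUF*)`, at rwsplitsession.cc:135. The sketch below applies that heuristic to a few lines copied from the log. The regular expression and the helper names `parse_frames` and `faulting_frame` are assumptions based only on the line format visible in this report.

```python
import re

# Matches lines such as:
#   ... alert  :   /path/lib.so(symbol+0x3d) [0x7f...]: file.cc:135
FRAME_RE = re.compile(
    r"alert\s+:\s+(?P<module>\S+?)\((?P<symbol>[^)+]*)(?:\+0x[0-9a-f]+)?\)\s+"
    r"\[(?P<addr>0x[0-9a-f]+)\]:\s+(?P<source>.+)$"
)

# A few lines copied from the log above.
LOG = [
    "2019-05-13 07:01:15   alert  : Fatal: MaxScale 2.4.0 received fatal signal 11. Attempting backtrace.",
    "2019-05-13 07:01:15   alert  :   maxscale(_ZN7maxbase15dump_stacktraceESt8functionIFvPKcS2_EE+0x2b) [0x40ce8b]: /root/MaxScale/maxutils/maxbase/src/stacktrace.cc:130",
    "2019-05-13 07:01:15   alert  :   maxscale() [0x4096f9]: ??:0",
    "2019-05-13 07:01:15   alert  :   /lib64/libpthread.so.0(+0xf5d0) [0x7fdca89ea5d0]: sigaction.c:?",
    "2019-05-13 07:01:15   alert  :   /usr/lib64/maxscale/libreadwritesplit.so(_ZN14RWSplitSession10routeQueryEP5GWBUF+0x3d) [0x7fdca09acc3d]: /root/MaxScale/server/modules/routing/readwritesplit/rwsplitsession.cc:135 (discriminator 2)",
    "2019-05-13 07:01:15   alert  :   /usr/lib64/maxscale/libmaxscale-common.so.1.0.0(+0xeff60) [0x7fdca9601f60]: /root/MaxScale/server/core/session.cc:1041",
]


def parse_frames(lines):
    """Return one dict per stack-frame line; non-frame lines are skipped."""
    frames = []
    for line in lines:
        m = FRAME_RE.search(line)
        if m:
            frames.append(m.groupdict())
    return frames


def faulting_frame(frames):
    """Heuristic: the frame right after the first libpthread (signal) frame."""
    for i, frame in enumerate(frames[:-1]):
        if "libpthread" in frame["module"]:
            return frames[i + 1]
    return None


frame = faulting_frame(parse_frames(LOG))
print(frame["symbol"])
print(frame["source"])
```

The same approach works on the attached full log, since only the first libpthread frame is used as the marker.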



 Comments   
Comment by Rahul Joshi (Inactive) [ 2019-05-14 ]

Uploaded full log file.
We have the core dump but cannot attach it here because it is ~180 MB.
If you need it, please let me know a commonly accessible location and I will upload it there.

Comment by Rahul Joshi (Inactive) [ 2019-05-16 ]

Core dump under:
https://drive.google.com/drive/folders/1uTW3fdRYh_JtXLW-6j2nUWAjTgCizO62?usp=sharing

Comment by markus makela [ 2019-06-24 ]

Can you check whether this still happens?

Comment by Rahul Joshi (Inactive) [ 2019-06-25 ]

Sure. Will try again and update here.

Comment by Rahul Joshi (Inactive) [ 2019-06-25 ]

It looks like this is fixed now. We retried with the latest build (commit 0b6ce3c6d8c87b15639081caf59a7e6de1c6be04) and there were no FATALs.

Comment by markus makela [ 2019-06-25 ]

Closing as fixed since it is no longer reproducible.
