[MXS-3270] MaxScale 2.5.5 crashes with signal 11 Created: 2020-10-30  Updated: 2020-11-04  Resolved: 2020-11-04

Status: Closed
Project: MariaDB MaxScale
Component/s: readwritesplit
Affects Version/s: 2.5.5
Fix Version/s: 2.5.6

Type: Bug Priority: Major
Reporter: Yury Kirsanov Assignee: markus makela
Resolution: Fixed Votes: 0
Labels: crash, galera
Environment:

Ubuntu 18.04.5 LTS, MariaDB Galera cluster, 3 nodes, RWSplit.



 Description   

Hi,
I've just upgraded MaxScale from 2.4.12, which has been working fine for us for a long time, to 2.5.5. After I updated the configuration for the new version, MaxScale started without problems, but as soon as I try to use it, it fails with signal 11:

2020-10-30 14:53:44 alert : (2) (RW-Test) MaxScale 2.5.5 received fatal signal 11. Commit ID: 91c3b76195d0057ddbe572bbb8d17f6ac6b09d5e System name: Linux Release string: Ubuntu 18.04.5 LTS
2020-10-30 14:53:44 alert : (2) (RW-Test) Statement currently being classified: none/unknown
2020-10-30 14:53:44 notice : (2) (RW-Test) Stmt 5(1970-01-01 10:00:00): START TRANSACTION
2020-10-30 14:53:44 notice : (2) (RW-Test) Stmt 4(1970-01-01 10:00:00): SELECT XXXXXX
2020-10-30 14:53:44 notice : (2) (RW-Test) Stmt 3(1970-01-01 10:00:00): SELECT XXXXXX
2020-10-30 14:53:44 notice : (2) (RW-Test) Stmt 2(1970-01-01 10:00:00): COMMIT
2020-10-30 14:53:44 notice : (2) (RW-Test) Stmt 1(1970-01-01 10:00:00): START TRANSACTION
nm: /lib/x86_64-linux-gnu/libc.so.6: no symbols
2020-10-30 14:53:46 alert : (2) (RW-Test)
/usr/lib/x86_64-linux-gnu/maxscale/libmaxscale-common.so.1.0.0(_ZN7Session9QueryInfo20book_server_responseEP6SERVERb+0x33): /usr/include/c++/7/bits/vector.tcc:98
/usr/lib/x86_64-linux-gnu/maxscale/libmaxscale-common.so.1.0.0(_ZN7Session20book_server_responseEP6SERVERb+0xfe): server/core/session.cc:1109
/usr/lib/x86_64-linux-gnu/maxscale/libreadwritesplit.so(_ZN14RWSplitSession11clientReplyEP5GWBUFRKSt6vectorIPN8maxscale8EndpointESaIS5_EERKNS3_5ReplyE+0x196): server/modules/routing/readwritesplit/rwsplitsession.cc:606
/usr/lib/x86_64-linux-gnu/maxscale/libreadwritesplit.so(_ZN8maxscale6RouterI7RWSplit14RWSplitSessionE11clientReplyEP10mxs_routerP18mxs_router_sessionP5GWBUFRKSt6vectorIPNS_8EndpointESaISC_EERKNS_5ReplyE+0x2a): include/maxscale/router.hh:455
/usr/lib/x86_64-linux-gnu/maxscale/libmaxscale-common.so.1.0.0(_ZN15ServiceEndpoint11clientReplyEP5GWBUFRSt6vectorIPN8maxscale8EndpointESaIS5_EERKNS3_5ReplyE+0xae): server/core/service.cc:1848
/usr/lib/x86_64-linux-gnu/maxscale/libmaxscale-common.so.1.0.0(_ZN14ServerEndpoint11clientReplyEP5GWBUFRSt6vectorIPN8maxscale8EndpointESaIS5_EERKNS3_5ReplyE+0xbb): server/core/server.cc:981 (discriminator 2)
/usr/lib/x86_64-linux-gnu/maxscale/libmariadbclient.so(_ZN24MariaDBBackendConnection11normal_readEv+0x51d): server/modules/protocol/MariaDB/mariadb_backend.cc:828
/usr/lib/x86_64-linux-gnu/maxscale/libmaxscale-common.so.1.0.0(_ZN3DCB14process_eventsEj+0x6c): server/core/dcb.cc:1291
/usr/lib/x86_64-linux-gnu/maxscale/libmaxscale-common.so.1.0.0(_ZN3DCB13event_handlerEPS_j+0x21): server/core/dcb.cc:1350
/usr/lib/x86_64-linux-gnu/maxscale/libmaxscale-common.so.1.0.0(_ZN7maxbase6Worker15poll_waiteventsEv+0x1ce): maxutils/maxbase/src/worker.cc:879
/usr/lib/x86_64-linux-gnu/maxscale/libmaxscale-common.so.1.0.0(_ZN7maxbase6Worker3runEPNS_9SemaphoreE+0x53): maxutils/maxbase/src/worker.cc:574
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xbd6df): ??:?
/lib/x86_64-linux-gnu/libpthread.so.0(+0x76db): ??:?
/lib/x86_64-linux-gnu/libc.so.6(clone+0x3f): ??:0

Configuration:

[maxscale]
threads=auto
retain_last_statements=5
dump_last_statements=on_error
admin_secure_gui=false
admin_host=10.22.20.1
query_retries=3
 
[GaleraMonitorTest]
type=monitor
module=galeramon
servers=node1-test,node2-test,node3-test
user=root
password=root
monitor_interval=1000
disable_master_failback=true
available_when_donor=true
 
[RW-Test]
type=service
router=readwritesplit
servers=node1-test,node2-test,node3-test
user=root
password=root
max_slave_connections=100%
connection_keepalive=30
master_failure_mode=fail_on_write
master_accept_reads=true
disable_sescmd_history=true
enable_root_user=true
 
[RWlistener-Test]
type=listener
service=RW-Test
protocol=mariadbclient
address=10.22.20.1
port=3308
 
[node1-test]
type=server
address=10.22.23.201
port=3306
protocol=mariadbbackend
priority=3
persistmaxtime=3600s
persistpoolmax=100
 
[node2-test]
type=server
address=10.22.23.202
port=3306
protocol=mariadbbackend
priority=2
persistmaxtime=3600s
persistpoolmax=100
 
[node3-test]
type=server
address=10.22.23.203
port=3306
protocol=mariadbbackend
priority=1
persistmaxtime=3600s
persistpoolmax=100



 Comments   
Comment by Yury Kirsanov [ 2020-10-30 ]

I have also just tried updating to Ubuntu 20.04.1 LTS and got the same result:

2020-10-30 15:26:13   alert  : (12) (RW-Test) MaxScale 2.5.5 received fatal signal 11. Commit ID: 91c3b76195d0057ddbe572bbb8d17f6ac6b09d5e System name: Linux Release string: Ubuntu 20.04.1 LTS
2020-10-30 15:26:13   alert  : (12) (RW-Test) Statement currently being classified: none/unknown
2020-10-30 15:26:13   notice : (12) (RW-Test) Stmt 5(1970-01-01 10:00:00): START TRANSACTION
2020-10-30 15:26:13   notice : (12) (RW-Test) Stmt 4(1970-01-01 10:00:00): SELECT XXXXX
2020-10-30 15:26:13   notice : (12) (RW-Test) Stmt 3(1970-01-01 10:00:00): SELECT XXXXX
2020-10-30 15:26:13   notice : (12) (RW-Test) Stmt 2(1970-01-01 10:00:00): COMMIT
2020-10-30 15:26:13   notice : (12) (RW-Test) Stmt 1(1970-01-01 10:00:00): START TRANSACTION
2020-10-30 15:26:13   alert  : (9) (RW-Test) MaxScale 2.5.5 received fatal signal 11. Commit ID: 91c3b76195d0057ddbe572bbb8d17f6ac6b09d5e System name: Linux Release string: Ubuntu 20.04.1 LTS
2020-10-30 15:26:13   alert  : (9) (RW-Test) Statement currently being classified: none/unknown
2020-10-30 15:26:13   notice : (9) (RW-Test) Stmt 5(1970-01-01 10:00:00): SELECT XXXXX
2020-10-30 15:26:13   notice : (9) (RW-Test) Stmt 4(1970-01-01 10:00:00): SELECT XXXXX
2020-10-30 15:26:13   notice : (9) (RW-Test) Stmt 3(1970-01-01 10:00:00): SELECT XXXXX
2020-10-30 15:26:13   notice : (9) (RW-Test) Stmt 2(1970-01-01 10:00:00): START TRANSACTION
2020-10-30 15:26:13   notice : (9) (RW-Test) Stmt 1(1970-01-01 10:00:00): SELECT XXXXX
nm: /lib/x86_64-linux-gnu/libc.so.6: no symbols
2020-10-30 15:26:14   alert  : (9) (RW-Test)
  /usr/lib/x86_64-linux-gnu/maxscale/libmaxscale-common.so.1.0.0(_ZN7Session9QueryInfo20book_server_responseEP6SERVERb+0x33): /usr/include/c++/7/bits/vector.tcc:98
  /usr/lib/x86_64-linux-gnu/maxscale/libmaxscale-common.so.1.0.0(_ZN7Session20book_server_responseEP6SERVERb+0xfe): server/core/session.cc:1109
  /usr/lib/x86_64-linux-gnu/maxscale/libreadwritesplit.so(_ZN14RWSplitSession11clientReplyEP5GWBUFRKSt6vectorIPN8maxscale8EndpointESaIS5_EERKNS3_5ReplyE+0x196): server/modules/routing/readwritesplit/rwsplitsession.cc:606
  /usr/lib/x86_64-linux-gnu/maxscale/libreadwritesplit.so(_ZN8maxscale6RouterI7RWSplit14RWSplitSessionE11clientReplyEP10mxs_routerP18mxs_router_sessionP5GWBUFRKSt6vectorIPNS_8EndpointESaISC_EERKNS_5ReplyE+0x2a): include/maxscale/router.hh:455
  /usr/lib/x86_64-linux-gnu/maxscale/libmaxscale-common.so.1.0.0(_ZN15ServiceEndpoint11clientReplyEP5GWBUFRSt6vectorIPN8maxscale8EndpointESaIS5_EERKNS3_5ReplyE+0xae): server/core/service.cc:1848
  /usr/lib/x86_64-linux-gnu/maxscale/libmaxscale-common.so.1.0.0(_ZN14ServerEndpoint11clientReplyEP5GWBUFRSt6vectorIPN8maxscale8EndpointESaIS5_EERKNS3_5ReplyE+0xbb): server/core/server.cc:981 (discriminator 2)
  /usr/lib/x86_64-linux-gnu/maxscale/libmariadbclient.so(_ZN24MariaDBBackendConnection11normal_readEv+0x51d): server/modules/protocol/MariaDB/mariadb_backend.cc:828
  /usr/lib/x86_64-linux-gnu/maxscale/libmaxscale-common.so.1.0.0(_ZN3DCB14process_eventsEj+0x6c): server/core/dcb.cc:1291
  /usr/lib/x86_64-linux-gnu/maxscale/libmaxscale-common.so.1.0.0(_ZN3DCB13event_handlerEPS_j+0x21): server/core/dcb.cc:1350
  /usr/lib/x86_64-linux-gnu/maxscale/libmaxscale-common.so.1.0.0(_ZN7maxbase6Worker15poll_waiteventsEv+0x1ce): maxutils/maxbase/src/worker.cc:879
  /usr/lib/x86_64-linux-gnu/maxscale/libmaxscale-common.so.1.0.0(_ZN7maxbase6Worker3runEPNS_9SemaphoreE+0x53): maxutils/maxbase/src/worker.cc:574
  /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xd6d84): ??:?
  /lib/x86_64-linux-gnu/libpthread.so.0(+0x9609): ??:?
  /lib/x86_64-linux-gnu/libc.so.6(clone+0x43): ??:0
nm: /lib/x86_64-linux-gnu/libc.so.6: no symbols

Comment by markus makela [ 2020-10-30 ]

Does the crash still happen if you remove retain_last_statements=5 and dump_last_statements=on_error?

Comment by markus makela [ 2020-10-30 ]

Could you also try to reproduce this with log_info enabled?
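
For reference, a minimal sketch of what enabling info-level logging could look like in the reporter's [maxscale] section. It assumes the global section shown in the description and changes nothing else; the crash-related parameters are kept so the problem can still be reproduced.

```ini
[maxscale]
threads=auto
# Assumption: info-level logging is turned on here only for the reproduction attempt
log_info=true
retain_last_statements=5
dump_last_statements=on_error
admin_secure_gui=false
admin_host=10.22.20.1
query_retries=3
```

Info logging is verbose, so it is probably best turned off again once the log has been captured.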

Comment by Yury Kirsanov [ 2020-10-30 ]

Yes, after removing the two options you mentioned, MaxScale no longer crashes with signal 11, but it often fails like this:

2020-10-30 18:37:17   error  : (15) Invalid authentication message from backend 'node1-test'. Error code: 1047, Msg : #08S01: Unknown command
2020-10-30 18:37:17   error  : (15) [readwritesplit] (RW-Test) Lost connection to the master server, closing session. Lost connection to master server while waiting for a result. Connection has been idle for 1 seconds. Error caused by: #HY000:  (Generated event) backend server: connection closed by peer. Last close reason: <none>. Last error:
2020-10-30 18:37:17   error  : (16) Invalid authentication message from backend 'node3-test'. Error code: 1047, Msg : #08S01: Unknown command
2020-10-30 18:37:17   error  : (16) Invalid authentication message from backend 'node2-test'. Error code: 1047, Msg : #08S01: Unknown command
2020-10-30 18:37:17   error  : (16) Invalid authentication message from backend 'node1-test'. Error code: 1047, Msg : #08S01: Unknown command
2020-10-30 18:37:17   error  : (16) [readwritesplit] (RW-Test) Lost connection to the master server, closing session. Lost connection to master server while waiting for a result. Connection has been idle for 1 seconds. Error caused by: #HY000:  (Generated event) backend server: connection closed by peer. Last close reason: <none>. Last error:

Comment by markus makela [ 2020-10-30 ]

Those seem to be related to something else. Is there something specific that triggers that behavior?

As a workaround for the problem, I'd recommend leaving out those parameters until we get this fixed.
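
A sketch of the workaround applied to the [maxscale] section from the description, assuming the rest of the configuration stays unchanged. The two parameters are commented out rather than deleted so they are easy to restore after upgrading to a fixed version.

```ini
[maxscale]
threads=auto
# Workaround for MXS-3270: disabled until running a version with the fix
#retain_last_statements=5
#dump_last_statements=on_error
admin_secure_gui=false
admin_host=10.22.20.1
query_retries=3
```

MaxScale needs to be restarted for the edited file to take effect.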

Comment by Yury Kirsanov [ 2020-10-30 ]

Thanks, I'll leave these options disabled. Regarding the authentication issue: if I switch back to MaxScale 2.4.12 without changing any parameters, it works perfectly fine. I'm loading the same web page of our application, which connects to the database via MaxScale.

Comment by Yury Kirsanov [ 2020-10-30 ]

Yep, just tested again and can confirm: 2.4.12 works fine, while 2.5.5 immediately throws the errors above and our application's web page does not open.

Comment by markus makela [ 2020-10-30 ]

I believe I have managed to reproduce this and create a fix for the bug. This seems to have been caused by some of the changes to the routing logic in readwritesplit which inadvertently caused the response to be booked multiple times.

Comment by Yury Kirsanov [ 2020-10-30 ]

Awesome news, thanks! I'll be waiting for the update and am happy to test a patch in our environment!
