[MXS-2707] MaxScale 2.4.2 crashes randomly Created: 2019-10-01  Updated: 2019-10-07  Resolved: 2019-10-07

Status: Closed
Project: MariaDB MaxScale
Component/s: Core
Affects Version/s: 2.4.2
Fix Version/s: 2.4.3

Type: Bug Priority: Critical
Reporter: Claudio Nanni Assignee: markus makela
Resolution: Fixed Votes: 1
Labels: None

Attachments: File maxscale-242-crash.cnf     File maxscale-242-crash.log    

 Description   

The service randomly crashing and restarting itself.
This has been happening after adding new servers to the cluster but it also happens when no one is working on the cluster.
See attached for log and cnf

2019-09-30 19:10:20 alert : (774) Fatal: MaxScale 2.4.2 received fatal signal 11. Attempting backtrace.
2019-09-30 19:10:20 alert : (774) Commit ID: aad4148d77bf2dfbaa0042bc45abda30c101cad2 System name: Linux Release string: Ubuntu 16.04.6 LTS
-------------------
2019-10-01 09:08:16 alert : (805) Fatal: MaxScale 2.4.2 received fatal signal 11. Attempting backtrace.
2019-10-01 09:08:16 alert : (805) Commit ID: aad4148d77bf2dfbaa0042bc45abda30c101cad2 System name: Linux Release string: Ubuntu 16.04.6 LTS
2019-10-01 09:08:16 notice : (805) Stmt 5(2019-10-01 08:58:16): SELECT _mod.id, _mod.name, _mod.org_id, _mod.supp_id, _mod.value, _mod.active FROM int_module _mod WHERE _mod.org_id = 579
2019-10-01 09:08:16 notice : (805) Stmt 4(2019-10-01 08:58:17): SELECT _p.id, _p.author, _p.org_id, _p.email, _p.firstname, _p.lastname, _p.lastname_prefix, _p.internetname, _p.userpin, _p.socialnumber, _p.ins
, _p.greetingtime, _p.role, _p.status, _p.user_id, _p.flags, _p.reason, _p.transfer_status, _p.comp_contact, _p.active_until, _p.last_login, _p.hsaid, _p.login_retries, _p.ext_ref FROM int_people _p WHERE _p.org
_id = 579 AND _p.id = 75809
2019-10-01 09:08:16 notice : (805) Stmt 3(2019-10-01 08:58:17): SELECT _m.id, _m.org_id, _m.supp_id, _m.rpt_id, _m.filename, _m.message_type, _m.msg_text_id, _m.valid_from, _m.valid_to, _m.ins, _m.author, _m.m
odified, _m.modified_by, _m.status FROM int_message _m WHERE _m.org_id = 579 AND _m.id = 47955
2019-10-01 09:08:16 notice : (805) Stmt 2(2019-10-01 08:58:17): SELECT _sg.id, _sg.org_id, _sg.ppl_id, _sg.supp_id, _sg.cli_org_id, _sg.workteam_id, _sg.legislation, _sg.mis_cat_org_id, _sg.starttime, _sg.stop
time, _sg.stoptime_original, _sg.active, _sg.note, _sg.author, _sg.ext_ref, _sg.ext_ref_cat, _sg.flags, _sg.transfer_status, _sg.ins, _sg.ext_ref_responsible FROM int_service_group _sg WHERE _sg.org_id = 579 AND
_sg.id = 1160
2019-10-01 09:08:16 notice : (805) Stmt 1(2019-10-01 08:58:17): SELECT _cs.id, _cs.org_id, _cs.supp_id, _cs.type, _cs.flags, _cs.name, _cs.description, _cs.issued_by, _cs.expire_date, _cs.system_id, _cs.cn, _c
s.data, _cs.author, _cs.ins FROM int_certificate_store _cs WHERE _cs.org_id = 579 AND _cs.id = 7
sh: 1: /bin/test: not found
2019-10-01 09:08:16 alert : (805) /usr/lib/x86_64-linux-gnu/maxscale/libmaxscale-common.so.1.0.0(_ZN8maxscale7Session9QueryInfo20book_server_responseEP6SERVERb+0x56) [0x7f7d452e13e6]: <binutils not installe
d>
2019-10-01 09:08:16 alert : (805) /usr/lib/x86_64-linux-gnu/maxscale/libmaxscale-common.so.1.0.0(_ZN8maxscale7Session20book_server_responseEP6SERVERb+0xd8) [0x7f7d452e1528]: <binutils not installed>
2019-10-01 09:08:16 alert : (805) /usr/lib/x86_64-linux-gnu/maxscale/libreadwritesplit.so(_ZN14RWSplitSession11clientReplyEP5GWBUFP3DCB+0x909) [0x7f7d3b37d2a9]: <binutils not installed>
2019-10-01 09:08:16 alert : (805) /usr/lib/x86_64-linux-gnu/maxscale/libreadwritesplit.so(_ZN8maxscale6RouterI7RWSplit14RWSplitSessionE11clientReplyEP10mxs_routerP18mxs_router_sessionP5GWBUFP3DCB+0x27) [0x7
f7d3b374a37]: <binutils not installed>
2019-10-01 09:08:16 alert : (805) /usr/lib/x86_64-linux-gnu/maxscale/libmariadbbackend.so(+0x4900) [0x7f7d3c1d6900]: <binutils not installed>
2019-10-01 09:08:16 alert : (805) /usr/lib/x86_64-linux-gnu/maxscale/libmaxscale-common.so.1.0.0(+0x9bf6c) [0x7f7d4527ff6c]: <binutils not installed>
2019-10-01 09:08:16 alert : (805) /usr/lib/x86_64-linux-gnu/maxscale/libmaxscale-common.so.1.0.0(+0x9c021) [0x7f7d45280021]: <binutils not installed>
2019-10-01 09:08:16 alert : (805) /usr/lib/x86_64-linux-gnu/maxscale/libmaxscale-common.so.1.0.0(_ZN7maxbase6Worker15poll_waiteventsEv+0x1ab) [0x7f7d452efd3b]: <binutils not installed>
2019-10-01 09:08:16 alert : (805) /usr/lib/x86_64-linux-gnu/maxscale/libmaxscale-common.so.1.0.0(_ZN7maxbase6Worker3runEPNS_9SemaphoreE+0x53) [0x7f7d452eff63]: <binutils not installed>
2019-10-01 09:08:16 alert : (805) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb8c80) [0x7f7d4437ac80]: <binutils not installed>
2019-10-01 09:08:16 alert : (805) /lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba) [0x7f7d44d656ba]: <binutils not installed>
2019-10-01 09:08:16 alert : (805) /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f7d437b041d]: <binutils not installed>



 Comments   
Comment by markus makela [ 2019-10-02 ]

Based on the stacktrace, disabling retain_last_statements might prevent the crash. It's also possible that due to malloc corruption it just happened to crash there.

Comment by markus makela [ 2019-10-03 ]

Crash happens in session.cc:1653 where a new ServerInfo is pushed into a vector.

Comment by markus makela [ 2019-10-03 ]

Managed to emulate the crash by forcibly setting m_current_query to a value that moves the used QueryInfo past the end of the container:

2019-10-03 20:52:38   alert  : (1)   /lib64/libpthread.so.0(+0x12c60): sigaction.c:?
2019-10-03 20:52:38   alert  : (1)   /lib64/libasan.so.5(+0x28c7f): ??:?
nm: /lib64/libasan.so.5: no symbols
2019-10-03 20:52:38   alert  : (1)   /lib64/libasan.so.5(_ZdlPv+0x89): ??:0
2019-10-03 20:52:38   alert  : (1)   /home/markusjm/build-develop/lib64/maxscale/libmaxscale-common.so.1.0.0(_ZN9__gnu_cxx13new_allocatorIN8maxscale7Session9QueryInfo10ServerInfoEE10deallocateEPS4_m+0x20): /usr/include/c++/9/ext/new_allocator.h:129
2019-10-03 20:52:39   alert  : (1)   /home/markusjm/build-develop/lib64/maxscale/libmaxscale-common.so.1.0.0(_ZNSt16allocator_traitsISaIN8maxscale7Session9QueryInfo10ServerInfoEEE10deallocateERS4_PS3_m+0x2b): /usr/include/c++/9/bits/alloc_traits.h:470
2019-10-03 20:52:39   alert  : (1)   /home/markusjm/build-develop/lib64/maxscale/libmaxscale-common.so.1.0.0(_ZNSt12_Vector_baseIN8maxscale7Session9QueryInfo10ServerInfoESaIS3_EE13_M_deallocateEPS3_m+0x32): /usr/include/c++/9/bits/stl_vector.h:352
2019-10-03 20:52:39   alert  : (1)   /home/markusjm/build-develop/lib64/maxscale/libmaxscale-common.so.1.0.0(_ZNSt6vectorIN8maxscale7Session9QueryInfo10ServerInfoESaIS3_EE17_M_realloc_insertIJS3_EEEvN9__gnu_cxx17__normal_iteratorIPS3_S5_EEDpOT_+0x463): /usr/include/c++/9/bits/vector.tcc:502
2019-10-03 20:52:39   alert  : (1)   /home/markusjm/build-develop/lib64/maxscale/libmaxscale-common.so.1.0.0(_ZNSt6vectorIN8maxscale7Session9QueryInfo10ServerInfoESaIS3_EE12emplace_backIJS3_EEEvDpOT_+0x12d): /usr/include/c++/9/bits/vector.tcc:125
2019-10-03 20:52:39   alert  : (1)   /home/markusjm/build-develop/lib64/maxscale/libmaxscale-common.so.1.0.0(_ZNSt6vectorIN8maxscale7Session9QueryInfo10ServerInfoESaIS3_EE9push_backEOS3_+0x2e): /usr/include/c++/9/bits/stl_vector.h:1201
2019-10-03 20:52:39   alert  : (1)   /home/markusjm/build-develop/lib64/maxscale/libmaxscale-common.so.1.0.0(_ZN8maxscale7Session9QueryInfo20book_server_responseEP6SERVERb+0x47b): server/core/session.cc:1653
2019-10-03 20:52:39   alert  : (1)   /home/markusjm/build-develop/lib64/maxscale/libmaxscale-common.so.1.0.0(_ZN8maxscale7Session20book_server_responseEP6SERVERb+0x5bc): server/core/session.cc:1511
2019-10-03 20:52:39   alert  : (1)   /home/markusjm/build-develop/lib64/maxscale/libmaxscale-common.so.1.0.0(_Z28session_book_server_responseP11MXS_SESSIONP6SERVERb+0x2c): server/core/session.cc:1007
2019-10-03 20:52:39   alert  : (1)   /home/markusjm/build-develop/lib64/maxscale/libreadwritesplit.so(_ZN14RWSplitSession11clientReplyEP5GWBUFP3DCB+0x46c): server/modules/routing/readwritesplit/rwsplitsession.cc:700
2019-10-03 20:52:39   alert  : (1)   /home/markusjm/build-develop/lib64/maxscale/libreadwritesplit.so(_ZN8maxscale6RouterI7RWSplit14RWSplitSessionE11clientReplyEP10mxs_routerP18mxs_router_sessionP5GWBUFP3DCB+0x4a): include/maxscale/router.hh:477
2019-10-03 20:52:39   alert  : (1)   /home/markusjm/build-develop/lib64/maxscale/libmariadbbackend.so(+0xbfc8): server/modules/protocol/MySQL/mariadbbackend/mysql_backend.cc:1048
2019-10-03 20:52:39   alert  : (1)   /home/markusjm/build-develop/lib64/maxscale/libmariadbbackend.so(+0x9646): server/modules/protocol/MySQL/mariadbbackend/mysql_backend.cc:513
2019-10-03 20:52:39   alert  : (1)   /home/markusjm/build-develop/lib64/maxscale/libmaxscale-common.so.1.0.0(+0x2d8405): server/core/dcb.cc:2705
2019-10-03 20:52:39   alert  : (1)   /home/markusjm/build-develop/lib64/maxscale/libmaxscale-common.so.1.0.0(+0x2d8bb0): server/core/dcb.cc:2786
2019-10-03 20:52:39   alert  : (1)   /home/markusjm/build-develop/lib64/maxscale/libmaxscale-common.so.1.0.0(+0x2d8cf8): server/core/dcb.cc:2822
2019-10-03 20:52:40   alert  : (1)   /home/markusjm/build-develop/lib64/maxscale/libmaxscale-common.so.1.0.0(_ZN7maxbase6Worker15poll_waiteventsEv+0xa14): maxutils/maxbase/src/worker.cc:856
2019-10-03 20:52:40   alert  : (1)   /home/markusjm/build-develop/lib64/maxscale/libmaxscale-common.so.1.0.0(_ZN7maxbase6Worker3runEPNS_9SemaphoreE+0x20d): maxutils/maxbase/src/worker.cc:557
2019-10-03 20:52:40   alert  : (1)   /home/markusjm/build-develop/lib64/maxscale/libmaxscale-common.so.1.0.0(_ZN7maxbase6Worker11thread_mainEPS0_PNS_9SemaphoreE+0x23): maxutils/maxbase/src/worker.cc:682
2019-10-03 20:52:40   alert  : (1)   /home/markusjm/build-develop/lib64/maxscale/libmaxscale-common.so.1.0.0(_ZSt13__invoke_implIvPFvPN7maxbase6WorkerEPNS0_9SemaphoreEEJS2_S4_EET_St14__invoke_otherOT0_DpOT1_+0xad): /usr/include/c++/9/bits/invoke.h:60
2019-10-03 20:52:40   alert  : (1)   /home/markusjm/build-develop/lib64/maxscale/libmaxscale-common.so.1.0.0(_ZSt8__invokeIPFvPN7maxbase6WorkerEPNS0_9SemaphoreEEJS2_S4_EENSt15__invoke_resultIT_JDpT0_EE4typeEOS8_DpOS9_+0xbc): /usr/include/c++/9/bits/invoke.h:96
2019-10-03 20:52:40   alert  : (1)   /home/markusjm/build-develop/lib64/maxscale/libmaxscale-common.so.1.0.0(_ZNSt6thread8_InvokerISt5tupleIJPFvPN7maxbase6WorkerEPNS2_9SemaphoreEES4_S6_EEE9_M_invokeIJLm0ELm1ELm2EEEEvSt12_Index_tupleIJXspT_EEE+0x5f): /usr/include/c++/9/thread:244
2019-10-03 20:52:40   alert  : (1)   /home/markusjm/build-develop/lib64/maxscale/libmaxscale-common.so.1.0.0(_ZNSt6thread8_InvokerISt5tupleIJPFvPN7maxbase6WorkerEPNS2_9SemaphoreEES4_S6_EEEclEv+0x7e): /usr/include/c++/9/thread:247
2019-10-03 20:52:40   alert  : (1)   /home/markusjm/build-develop/lib64/maxscale/libmaxscale-common.so.1.0.0(_ZNSt6thread11_State_implINS_8_InvokerISt5tupleIJPFvPN7maxbase6WorkerEPNS3_9SemaphoreEES5_S7_EEEEE6_M_runEv+0x1c): /usr/include/c++/9/thread:195
2019-10-03 20:52:40   alert  : (1)   /lib64/libstdc++.so.6(+0xd76f4): ??:?
2019-10-03 20:52:40   alert  : (1)   /lib64/libpthread.so.0(+0x84c0): pthread_create.c:?
2019-10-03 20:52:40   alert  : (1)   /lib64/libc.so.6(clone+0x43): :?

This suggests that the array index bookkeeping is not correct in all cases.

Comment by markus makela [ 2019-10-03 ]

Looks like COM_CHANGE_USER, COM_STMT_SEND_LONG_DATA and LOAD DATA LOCAL INFILE both cause the array index to be incremented but not decremented. This will eventually lead to integer overflow which creates an invalid iterator when the offset into the last queries array is calculated:

        if (m_current_query < static_cast<int>(m_last_queries.size()))
        {
            auto i = m_last_queries.begin() + m_current_query;
            QueryInfo& info = *i;
 
            mxb_assert(!info.complete());
 
            info.book_server_response(pServer, final_response);
        }

This is quite unlikely to happen even when pooled connections and/or LOAD DATA LOCAL INFILE are used but it is not impossible. So far it's the only theory that explains why MaxScale crashed where it did.

Generated at Thu Feb 08 04:16:06 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.