[MXS-4260]  Maxscale crashes frequently while performing load testing Created: 2022-08-26  Updated: 2022-10-28  Resolved: 2022-09-01

Status: Closed
Project: MariaDB MaxScale
Component/s: readwritesplit
Affects Version/s: 2.5.20, 6.2.4
Fix Version/s: 6.4.3, 22.08.1

Type: Bug Priority: Major
Reporter: Pon Suresh Pandian (Inactive) Assignee: markus makela
Resolution: Fixed Votes: 0
Labels: None
Environment:

RHEL7


Attachments: maxscale.cnf_ppfmdx01, maxscale_crash_logs
Sprint: MXS-SPRINT-165

 Description   

Hi Team,

MaxScale crashes during load testing.

The crash occurs after the following sequence of steps:

(1) Start the load
(2) Stop slave and slave server operation
(3) Start slave
(4) MaxScale crashes

I have attached the MaxScale configuration file and log files for reference; please check them.

Crash logs:
------------

/usr/lib64/maxscale/libreadwritesplit.so(_ZN14RWSplitSession18finish_causal_readEv+0x59): server/modules/routing/readwritesplit/rwsplit_causal_reads.cc:130
  /usr/lib64/maxscale/libreadwritesplit.so(_ZN14RWSplitSession11clientReplyEP5GWBUFRKSt6vectorIPN8maxscale8EndpointESaIS5_EERKNS3_5ReplyE+0x49b): server/modules/routing/readwritesplit/rwsplitsession.cc:543
  /usr/lib64/maxscale/libreadwritesplit.so(_ZN8maxscale6RouterI7RWSplit14RWSplitSessionE11clientReplyEP10mxs_routerP18mxs_router_sessionP5GWBUFRKSt6vectorIPNS_8EndpointESaISC_EERKNS_5ReplyE+0x2a): include/maxscale/router.hh:455
  /usr/lib64/maxscale/libmaxscale-common.so.1.0.0(_ZN15ServiceEndpoint11clientReplyEP5GWBUFRSt6vectorIPN8maxscale8EndpointESaIS5_EERKNS3_5ReplyE+0x9f): server/core/service.cc:1944
  /usr/lib64/maxscale/libmaxscale-common.so.1.0.0(_ZN14ServerEndpoint11clientReplyEP5GWBUFRSt6vectorIPN8maxscale8EndpointESaIS5_EERKNS3_5ReplyE+0xa8): server/core/server.cc:1013 (discriminator 2)
  /usr/lib64/maxscale/libmariadbclient.so.2.0.0(_ZN24MariaDBBackendConnection11normal_readEv+0x52f): server/modules/protocol/MariaDB/mariadb_backend.cc:836
  /usr/lib64/maxscale/libmaxscale-common.so.1.0.0(_ZN3DCB14process_eventsEj+0x69): server/core/dcb.cc:1343
  /usr/lib64/maxscale/libmaxscale-common.so.1.0.0(_ZN3DCB13event_handlerEPS_j+0x21): server/core/dcb.cc:1402
  /usr/lib64/maxscale/libmaxscale-common.so.1.0.0(_ZN7maxbase6Worker15poll_waiteventsEv+0x1be): maxutils/maxbase/src/worker.cc:881
  /usr/lib64/maxscale/libmaxscale-common.so.1.0.0(_ZN7maxbase6Worker3runEPNS_9SemaphoreE+0x53): maxutils/maxbase/src/worker.cc:575
  /usr/lib64/maxscale/libmaxscale-common.so.1.0.0(+0x232c6f): thread48.o:?
  /lib64/libpthread.so.0(+0x7ea5): pthread_create.c:?
  /lib64/libc.so.6(clone+0x6d): ??:?



 Comments   
Comment by markus makela [ 2022-08-29 ]

The crash is most likely caused by m_current_query being null.

        if (m_wait_gtid == RETRYING_ON_MASTER)
        {
            // Retry the query on the master
            GWBUF* buf = m_current_query.release();
            // If m_current_query is empty, buf is null here and buf->hint dereferences a null pointer
            buf->hint = hint_create_route(buf->hint, HINT_ROUTE_TO_MASTER, NULL);
            retry_query(buf, 0);
            rval = false;
        }

Comment by markus makela [ 2022-08-29 ]

This might be related to routing hints causing queries to be routed to other servers.

Comment by markus makela [ 2022-08-30 ]

It turns out that the crash happens when transaction_replay and causal_reads are both enabled and a "maxscale route to slave" routing hint is used inside a transaction. Hints inside transactions are supposed to be ignored when transaction replay is enabled, but because of the way the query classifier determines the query type, this was not detected early enough to prevent the hint from being applied.

Generated at Thu Feb 08 04:27:21 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.