[MXS-3220] MaxScale crashes in gwbuf_set_type() upon query retry Created: 2020-10-02  Updated: 2021-04-19  Resolved: 2020-10-14

Status: Closed
Project: MariaDB MaxScale
Component/s: readwritesplit
Affects Version/s: 2.4.10, 2.4.12
Fix Version/s: 2.4.13

Type: Bug Priority: Major
Reporter: Valerii Kravchuk Assignee: markus makela
Resolution: Fixed Votes: 0
Labels: None

Sprint: MXS-SPRINT-117

Description

A crash with the following backtrace occurs:

2020-10-02 11:00:36   error  : (1) Write to Backend DCB X.Y.Z.T in state DCB_STATE_POLLING failed: 104, Connection reset by peer
2020-10-02 11:00:36   alert  : (1) Fatal: MaxScale 2.4.10 received fatal signal 11. Commit ID: 7781f7042ab077811e2431794c2280162c0a6a3d System name: Linux Release string: Red Hat Enterprise Linux Server release 7.8 (Maipo)
2020-10-02 11:00:36   alert  : (1) 
  /usr/local/maxscale/bin/../lib64/maxscale/libmaxscale-common.so.1.0.0(_Z14gwbuf_set_typeP5GWBUFj+0x10): server/core/buffer.cc:600
  /usr/lib64/maxscale/libreadwritesplit.so(_ZN14RWSplitSession11retry_queryEP5GWBUFi+0x33): server/modules/routing/readwritesplit/rwsplit_route_stmt.cc:159
  /usr/lib64/maxscale/libreadwritesplit.so(_ZN14RWSplitSession18retry_master_queryEPN8maxscale9RWBackendE+0x14b): /opt/rh/devtoolset-7/root/usr/include/c++/7/bits/stl_deque.h:1582
  /usr/lib64/maxscale/libreadwritesplit.so(_ZN14RWSplitSession11handleErrorEP5GWBUFP3DCB12error_actionPb+0x692): server/modules/routing/readwritesplit/rwsplitsession.cc:1128
  /usr/lib64/maxscale/libreadwritesplit.so(_ZN8maxscale6RouterI7RWSplit14RWSplitSessionE11handleErrorEP10mxs_routerP18mxs_router_sessionP5GWBUFP3DCB12error_actionPb+0x2d): include/maxscale/router.hh:489
  /usr/lib64/maxscale/libmariadbbackend.so(+0x4f44): server/modules/protocol/MySQL/mariadbbackend/mysql_backend.cc:626
  /usr/lib64/maxscale/libmariadbbackend.so(+0x5101): server/modules/protocol/MySQL/mariadbbackend/mysql_backend.cc:1386
  /usr/local/maxscale/bin/../lib64/maxscale/libmaxscale-common.so.1.0.0(+0xa0057): server/core/dcb.cc:2720
  /usr/local/maxscale/bin/../lib64/maxscale/libmaxscale-common.so.1.0.0(+0xa0271): server/core/dcb.cc:2755
  /usr/local/maxscale/bin/../lib64/maxscale/libmaxscale-common.so.1.0.0(_ZN7maxbase6Worker15poll_waiteventsEv+0x1a6): maxutils/maxbase/src/worker.cc:858
  /usr/local/maxscale/bin/../lib64/maxscale/libmaxscale-common.so.1.0.0(_ZN7maxbase6Worker3runEPNS_9SemaphoreE+0x53): maxutils/maxbase/src/worker.cc:559
  /usr/local/maxscale/bin/../lib64/maxscale/libmaxscale-common.so.1.0.0(+0x1b830f): thread48.o:?
  /lib64/libpthread.so.0(+0x7ea5): pthread_create.c:?
  /lib64/libc.so.6(clone+0x6d): ??:?



Comments
Comment by markus makela [ 2020-10-13 ]

Modifying retry_master_query to pass a bad pointer to retry_query seems to reproduce the stack trace:

2020-10-13 10:28:00   alert  : (1) Fatal: MaxScale 2.4.10 received fatal signal 11. Commit ID: 7781f7042ab077811e2431794c2280162c0a6a3d System name: Linux Release string: Fedora release 32 (Thirty Two)
2020-10-13 10:28:00   alert  : (1) 
  /home/markusjm/build-develop/lib64/maxscale/libmaxscale-common.so.1.0.0(_Z14gwbuf_set_typeP5GWBUFj+0x8): server/core/buffer.cc:600
  /home/markusjm/build-develop/lib64/maxscale/libreadwritesplit.so(_ZN14RWSplitSession11retry_queryEP5GWBUFi+0x33): server/modules/routing/readwritesplit/rwsplit_route_stmt.cc:159
  /home/markusjm/build-develop/lib64/maxscale/libreadwritesplit.so(_ZN14RWSplitSession18retry_master_queryEPN8maxscale9RWBackendE+0x19f): /usr/include/c++/10/bits/stl_deque.h:1533
  /home/markusjm/build-develop/lib64/maxscale/libreadwritesplit.so(_ZN14RWSplitSession11handleErrorEP5GWBUFP3DCB12error_actionPb+0x6fc): server/modules/routing/readwritesplit/rwsplitsession.cc:1135
  /home/markusjm/build-develop/lib64/maxscale/libreadwritesplit.so(_ZN8maxscale6RouterI7RWSplit14RWSplitSessionE11handleErrorEP10mxs_routerP18mxs_router_sessionP5GWBUFP3DCB12error_actionPb+0x29): include/maxscale/router.hh:489
  /home/markusjm/build-develop/lib64/maxscale/libmariadbbackend.so(+0x57a3): server/modules/protocol/MySQL/mariadbbackend/mysql_backend.cc:626
  /home/markusjm/build-develop/lib64/maxscale/libmariadbbackend.so(+0x5981): server/modules/protocol/MySQL/mariadbbackend/mysql_backend.cc:1386
  /home/markusjm/build-develop/lib64/maxscale/libmaxscale-common.so.1.0.0(+0xa21c5): server/core/dcb.cc:2737
  /home/markusjm/build-develop/lib64/maxscale/libmaxscale-common.so.1.0.0(+0xa2421): server/core/dcb.cc:2762
  /home/markusjm/build-develop/lib64/maxscale/libmaxscale-common.so.1.0.0(_ZN7maxbase6Worker15poll_waiteventsEv+0x1a6): maxutils/maxbase/src/worker.cc:858
  /home/markusjm/build-develop/lib64/maxscale/libmaxscale-common.so.1.0.0(_ZN7maxbase6Worker3runEPNS_9SemaphoreE+0x53): maxutils/maxbase/src/worker.cc:559
  /lib64/libstdc++.so.6(+0xd8b84): ??:?
  /lib64/libpthread.so.0(+0x9432): pthread_create.c:?
  /lib64/libc.so.6(clone+0x43): :?

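The simplest explanation is that the query queue is empty when it is expected to contain at least one query. Calling front() on an empty std::deque is undefined behavior, which would be consistent with the stl_deque.h frame in the backtrace and with an invalid GWBUF pointer reaching gwbuf_set_type(). It also means that the debug assertion mxb_assert(!m_query_queue.empty()) would trigger in a debug build.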
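Because mxb_assert only fires in debug builds, a release build goes straight to front() on the empty queue. The following is a minimal sketch of a runtime guard against that, assuming a simplified session that keeps pending queries in a std::deque. The names Session, QueuedQuery, enqueue and pending are hypothetical illustrations; this is not MaxScale's actual code or the fix that shipped in 2.4.13.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <utility>

// Hypothetical stand-in for a queued query buffer (not MaxScale's GWBUF).
struct QueuedQuery
{
    std::string sql;
    bool retried = false;
};

// Hypothetical session holding a FIFO of queries awaiting a reply,
// loosely modelled on the m_query_queue member mentioned above.
class Session
{
public:
    void enqueue(std::string sql)
    {
        QueuedQuery q;
        q.sql = std::move(sql);
        m_query_queue.push_back(q);
    }

    // Checks emptiness at runtime in every build type instead of relying on
    // a debug-only assertion: std::deque::front() on an empty deque is
    // undefined behavior and can yield a garbage reference.
    bool retry_master_query(QueuedQuery& out)
    {
        if (m_query_queue.empty())
        {
            return false;   // nothing to retry; the caller decides what to do
        }

        out = m_query_queue.front();
        m_query_queue.pop_front();
        out.retried = true;
        return true;
    }

    std::size_t pending() const
    {
        return m_query_queue.size();
    }

private:
    std::deque<QueuedQuery> m_query_queue;
};
```

With a guard like this, an unexpectedly empty queue becomes a reportable condition rather than a segfault; whether the session should then be closed or an error sent to the client is left to the caller in this sketch.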

Comment by markus makela [ 2020-10-13 ]

Managed to also reproduce it without modifying the code. Interrupting the history replay that happens when a master reconnection is done triggers a crash. This only happens when the command that causes the reconnection is a session command.

Generated at Thu Feb 08 04:19:50 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.