[MXS-4005] Crash on server failure with causal_reads=local Created: 2022-02-17  Updated: 2022-03-01  Resolved: 2022-02-21

Status: Closed
Project: MariaDB MaxScale
Component/s: readwritesplit
Affects Version/s: 6.2.2
Fix Version/s: 6.2.3

Type: Bug
Priority: Major
Reporter: Hartmut Holzgraefe
Assignee: markus makela
Resolution: Fixed
Votes: 0
Labels: None

Issue Links:
Problem/Incident
is caused by MXS-4007 Active operation count is wrong after... Closed
Relates
relates to MXS-4028 DBI driver crashes Closed
relates to MXS-4033 Upgrade to 6.2.2 confuses SQL clients Closed

 Description   

While testing a custom pre-release build of the upcoming MaxScale 6.2.2 (with the proposed fix for MXS-4000 and some extra logging code added), the readwritesplit router crashed (only once so far):

2022-02-16 15:45:35   alert  : (48) (Appdata1RW-Service); MaxScale 6.2.2 received fatal signal 11. Commit ID: 2374370f645d721e99237d4d9bc61bb691757ad6 System name: Linux Release string: undefined
2022-02-16 15:45:35   alert  : (48) (Appdata1RW-Service); Statement currently being classified: none/unknown
2022-02-16 15:45:35   alert  : (48) (Appdata1RW-Service); DCB: 0x7efe7803b370 Session: 48 Service: Appdata1RW-Service
2022-02-16 15:45:35   notice : (48) (Appdata1RW-Service); For a more detailed stacktrace, install GDB and add 'debug=gdb-stacktrace' under the [maxscale] section.
  /usr/lib64/maxscale/libreadwritesplit.so(RWSplitSession::finish_causal_read()+0x59) [0x7efe7c20b849]
  /usr/lib64/maxscale/libreadwritesplit.so(RWSplitSession::clientReply(GWBUF*, std::vector<maxscale::Endpoint*, std::allocator<maxscale::Endpoint*> > const&, maxscale::Reply const&)+0x2fc) [0x7efe7c20030c]
  /usr/lib64/maxscale/libmaxscale-common.so.1.0.0(ServiceEndpoint::clientReply(GWBUF*, std::vector<maxscale::Endpoint*, std::allocator<maxscale::Endpoint*> >&, maxscale::Reply const&)+0x99) [0x7efe8574bb19]
  /usr/lib64/maxscale/libmaxscale-common.so.1.0.0(ServerEndpoint::clientReply(GWBUF*, std::vector<maxscale::Endpoint*, std::allocator<maxscale::Endpoint*> >&, maxscale::Reply const&)+0xfa) [0x7efe8573d4ba]
  /usr/lib64/maxscale/libmaxscale-common.so.1.0.0(MariaDBBackendConnection::normal_read()+0x19f) [0x7efe8577d30f]
  /usr/lib64/maxscale/libmaxscale-common.so.1.0.0(DCB::process_events(unsigned int)+0xc4) [0x7efe856c7314]
  /usr/lib64/maxscale/libmaxscale-common.so.1.0.0(DCB::event_handler(DCB*, unsigned int)+0x21) [0x7efe856c7411]
  /usr/lib64/maxscale/libmaxscale-common.so.1.0.0(maxbase::Worker::poll_waitevents()+0x212) [0x7efe857bd9d2]
  /usr/lib64/maxscale/libmaxscale-common.so.1.0.0(maxbase::Worker::run(maxbase::Semaphore*)+0x4f) [0x7efe857bdb8f]
  /usr/lib64/libstdc++.so.6(+0xdb3b4) [0x7efe845e63b4]
  /lib64/libpthread.so.0(+0xa6ea) [0x7efe833396ea]
  /lib64/libc.so.6(clone+0x3f) [0x7efe826e0a8f]
alert  : Writing core dump.
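
The notice in the log above suggests enabling GDB-based stack traces for more detail. As a sketch, the configuration fragment it describes would look roughly like this (assuming GDB is installed on the MaxScale host; any other settings already in the [maxscale] section stay as they are):

```
[maxscale]
debug=gdb-stacktrace
```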



 Comments   
Comment by markus makela [ 2022-02-17 ]

The changes in that build didn't touch this code, so it's highly likely that the bug exists in 6.2.1 and 6.2.2 as well.

Comment by Hartmut Holzgraefe [ 2022-02-17 ]

Trying to find out right now whether a core file was actually written, and whether we can get it ...

Comment by markus makela [ 2022-02-17 ]

6.2.2 is "smarter" about connection capabilities in that it tries to use the same ones on the backend connections that the client uses. The problem is that the causal_reads=local implementation in readwritesplit implicitly depends on multi-statements, as it modifies the SQL. This caused the causal reads to fail and be re-routed to the current master server, but it doesn't explain why it would crash.
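
To illustrate the multi-statement dependency, here is a minimal sketch and not MaxScale's actual code. The helper name `with_gtid_wait`, its parameters, and the exact SQL prefix are assumptions made for illustration; the statement MaxScale really generates is different. `MASTER_GTID_WAIT` is the MariaDB function for waiting on a GTID position, and `CLIENT_MULTI_STATEMENTS` is the protocol capability bit (1 << 16).

```cpp
#include <cstdint>
#include <string>

// Protocol capability bit that allows several statements in one packet.
constexpr uint32_t kClientMultiStatements = 1u << 16;

// Hypothetical helper: prefixes the query with a GTID wait so that the
// read only runs once the replica has caught up. Because the result is
// two statements in one packet, it only works when the backend
// connection was opened with the multi-statement capability.
std::string with_gtid_wait(const std::string& sql,
                           const std::string& gtid,
                           int timeout_s,
                           uint32_t backend_capabilities)
{
    if ((backend_capabilities & kClientMultiStatements) == 0)
    {
        // Without multi-statements the combined query would be rejected,
        // so this sketch leaves the SQL untouched.
        return sql;
    }

    return "SELECT MASTER_GTID_WAIT('" + gtid + "', "
           + std::to_string(timeout_s) + "); " + sql;
}
```

In this sketch, a backend connection that inherits a client's capabilities without the multi-statement bit can no longer carry the combined query, which matches the failure mode described above.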

As for the technical details of the crash, the most likely reason would be that m_current_query is empty when finish_causal_read() is entered in the RETRYING_ON_MASTER state. How that happens is still unknown.
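
The suspected failure mode can be sketched as follows. This is a simplified model under stated assumptions, not the real RWSplitSession: the types `CausalReadState` and `SessionSketch` and the function `retry_command` are hypothetical stand-ins, and `current_query` only imitates the role of m_current_query.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-ins; these are NOT the real MaxScale types.
enum class CausalReadState { NONE, RETRYING_ON_MASTER };

struct SessionSketch
{
    CausalReadState state = CausalReadState::NONE;
    std::vector<uint8_t> current_query;   // imitates m_current_query
};

// Reads the first byte of the stored query when a retry on the master is
// pending. Returns false instead of touching an empty buffer; calling
// front() on an empty vector is undefined behaviour and could show up as
// a SIGSEGV like the one in the log.
bool retry_command(const SessionSketch& session, uint8_t* command)
{
    if (session.state != CausalReadState::RETRYING_ON_MASTER)
    {
        return false;   // nothing to retry
    }

    if (session.current_query.empty())
    {
        return false;   // the suspected crash condition
    }

    *command = session.current_query.front();
    return true;
}
```

A guard like this would only hide the symptom; it does not explain how the buffer ends up empty in the first place.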

Comment by markus makela [ 2022-02-17 ]

The crash itself is caused by the result tracking problems described in MXS-4007, which eventually cause the code to try to read from an empty buffer.

Generated at Thu Feb 08 04:25:31 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.