[MXS-127] disable_sescmd_history causes MaxScale to crash under load Created: 2015-05-07  Updated: 2015-06-17  Resolved: 2015-05-09

Status: Closed
Project: MariaDB MaxScale
Component/s: readwritesplit
Affects Version/s: 1.1.0
Fix Version/s: 1.1.1

Type: Bug Priority: Major
Reporter: Yves Trudeau Assignee: markus makela
Resolution: Fixed Votes: 0
Labels: None
Environment:

Linux (Ubuntu 12.04 "Precise"), 64-bit



 Description   

When I throttle the queries like this:

( for i in `seq 1 100000`; do echo "set @test=$i;"; sleep 0.1; done ) | mysql -h 10.3.1.110 -utpcc -ptpcc tpcc

MaxScale is stable and its memory usage stays flat. But if I remove the sleep, it crashes quickly, and the trace shows that the crash always comes right after this message:

2015-05-06 10:57:25 [3] Backend 172.30.1.102:3306 already executing sescmd

This usually happens after just 4 or 5 statements.

From the backtrace:

2015-05-06 10:57:25 Fatal: MaxScale received fatal signal 11. Attempting backtrace.
2015-05-06 10:57:25 ./bin/maxscale() [0x54cb98]
2015-05-06 10:57:25 /lib/x86_64-linux-gnu/libpthread.so.0(+0xfcb0) [0x7fb5b4036cb0]
2015-05-06 10:57:25 /usr/local/mariadb-maxscale/modules/libreadwritesplit.so(+0xa16d) [0x7fb59830116d]
2015-05-06 10:57:25 /usr/local/mariadb-maxscale/modules/libreadwritesplit.so(+0x8544) [0x7fb5982ff544]
2015-05-06 10:57:25 /usr/local/mariadb-maxscale/modules/libMySQLBackend.so(+0x4ee8) [0x7fb595ea9ee8]
2015-05-06 10:57:25 ./bin/maxscale() [0x55f874]

If I am reading it correctly, the problem is at the beginning of sescmd_cursor_process_replies:

static GWBUF* sescmd_cursor_process_replies(
    GWBUF* replybuf,
    backend_ref_t* bref,
    bool *reconnect)
{
    mysql_sescmd_t* scmd;
    sescmd_cursor_t* scur;
    ROUTER_CLIENT_SES* ses;
    ROUTER_INSTANCE* router;

    scur = &bref->bref_sescmd_cur;
    ss_dassert(SPINLOCK_IS_LOCKED(&(scur->scmd_cur_rses->rses_lock)));
    scmd = sescmd_cursor_get_command(scur);
    /* The property the cursor points to may already have been removed. */
    ses = (*scur->scmd_cur_ptr_property)->rses_prop_rsession;
    router = ses->router; /* <----- crashes here */
    /* ... rest of the function ... */
}

It looks like something is not protected when the properties are removed.



 Comments   
Comment by Yves Trudeau [ 2015-05-07 ]

I fixed the issue by adding a check for an active cursor to the route_session_write function in readwritesplit.c:

--- readwritesplit.c.orig       2015-05-07 09:50:26.132423608 -0400
+++ readwritesplit.c    2015-05-07 09:51:42.159467819 -0400
@@ -4384,7 +4384,8 @@
                    if(BREF_IS_IN_USE(bref))
                    {
 
-                       if(bref->bref_sescmd_cur.position <= prop->rses_prop_data.sescmd.position)
+                       if(bref->bref_sescmd_cur.position <= prop->rses_prop_data.sescmd.position 
+                        || sescmd_cursor_is_active(&bref->bref_sescmd_cur))
                        {
                            conflict = true;
                            break;

Comment by markus makela [ 2015-05-09 ]

The chain of stored session commands was pruned too aggressively, which could cause segfaults. This was fixed by removing only the session commands that had already been replied to.

Comment by Timofey Turenko [ 2015-05-12 ]

Verified by adding a test (https://github.com/mariadb-corporation/maxscale-system-test/blob/master/mxs127.cpp). Closing.

Generated at Thu Feb 08 03:56:59 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.