[MXS-2324] Maxscale disconnect after commit without begin from Icinga2 Created: 2019-02-08  Updated: 2020-08-25  Resolved: 2019-04-12

Status: Closed
Project: MariaDB MaxScale
Component/s: Core, readwritesplit
Affects Version/s: 2.3
Fix Version/s: 2.3.6

Type: Bug Priority: Major
Reporter: Richard Stracke Assignee: markus makela
Resolution: Fixed Votes: 1
Labels: None

Sprint: MXS-SPRINT-77, MXS-SPRINT-78, MXS-SPRINT-79

 Description   

Icinga2 sometimes send a commit without a bagin.

This is caused from this codepart of icinga2

void IdoMysqlConnection::InternalNewTransaction()
{
	AssertOnWorkQueue();
 
	if (!GetConnected())
		return;
 
	AsyncQuery("COMMIT");
	AsyncQuery("BEGIN");
}

2019-02-06 10:37:42 info : (26) > Autocommit: [enabled], trx is [not open], cmd: (0x03) COM_QUERY, plen: 17, type: QUERY_TYPE_COMMIT, stmt: COMMIT;BEGIN
2019-02-06 10:37:42 info : (26) Route query to master: -server [192.168.53.12]:3306 <
2019-02-06 10:37:42 info : (26) Reply complete, last reply from server
2019-02-06 10:37:42 info : (26) Non-space data found after semi-colon: 'BEGIN'.
2019-02-06 10:37:42 info : (26) Multi-statement query or stored procedure call, routing all future queries to master.

Leads to crash

2019-02-06 10:37:46 info : (26) Non-space data found after semi-colon: 'BEGIN'.
2019-02-06 10:37:46 info : (26) Storing query (len: 17 cmd: 3), expecting 1 replies to current command
2019-02-06 10:40:11 info : (7) Started maxadmin-service client session [30] for 'root' from localhost
2019-02-06 10:40:11 info : (30) MaxAdmin: list servers
2019-02-06 10:40:11 info : Stopped maxadmin-service client session [30]
2019-02-06 10:40:15 info : (7) Started maxadmin-service client session [31] for 'root' from localhost
2019-02-06 10:40:15 info : (31) MaxAdmin: show monitors
2019-02-06 10:40:15 info : Stopped maxadmin-service client session [31]
2019-02-06 10:40:23 info : (7) Started maxadmin-service client session [32] for 'root' from localhost
2019-02-06 10:40:23 info : (32) MaxAdmin: list services
2019-02-06 10:40:23 info : Stopped maxadmin-service client session [32]
[...]
2019-02-06 10:41:06 info : Stopped maxadmin-service client session [40]
2019-02-06 10:41:16 info : Worker 0x56243959e8b0 received shutdown message.
2019-02-06 10:41:16 info : Worker 0x5624395a4bd0 received shutdown message.
2019-02-06 10:41:16 info : Worker 0x56243959f730 received shutdown message.
2019-02-06 10:41:16 info : Worker 0x5624395a14d0 received shutdown message.
2019-02-06 10:41:16 info : Worker 0x5624395a6e30 received shutdown message.
2019-02-06 10:41:16 info : Worker 0x5624395a1bb0 received shutdown message.
2019-02-06 10:41:16 info : Worker 0x5624395a6070 received shutdown message.
2019-02-06 10:41:16 info : Worker 0x5624395a14d0 has shut down.
2019-02-06 10:41:16 info : Worker 0x5624395a44f0 received shutdown message.
2019-02-06 10:41:16 info : Worker 0x5624395a3730 received shutdown message.
2019-02-06 10:41:16 info : Worker 0x5624395a5990 received shutdown message.
2019-02-06 10:41:16 info : Worker 0x5624395a0120 received shutdown message.
2019-02-06 10:41:16 info : Worker 0x5624395a0df0 received shutdown message.
[....]
 
019-02-06 10:41:16 notice : Housekeeper shutting down.
2019-02-06 10:41:16 notice : Housekeeper has shut down.
2019-02-06 10:41:16 info : Waiting for worker 0x56243959e8b0.
2019-02-06 10:41:16 info : Waited for worker 0x56243959e8b0.
2019-02-06 10:41:16 info : Waiting for worker 0x56243959f730.
2019-02-06 10:41:16 info : Waited for worker 0x56243959f730.
2019-02-06 10:41:16 info : Waiting for worker 0x56243959fb70.
2019-02-06 10:41:16 info : Waited for worker 0x56243959fb70.
2019-02-06 10:41:16 info : Waiting for worker 0x5624395a0120.
2019-02-06 10:41:16 info : Waited for worker 0x5624395a0120.
2019-02-06 10:41:16 info : Waiting for worker 0x5624395a0780.
2019-02-06 10:41:16 info : Waited for worker 0x5624395a0780.
2019-02-06 10:41:16 info : Waiting for worker 0x5624395a0df0.
[....]
2019-02-06 10:41:16 notice : All workers have shut down.
2019-02-06 10:41:16 info : Stopped readwrite-splitter-service client session [1]
2019-02-06 10:41:16 info : Stopped readwrite-splitter-service client session [2]
2019-02-06 10:41:16 info : Stopped readwrite-splitter-service client session [3]
2019-02-06 10:41:16 info : Stopped readwrite-splitter-service client session [4]
2019-02-06 10:41:16 info : Stopped readwrite-splitter-service client session [5]
2019-02-06 10:41:16 info : Stopped readwrite-splitter-service client session [6]
2019-02-06 10:41:16 info : Stopped maxadmin-service client session [7]
2019-02-06 10:41:16 notice : MaxScale received signal SIGTERM. Exiting.
2019-02-06 10:41:16 notice : MaxScale is shutting down.
2019-02-06 10:41:16 notice : MaxScale shutdown completed.
2019-02-06 10:41:16 MariaDB MaxScale is shut down.

Sometimes it can reproduced with

mysqlslap --user=maxscale --password=xxxxx --host=127.0.0.1 --port=4006 --concurrency=50 --iterations=1000 --create-schema=test --query="commit;insert into test.t1 (a) values('test');select * from test.t1 limit 2;begin;insert into test.t1 (a) values('test');commit;begin;"

Works fine with Maxscale 2.2



 Comments   
Comment by markus makela [ 2019-02-08 ]

That looks like a normal controlled shutdown instead of a crash. Does the log contain a stack trace or any other signs of fatal errors?

I think the transaction tracking might not process that statement correctly and it needs to be modified to detect only the last statement that affects the transaction state.

Comment by Richard Stracke [ 2019-02-15 ]

It is not cause a crash, it cause MySQL connection hang indefinitely from client perspective.

Comment by Johan Wikman [ 2019-03-05 ]

I am not sure that mysqlslap command reproduces the actual problem.

When I run it, I get

mysqlslap: Error when connecting to server: 2002 Can't connect to MySQL server on '127.0.0.1' (99)
mysqlslap: Error when connecting to server: 2002 Can't connect to MySQL server on '127.0.0.1' (99)
...

However, the reason for that is likely to be that the maximum size of the backlog queue of listen is reached. I get the same result if I replace the query with a simple "select 1" . The behavior is also the same for 2.2.

The info logging "2019-02-06 10:37:42 info : (26) Non-space data found after semi-colon: 'BEGIN'." is probably irrelevant. That's logged by the code looking for transaction boundaries. The code can't handle multi-statements and complains when it sees that "COMMIT; BEGIN;", but that should not matter since the first multi-statement will cause everything in that session to be sent to the master anyway.

Increasing the size of the backlog(/proc/sys/net/ipv4/tcp_max_syn_backlog) should in principle have an effect, but although I increased the value significantly, from 512 to 512000, the behaviour stayed the same.

However, there is something peculiar going on. When MaxScale is started, list services tells that #users is 1 (that is also peculiar as it should be 0). Then when mysqlslap is started, #users jumps to 52, as expected (well, 51 would be even more expected). The peculiar thing is that when mysqlslap starts complaining that it cannot connect, #users drops to 2 and stays there. After a while, mysqlslap may stop complaining and #users increases to 52 again, and drops to 2 after a while when mysqlslap starts complaining anew. That may repeat a number of times before mysqlslap ends its run.

When mysqlslap complains and #users is 2 it is also not possible to connect using mysql, as expected. For the time being I don't know what MaxScale is doing during the significant amount of the time #users is 2 and it is not possible to connect.

Comment by Johan Wikman [ 2019-03-05 ]

Richard Now I realize that I just assumed that you also saw that error, but if you experienced a hang instead of that error, then I've been barking up the wrong tree.

Comment by markus makela [ 2019-04-11 ]

Looks like it could be a problem in the result processing code in readwritesplit.

Comment by markus makela [ 2019-04-12 ]

The problem appeared to be in the response processing code.

Generated at Thu Feb 08 04:13:18 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.