[MXS-2324] Maxscale disconnect after commit without begin from Icinga2 Created: 2019-02-08 Updated: 2020-08-25 Resolved: 2019-04-12 |
|
| Status: | Closed |
| Project: | MariaDB MaxScale |
| Component/s: | Core, readwritesplit |
| Affects Version/s: | 2.3 |
| Fix Version/s: | 2.3.6 |
| Type: | Bug | Priority: | Major |
| Reporter: | Richard Stracke | Assignee: | markus makela |
| Resolution: | Fixed | Votes: | 1 |
| Labels: | None | ||
| Sprint: | MXS-SPRINT-77, MXS-SPRINT-78, MXS-SPRINT-79 |
| Description |
|
Icinga2 sometimes send a commit without a bagin. This is caused from this codepart of icinga2
Leads to crash
Sometimes it can reproduced with
Works fine with Maxscale 2.2 |
| Comments |
| Comment by markus makela [ 2019-02-08 ] | |||
|
That looks like a normal controlled shutdown instead of a crash. Does the log contain a stack trace or any other signs of fatal errors? I think the transaction tracking might not process that statement correctly and it needs to be modified to detect only the last statement that affects the transaction state. | |||
| Comment by Richard Stracke [ 2019-02-15 ] | |||
|
It is not cause a crash, it cause MySQL connection hang indefinitely from client perspective. | |||
| Comment by Johan Wikman [ 2019-03-05 ] | |||
|
I am not sure that mysqlslap command reproduces the actual problem. When I run it, I get
However, the reason for that is likely to be that the maximum size of the backlog queue of listen is reached. I get the same result if I replace the query with a simple "select 1" . The behavior is also the same for 2.2. The info logging "2019-02-06 10:37:42 info : (26) Non-space data found after semi-colon: 'BEGIN'." is probably irrelevant. That's logged by the code looking for transaction boundaries. The code can't handle multi-statements and complains when it sees that "COMMIT; BEGIN;", but that should not matter since the first multi-statement will cause everything in that session to be sent to the master anyway. Increasing the size of the backlog(/proc/sys/net/ipv4/tcp_max_syn_backlog) should in principle have an effect, but although I increased the value significantly, from 512 to 512000, the behaviour stayed the same. However, there is something peculiar going on. When MaxScale is started, list services tells that #users is 1 (that is also peculiar as it should be 0). Then when mysqlslap is started, #users jumps to 52, as expected (well, 51 would be even more expected). The peculiar thing is that when mysqlslap starts complaining that it cannot connect, #users drops to 2 and stays there. After a while, mysqlslap may stop complaining and #users increases to 52 again, and drops to 2 after a while when mysqlslap starts complaining anew. That may repeat a number of times before mysqlslap ends its run. When mysqlslap complains and #users is 2 it is also not possible to connect using mysql, as expected. For the time being I don't know what MaxScale is doing during the significant amount of the time #users is 2 and it is not possible to connect. | |||
| Comment by Johan Wikman [ 2019-03-05 ] | |||
|
Richard Now I realize that I just assumed that you also saw that error, but if you experienced a hang instead of that error, then I've been barking up the wrong tree. | |||
| Comment by markus makela [ 2019-04-11 ] | |||
|
Looks like it could be a problem in the result processing code in readwritesplit. | |||
| Comment by markus makela [ 2019-04-12 ] | |||
|
The problem appeared to be in the response processing code. |