[MXS-619] creating many short sessions in parallel leads to errors Created: 2016-03-11  Updated: 2021-04-19  Resolved: 2020-07-02

Status: Closed
Project: MariaDB MaxScale
Component/s: mariadbbackend
Affects Version/s: None
Fix Version/s: 2.4.11, 2.4.12

Type: Bug Priority: Major
Reporter: Timofey Turenko Assignee: markus makela
Resolution: Fixed Votes: 2
Labels: None

Issue Links:
Problem/Incident
causes MXS-1351 Partially authenticated connections a... Closed
Relates
relates to MXS-1109 Routing many prepares statements give... Closed
relates to MXS-1455 Aborted connection warnings on mysqld... Closed
Epic Link: Router Improvements

 Description   

Test:
1. Create several threads (e.g. 20).
2. Each thread, in a loop, opens a connection to a MaxScale router (tested with RWSplit, but the same probably applies to all routers), runs a short query ('select 1'), and closes the session.
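
The two steps above can be sketched as a small harness. This is an illustrative sketch, not the actual open_close_connections test: the function name `hammer` and its parameters are hypothetical, and `connect` is assumed to be any zero-argument callable that returns a DB-API 2.0 style connection.

```python
import threading
from typing import Callable, List


def hammer(connect: Callable[[], object], threads: int = 20,
           iterations: int = 100) -> List[str]:
    """Open a session, run 'select 1', close it; repeat in every thread.

    `connect` is a hypothetical zero-argument factory returning a
    DB-API 2.0 style connection. Returns the collected error messages.
    """
    errors: List[str] = []
    lock = threading.Lock()

    def worker() -> None:
        for _ in range(iterations):
            try:
                conn = connect()
                try:
                    cur = conn.cursor()
                    cur.execute("select 1")
                    cur.fetchall()
                finally:
                    # Close immediately so that sessions stay short-lived.
                    conn.close()
            except Exception as exc:
                # Record the failure and keep looping, as the test does.
                with lock:
                    errors.append(str(exc))

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return errors
```

With a client library such as PyMySQL, `connect` could be something like `lambda: pymysql.connect(host=..., port=..., user=..., password=...)` pointed at the MaxScale listener; the connection details are placeholders. An empty return value corresponds to the expected result below, and a non-empty one to the actual result.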

Expected result:
sessions are opened, queries are executed, sessions are closed normally

Actual result:

After a while, the query fails with:

Error: can't execute SQL-query: select 1
Authentication with backend failed. Session will be closed.

Sessions can no longer be created: "failed to create new session"

Maxscale log:

2016-03-11 13:32:56 error : Invalid authentication message from backend. Error code: 1129, Msg : Host 'maxscale' is blocked because of many connection errors; unblock with 'mysqladmin flush-hosts'
2016-03-11 13:32:56 error : Server server1 has been put into maintenance mode due to the server blocking connections from MaxScale. Run 'mysqladmin -h 192.168.121.76 -P 3306 flush-hosts' on this server before taking this server out of maintenance mode.
2016-03-11 13:32:56 error : Could not find master among the backend servers. Previous master's state : NO STATUS
2016-03-11 13:32:56 error : Routing the query failed. Session will be closed.
2016-03-11 13:32:56 error : Could not find master among the backend servers. Previous master's state : NO STATUS
2016-03-11 13:32:56 error : Routing the query failed. Session will be closed.
2016-03-11 13:32:56 error : Could not find master among the backend servers. Previous master's state : NO STATUS
2016-03-11 13:32:56 error : Routing the query failed. Session will be closed.
2016-03-11 13:32:56 error : Could not find master among the backend servers. Previous master's state : NO STATUS
2016-03-11 13:32:56 error : Routing the query failed. Session will be closed.
2016-03-11 13:32:56 error : Invalid authentication message from backend. Error code: 1129, Msg : Host 'maxscale' is blocked because of many connection errors; unblock with 'mysqladmin flush-hosts'

If a session command (e.g. 'USE test') is executed during every session, no failures occur.

Test case: open_close_connections
(currently contains 'USE test')



 Comments   
Comment by Johan Wikman [ 2016-05-31 ]

Old issue, still present, tentatively moving to 2.1.

Comment by markus makela [ 2017-08-17 ]

The connections should be kept in the zombie queue for as long as the backend protocol deems necessary. A hard timeout could be used to handle the cases where the backend protocol doesn't finish the authentication fast enough.

Comment by markus makela [ 2019-04-15 ]

In 2.4 the lazy_connect feature will prevent this, as unnecessary connections aren't created.

Comment by Jeff Smelser [ 2019-11-20 ]

I have 2.4.4 with lazy_connect turned on and this still happens a lot.

2019-11-20 13:43:59 814816 [Warning] Aborted connection 814816 to db: 'unconnected' user: 'unauthenticated' host: '<redacted>' (This connection closed normally without authentication)

Comment by markus makela [ 2019-11-28 ]

I think it can happen in cases where a client forcibly closes a connection mid-query. Can you reliably reproduce this on a test system?

Comment by markus makela [ 2019-12-19 ]

We can avoid this problem by generating a fake handshake response even before we've received the server's proper handshake. This will prevent the connection from counting as a connection error and instead it is counted as an authentication error. In theory this can still occur if the network write doesn't complete before the socket is closed but in practice it should solve the problem.

Comment by markus makela [ 2020-07-01 ]

The fix that was implemented for 2.3 wasn't adequate in the sense that it caused confusing authentication failed error messages in the backend MariaDB servers. As the problems caused by this bug aren't extremely common, the old error message and the previous behavior are easier to deal with in real-life setups.

Comment by markus makela [ 2020-07-02 ]

Added a new entry point to the protocol modules and used that to decide when to move a DCB out of the zombie queue. Currently it's only defined by mariadbbackend and it checks the protocol state to prevent the DCB from being closed before authentication completes.
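
A conceptual model of that check, as a minimal sketch: this is not MaxScale source code, and every name in it (`BackendDCB`, `can_be_destroyed`, `ZombieQueue`, `hard_timeout`) is hypothetical. The hard timeout follows the suggestion made earlier in this thread and is an assumption, not a confirmed part of the fix.

```python
import time
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import List, Optional


class AuthState(Enum):
    HANDSHAKING = auto()  # backend authentication still in progress
    COMPLETE = auto()     # authentication has finished


@dataclass
class BackendDCB:
    """Stand-in for a backend connection descriptor (hypothetical)."""
    name: str
    auth_state: AuthState = AuthState.HANDSHAKING
    closed_at: float = 0.0

    def can_be_destroyed(self) -> bool:
        # Hypothetical protocol entry point: only allow the final close
        # once authentication with the backend has completed.
        return self.auth_state is AuthState.COMPLETE


@dataclass
class ZombieQueue:
    hard_timeout: float = 5.0  # assumed value, in seconds
    _zombies: List[BackendDCB] = field(default_factory=list)

    def add(self, dcb: BackendDCB, now: Optional[float] = None) -> None:
        dcb.closed_at = time.monotonic() if now is None else now
        self._zombies.append(dcb)

    def process(self, now: Optional[float] = None) -> List[BackendDCB]:
        """Return the DCBs that may be closed now; keep the rest queued."""
        now = time.monotonic() if now is None else now
        destroyed, kept = [], []
        for dcb in self._zombies:
            expired = now - dcb.closed_at >= self.hard_timeout
            if dcb.can_be_destroyed() or expired:
                destroyed.append(dcb)
            else:
                kept.append(dcb)
        self._zombies = kept
        return destroyed
```

In this model a DCB whose authentication is still in progress survives each `process()` pass until the protocol reports completion or the timeout expires, so the backend does not see a connection dropped mid-handshake.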

Comment by markus makela [ 2020-07-14 ]

The fix that was planned for 2.4.11 wasn't adequate and we rolled it back in order to provide a fully working solution in 2.4.12. The old fix was also removed from 2.4.11, meaning that MaxScale will no longer send the forged handshake response when a connection is closed abruptly. This should solve the case where the backend logs are full of false authentication error messages.
