[MXS-1263] broken TCP connections are not always cleaned properly Created: 2017-05-10  Updated: 2017-05-19  Resolved: 2017-05-19

Status: Closed
Project: MariaDB MaxScale
Component/s: cli
Affects Version/s: 2.0.5, 2.1.2
Fix Version/s: 2.1.3

Type: Bug Priority: Major
Reporter: Andrii Nikitin (Inactive) Assignee: markus makela
Resolution: Fixed Votes: 0
Labels: None


 Description   

The following example demonstrates that TCP connections may stay stuck for a long time (I waited for 30+ min); connect_timeout was set to 60 for the 'Read-Write Service', just in case.

This means that sockets or the open file limit may be exhausted when the network or client programs are unstable, etc.

$ bin/maxscale --basedir=$(pwd)
$ sudo bin/maxadmin 'list clients' | wc -l
7
$ for i in {1..100} ; do telnet 127.0.0.1 4006 & done
...
$ sudo bin/maxadmin 'list clients' | wc -l
107
$ date
Wed May 10 22:12:44 CEST 2017
$ sudo bin/maxadmin 'list clients' | wc -l
107
$ date
Wed May 10 22:45:30 CEST 2017
$ sudo bin/maxadmin 'list clients' | wc -l
107
$ netstat | grep 4006 | grep ESTAB | wc -l
200
$ netstat | grep 4006 | grep CLOSE_WAIT | wc -l
100

I tried configuring TCP keepalive just in case, but it doesn't seem to have any effect here.

I also tried the same 'attack' directly against MariaDB Server:

for i in {1..100}; do telnet 127.0.0.1 3307 & done

And 'show processlist' doesn't show any of those connections hanging around.



 Comments   
Comment by markus makela [ 2017-05-11 ]

The clients appear in the 'list clients' output mainly because MaxScale doesn't query the protocol modules for the connection state when listing these connections. This means that even connections where authentication hasn't been completed are shown.

Given that technically nothing wrong is done for those connections, this is expected behavior. Lowering the value of connection_timeout should prevent these kinds of "attacks", but having a separate pre-authentication timeout would be a good addition and a better solution to this problem.
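
For reference, connection_timeout is set per service in the MaxScale configuration file. The fragment below is a minimal sketch only: the router, server names, and credentials are placeholder assumptions, not the reporter's actual configuration, and the timeout value is in seconds.

```ini
# Hypothetical service section; only connection_timeout is relevant here.
[Read-Write Service]
type=service
router=readwritesplit
# server1, maxuser and maxpwd are placeholders
servers=server1
user=maxuser
passwd=maxpwd
# Close client sessions that have been idle for more than 60 seconds
connection_timeout=60
```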

Comment by markus makela [ 2017-05-11 ]

Some clarification is needed: was connection_timeout set to 60 seconds or 60 minutes? Please attach the relevant parts of the configuration file.

Comment by markus makela [ 2017-05-11 ]

On further inspection of the code, this is a bug, as the connection_timeout checks depend on values that only exist for fully established connections.
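
The failure mode described above can be illustrated with a small model. This is a hypothetical sketch, not MaxScale code: the Connection type, its fields, and both functions are invented for illustration, and the "fixed" variant shows only one possible remedy, not necessarily the change made in 2.1.3.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Connection:
    """Hypothetical stand-in for a client connection; not a MaxScale type."""
    accepted_at: float                      # time the TCP connection was accepted
    last_activity: Optional[float] = None   # set only after authentication completes


def is_idle_buggy(conn: Connection, now: float, timeout: float) -> bool:
    # The check relies on a value that exists only for fully established
    # connections, so a connection that never authenticates never expires.
    if conn.last_activity is None:
        return False
    return now - conn.last_activity > timeout


def is_idle_fixed(conn: Connection, now: float, timeout: float) -> bool:
    # Fall back to the accept time when no activity has been recorded yet,
    # so unauthenticated connections are also subject to the timeout.
    if conn.last_activity is not None:
        reference = conn.last_activity
    else:
        reference = conn.accepted_at
    return now - reference > timeout
```

With a 60-second timeout, a connection accepted 30 minutes ago that never authenticated is kept by the first check and expired by the second, which matches the behavior seen in the report.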
