[MXS-1455] Aborted connection warnings on mysqld logs Created: 2017-09-26  Updated: 2018-04-17  Resolved: 2018-04-17

Status: Closed
Project: MariaDB MaxScale
Component/s: N/A
Affects Version/s: 2.0.3, 2.1.9
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Wagner Bianchi (Inactive) Assignee: Unassigned
Resolution: Duplicate Votes: 0
Labels: None

Issue Links:
Relates
relates to MXS-619 creating many short sessions in paral... Closed

 Description   

Folks,

After some investigation, we found that the MySQL Monitor user on MaxScale is aborting many connections, causing mysqld to log warnings like the following in the error log:

170710 10:28:24 [Warning] Aborted connection 4147497 to db: 'unconnected' user: 'unauthenticated' host: '10.0.127.40' (Got an error reading communication packets)
170710 10:28:34 [Warning] Aborted connection 4147522 to db: 'unconnected' user: 'unauthenticated' host: '10.0.127.40' (Got an error reading communication packets)
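
To attribute warnings like the two above to a source, the error log can be tallied by host and user. The following is a minimal sketch that assumes the exact line format quoted in this ticket; the function name `tally_aborted` and the regular expression are illustrative and not part of MaxScale or MariaDB.

```python
import re
from collections import Counter

# Matches the warning format quoted in this ticket (an assumption about the log layout).
ABORTED = re.compile(
    r"Aborted connection (?P<id>\d+) to db: '(?P<db>[^']*)' "
    r"user: '(?P<user>[^']*)' host: '(?P<host>[^']*)' \((?P<reason>[^)]*)\)"
)

def tally_aborted(lines):
    """Count aborted-connection warnings per (host, user) pair."""
    counts = Counter()
    for line in lines:
        match = ABORTED.search(line)
        if match:
            counts[(match["host"], match["user"])] += 1
    return counts

sample = [
    "170710 10:28:24 [Warning] Aborted connection 4147497 to db: 'unconnected' "
    "user: 'unauthenticated' host: '10.0.127.40' (Got an error reading communication packets)",
    "170710 10:28:34 [Warning] Aborted connection 4147522 to db: 'unconnected' "
    "user: 'unauthenticated' host: '10.0.127.40' (Got an error reading communication packets)",
]

for (host, user), count in tally_aborted(sample).items():
    print(host, user, count)
```

If nearly all entries come from the MaxScale host with user 'unauthenticated', that would be consistent with connections being dropped before authentication completes.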

I cannot prove my theory, but I think that if a thread takes too long to authenticate, for example when it stays in the `unauthenticated user` state for longer than expected, MaxScale may treat it as inactivity. The connection is then never completed because the timeout cuts it off midway, and a connection that is not closed properly is one of the known causes of this error.

As suggested, I put the following in place:

[root@maxscale ~]# maxadmin show monitor monitor
Monitor: 0x1bdb830
Name: monitor
Monitor running
Sampling interval: 100 milliseconds
MaxScale MonitorId: 0
Replication lag: enabled
Detect Stale Master: enabled
Connect Timeout: 1 seconds
Read Timeout: 1 seconds
Write Timeout: 1 seconds
Monitored servers: 192.168.50.11:3306, 192.168.50.12:3306, 192.168.50.13:3306

But I was not able to reproduce the problem. Is there anything else we can check? Why does it appear only with the maxmon user?

Thanks!



 Comments   
Comment by markus makela [ 2017-09-27 ]

The first step would be to upgrade to 2.0.6 and see if the problems still occur.

I've discussed this with the MariaDB server team and they came to the conclusion that the most likely cause of those error messages is a client-side timeout being exceeded. If this theory is true, then increasing the timeout should remove the errors.

Decreasing the timeouts to one second was my way of trying to force the problems to appear, but if connection creation is fast enough, even that would not guarantee that they do.
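
The client-side timeout theory can be illustrated with a small, self-contained sketch. It does not talk to mysqld or MaxScale: `slow_server` is a hypothetical stand-in that delays its greeting, and `connect_with_timeout` is a hypothetical client that gives up after `read_timeout` seconds. The point is only to show how a short timeout closes the connection before any authentication happens, which is the situation a real server would report as an aborted, unauthenticated connection.

```python
import socket
import threading
import time

def slow_server(listener, delay):
    """Stand-in for mysqld: accept one client, wait, then try to greet it."""
    conn, _ = listener.accept()
    with conn:
        time.sleep(delay)              # simulates a slow handshake
        try:
            conn.sendall(b"greeting")  # placeholder, not a real MySQL packet
        except OSError:
            pass                       # the client may already have gone away

def connect_with_timeout(port, read_timeout):
    """Connect, wait for the greeting, and give up after read_timeout seconds."""
    with socket.create_connection(("127.0.0.1", port)) as sock:
        sock.settimeout(read_timeout)
        try:
            return "greeted" if sock.recv(64) else "closed"
        except socket.timeout:
            return "timed out"  # the socket is closed before any authentication

def demo(delay, read_timeout):
    """Run one client against one slow server on an ephemeral local port."""
    listener = socket.socket()
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    port = listener.getsockname()[1]
    worker = threading.Thread(target=slow_server, args=(listener, delay))
    worker.start()
    result = connect_with_timeout(port, read_timeout)
    worker.join()
    listener.close()
    return result

print(demo(delay=0.5, read_timeout=0.1))  # server slower than the timeout
print(demo(delay=0.0, read_timeout=1.0))  # server faster than the timeout
```

With a fast server the short timeout is never hit, which matches the caveat that lowering the timeouts does not guarantee the problem shows up.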

Comment by Todd Stoffel (Inactive) [ 2017-09-27 ]

@markus, I can confirm that the issue appears on all versions of MaxScale prior to today's release of 2.1.9 (which I have not yet tested).

MaxScale> show monitors
Monitor: 0x25aac90
Name: MySQL Monitor
Monitor running
Sampling interval: 1000 milliseconds
MaxScale MonitorId: 0
Replication lag: disabled
Detect Stale Master: enabled
Connect Timeout: 9 seconds
Read Timeout: 1 seconds
Write Timeout: 2 seconds
Monitored servers: xxx.xxx.xxx.xxx, xxx.xxx.xxx.xxx, xxx.xxx.xxx.xxx

Comment by Maikel Punie [ 2017-09-28 ]

We started to see the same problem when we upgraded from MariaDB 10.1.26 to 10.2.9.

On the servers running 10.2.9 we see a lot of these error messages in the logs. This was never seen in MariaDB 10.1.x.

Comment by Wagner Bianchi (Inactive) [ 2017-10-04 ]

Hello Folks,

This morning, I tried disabling replication heartbeats for a customer running MaxScale 2.0 who is seeing the same warning messages in the error log:

[rdba@box ~]$ date; maxadmin -uheisenbeg disable heartbeat "Replication Monitor"
Wed  4 Oct 14:25:28 UTC 2017
Password:
 
171004 14:25:40 [Warning] Aborted connection 1925382 to db: 'unconnected' user: 'unauthenticated' host: 'msx.bb.com' (Got an error reading communication packets)
171004 14:27:29 [Warning] Aborted connection 1925565 to db: 'unconnected' user: 'unauthenticated' host: 'msx.bb.com' (Got an error reading communication packets)
 
[rdba@box ~]$ date; maxadmin -uheisenbeg enable heartbeat "Replication Monitor"
Wed  4 Oct 14:27:43 UTC 2017
Password:

Investigation to be continued...

Comment by Wagner Bianchi (Inactive) [ 2017-10-11 ]

Hello folks,

Continuing this investigation, I have now upgraded a customer's MaxScale from 2.0.3 to the latest release in the 2.0 series, which is 2.0.6:

[root@maxscale01 ~]# maxscale --version
MaxScale 2.0.6
 
[root@maxscale02 ~]# maxscale --version
MaxScale 2.0.6

Additionally, I added the following to the config files:

[Replication Monitor]
monitor_interval=10000
backend_connect_timeout=9
backend_write_timeout=6
backend_read_timeout=3

Before this, everything was on defaults. The warning messages seem less frequent now, but they still appear.
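
One way to check whether the warnings really became less frequent is to count them per hour before and after the configuration change. This is a minimal sketch that assumes the `YYMMDD HH:MM:SS` timestamp prefix seen in the log excerpts in this ticket (other server versions may log a different format); `warnings_per_hour` is an illustrative name.

```python
from collections import Counter
from datetime import datetime

def warnings_per_hour(lines):
    """Bucket 'Aborted connection' warnings by hour, based on the timestamp prefix."""
    buckets = Counter()
    for line in lines:
        if "Aborted connection" not in line:
            continue
        try:
            stamp = datetime.strptime(line[:15], "%y%m%d %H:%M:%S")
        except ValueError:
            continue  # skip lines whose prefix is not in the assumed format
        buckets[stamp.replace(minute=0, second=0)] += 1
    return buckets

sample = [
    "171004 14:25:40 [Warning] Aborted connection 1925382 to db: 'unconnected' "
    "user: 'unauthenticated' host: 'msx.bb.com' (Got an error reading communication packets)",
    "171004 14:27:29 [Warning] Aborted connection 1925565 to db: 'unconnected' "
    "user: 'unauthenticated' host: 'msx.bb.com' (Got an error reading communication packets)",
]

for hour, count in sorted(warnings_per_hour(sample).items()):
    print(hour.strftime("%Y-%m-%d %H:00"), count)
```

Running it over the error log from before and after the change gives two sets of hourly counts that can be compared directly.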

Comment by markus makela [ 2017-10-12 ]

This could be caused by MXS-619.

Comment by markus makela [ 2018-04-17 ]

Closing as duplicate of MXS-619.
