I'm using MaxScale 1.21 with MariaDB 5.5 in a readwritesplit environment, and I notice that every 5 or 10 minutes (the interval is random) we receive errors in our PHP application:
(2003) Lost connection to backend server (the most frequent one)
(2013) Lost connection to MySQL server during query
It is very similar to what someone else describes here:
The maxadmin show session output shows several sessions in 'invalid state'.
The strange thing is that if I activate log_trace, the number of errors jumps a lot: instead of 3 or 4 per hour, I get one error every minute.
We have MaxScale configured in front of 4 different groups of MariaDB servers,
and MaxScale is listening on 4 different ports.
Example for 1 of the 4 clusters:
[MySQL Monitor VHD]
I'm not sure whether they are involved, but here are some of the sysctl parameters used on the servers:
net.ipv4.tcp_keepalive_time = 7200
net.netfilter.nf_conntrack_tcp_timeout_time_wait = 10
net.netfilter.nf_conntrack_tcp_timeout_established = 432000
net.netfilter.nf_conntrack_generic_timeout = 300
net.ipv4.tcp_max_tw_buckets = 2000000
net.ipv4.tcp_fin_timeout = 10
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_keepalive_intvl = 15
net.ipv4.tcp_keepalive_probes = 9
Could this be linked to some sort of timeout on the keepalive connections?
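To make the keepalive question more concrete, here is a rough Python sketch of the arithmetic implied by the sysctl values above. It only shows how long the kernel would take to declare an idle connection dead, and it assumes the sockets actually have SO_KEEPALIVE enabled, which I have not verified for MaxScale or the PHP client. The function name is just a helper made up for this illustration.

```python
# Values copied from the sysctl settings listed above.
keepalive_time = 7200      # net.ipv4.tcp_keepalive_time: idle seconds before the first probe
keepalive_intvl = 15       # net.ipv4.tcp_keepalive_intvl: seconds between probes
keepalive_probes = 9       # net.ipv4.tcp_keepalive_probes: unanswered probes before giving up
conntrack_established = 432000  # net.netfilter.nf_conntrack_tcp_timeout_established


def dead_peer_detection_seconds(idle, interval, probes):
    """Hypothetical helper: worst-case time for the kernel to drop a dead idle peer.

    Assumption: SO_KEEPALIVE is enabled on the socket; otherwise these
    sysctl values have no effect on that connection.
    """
    return idle + interval * probes


detection = dead_peer_detection_seconds(keepalive_time, keepalive_intvl, keepalive_probes)
print(f"first keepalive probe after {keepalive_time} s idle")
print(f"dead peer detected after at most {detection} s idle")

# If the first probe is sent before the conntrack entry expires, an idle
# connection with keepalive enabled should be refreshed before conntrack drops it.
print("keepalive fires before conntrack expiry:", keepalive_time < conntrack_established)
```

With these numbers the first probe only goes out after two hours of idleness, so anything between the client and the backend that drops idle connections sooner than that would not be prevented by keepalive.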
As I said, the error rate increases a lot if we activate log_trace, so we tried to reduce the number of requests handled by the MaxScale server. We managed to halve it, but the errors are still there.
We have around 280 req/s on each node, except one with 2500 req/s.
One last point: we are running an old application with MySQL/MyISAM, PHP and the old mysql extension. I'm not sure if it is related; maybe switching to mysqli could help?