Details
Type: Bug
Status: Closed
Priority: Minor
Resolution: Fixed
Affects Version: 1.2.1
Fix Version: None
Environment: centos 6.7 2.6.32-573.el6.x86_64, mariadb 5.5.41-1.el6 (yum repo) / maxscale 1.2.1
Description
Hello,
I'm using MaxScale 1.2.1 with MariaDB 5.5 in a readwritesplit environment, and I notice that every 5 or 10 minutes (it's random) I receive errors from our PHP application:
(2003) Lost connection to backend server (the most frequent one)
(2013) Lost connection to MySQL server during query
it is very similar to what is described by someone else here:
http://stackoverflow.com/questions/33416078/maxscale-lost-connection
The maxadmin 'show sessions' output shows several sessions in an 'invalid state'.
A strange thing is that if I activate log_trace, the number of errors jumps a lot: instead of 3 or 4 per hour, I get one error every minute.
We have MaxScale configured in front of 4 different groups of MariaDB servers; MaxScale is listening on 4 different ports.
[maxscale]
threads=8
auth_connect_timeout=20
auth_read_timeout=20
auth_write_timeout=20
log_trace=0
Example for one of the 4 clusters:
[MySQL Monitor VHD]
type=monitor
module=mysqlmon
servers=dbvhd1,dbvhd2
user=max
passwd=hidden
monitor_interval=10000
disable_master_failback=1
detect_replication_lag=1
[fetch]
type=filter
module=regexfilter
match=fetch
replace=select
[hint]
type=filter
module=hintfilter
[VHDLISTENER]
type=listener
service=RWVHD
protocol=MySQLClient
port=3310
socket=/tmp/ClusterMaster1
[RWVHD]
type=service
router=readwritesplit
servers=dbvhd1, dbvhd2
user=max
passwd=hidden
max_slave_replication_lag=0
(etc...)
Not sure if it is related, but here are some of the sysctl parameters used on the servers:
net.ipv4.tcp_keepalive_time = 7200
net.netfilter.nf_conntrack_tcp_timeout_time_wait = 10
net.netfilter.nf_conntrack_tcp_timeout_established = 432000
net.netfilter.nf_conntrack_generic_timeout = 300
net.ipv4.tcp_max_tw_buckets = 2000000
net.ipv4.tcp_fin_timeout = 10
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_keepalive_intvl = 15
net.ipv4.tcp_keepalive_probes = 9
Could it be linked to some sort of timeout on the keep-alive connections?
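For what it's worth, here is a quick back-of-the-envelope check of what those keep-alive settings imply (Python, purely illustrative; the 28800 s wait_timeout figure is just the MariaDB default, not something I have verified on our servers):

# Rough keep-alive timeline derived from the sysctl values listed above
tcp_keepalive_time = 7200    # seconds of idleness before the first probe is sent
tcp_keepalive_intvl = 15     # seconds between probes
tcp_keepalive_probes = 9     # unanswered probes before the kernel drops the connection

dead_after = tcp_keepalive_time + tcp_keepalive_intvl * tcp_keepalive_probes
print(dead_after)  # 7335 s, i.e. a bit over 2 hours

# MariaDB's default wait_timeout is 28800 s (8 h), so on paper neither of
# these timers should be dropping a connection after only 5-10 minutes.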
As I said, the error rate increases a lot if we activate log_trace, so we tried to reduce the number of requests handled by the MaxScale server; we managed to divide it by 2, but the errors are still there.
We have around 280 req/s on each node, except one with 2500 req/s.
Last point: we are using an old application (MySQL/MyISAM, PHP and the old mysql extension). Not sure if it is related; maybe mysqli could help?
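To illustrate the kind of client-side handling I have in mind, here is a rough reconnect-and-retry sketch; it is written in Python purely for illustration and is not our application code (the port matches the listener above, but the hostname, credentials and retry count are made up):

import mysql.connector

def query_with_retry(sql, retries=2):
    # 2003 and 2013 are the two error codes we see through MaxScale.
    last_err = None
    for _ in range(retries + 1):
        try:
            conn = mysql.connector.connect(host="maxscale.example", port=3310,
                                           user="app", password="secret",
                                           database="appdb")
            try:
                cur = conn.cursor()
                cur.execute(sql)
                return cur.fetchall()
            finally:
                conn.close()
        except mysql.connector.Error as err:
            if err.errno in (2003, 2013):  # retry only on lost-connection errors
                last_err = err
                continue
            raise
    raise last_err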
Issue Links
- relates to MDEV-9195 Segmentation fault when using the embedded library (Closed)