[MXS-487] lost connection to backend server - Jira

Details

Type: Bug
Status: Closed (View Workflow)
Priority: Minor
Resolution: Fixed
Affects Version/s: 1.2.1
Fix Version/s: 1.3.0
Component/s: readwritesplit
Labels:
None
Environment:
centos 6.7 2.6.32-573.el6.x86_64, mariadb 5.5.41-1.el6 (yum repo) / maxscale 1.2.1

Description

Hello,

I'm using MaxScale 1.2.1 with MariaDB 5.5 in a readwritesplit environment, and I notice that every 5 or 10 minutes (the interval is random), our PHP application receives errors:
(2003) Lost connection to backend server (the most frequent one)
(2013) Lost connection to MySQL server during query

It is very similar to what someone else describes here:
http://stackoverflow.com/questions/33416078/maxscale-lost-connection

The maxadmin show session output shows several entries with 'invalid state'.

Strangely, if I activate log_trace, the number of errors jumps a lot: instead of 3 or 4 per hour, I get one error every minute.

We have MaxScale configured in front of 4 different groups of MariaDB servers,
and MaxScale is listening on 4 different ports.

[maxscale]
threads=8
auth_connect_timeout=20
auth_read_timeout=20
auth_write_timeout=20
log_trace=0

Example for 1 of the 4 clusters:
[MySQL Monitor VHD]
type=monitor
module=mysqlmon
servers=dbvhd1,dbvhd2
user=max
passwd=hidden
monitor_interval=10000
disable_master_failback=1
detect_replication_lag=1

[fetch]
type=filter
module=regexfilter
match=fetch
replace=select

[hint]
type=filter
module=hintfilter

[VHDLISTENER]
type=listener
service=RWVHD
protocol=MySQLClient
port=3310
socket=/tmp/ClusterMaster1

[RWVHD]
type=service
router=readwritesplit
servers=dbvhd1, dbvhd2
user=max
passwd=hidden
max_slave_replication_lag=0

(etc...)
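With four clusters defined in one file, it can help to rule out a simple cross-reference mistake before looking at timeouts. The following is only a minimal sketch, assuming the fragment above is representative: it parses a trimmed copy of the posted sections with Python's configparser and checks that each listener names an existing service and that each service only uses servers listed by a monitor. The function names `split_servers` and `check_config` are hypothetical, and the second check is an assumed sanity rule, not something MaxScale itself enforces.

```python
import configparser

# Trimmed copy of the fragment posted above; credentials and filters left out.
MAXSCALE_CNF = """
[maxscale]
threads=8

[MySQL Monitor VHD]
type=monitor
module=mysqlmon
servers=dbvhd1,dbvhd2

[VHDLISTENER]
type=listener
service=RWVHD
protocol=MySQLClient
port=3310

[RWVHD]
type=service
router=readwritesplit
servers=dbvhd1, dbvhd2
"""


def split_servers(value):
    """Split a 'servers=' value such as 'dbvhd1, dbvhd2' into clean names."""
    return [name.strip() for name in value.split(",") if name.strip()]


def check_config(text):
    """Return a list of human-readable problems found in the fragment."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)

    # Group section names by their 'type=' value (None for [maxscale]).
    by_type = {}
    for name in parser.sections():
        by_type.setdefault(parser[name].get("type"), []).append(name)

    problems = []
    services = by_type.get("service", [])

    # Every listener must point at a section declared with type=service.
    for listener in by_type.get("listener", []):
        target = parser[listener].get("service")
        if target not in services:
            problems.append(f"listener {listener}: unknown service {target}")

    # Assumed rule: a service should only route to monitored servers.
    monitored = set()
    for monitor in by_type.get("monitor", []):
        monitored.update(split_servers(parser[monitor].get("servers", fallback="")))
    for service in services:
        for server in split_servers(parser[service].get("servers", fallback="")):
            if server not in monitored:
                problems.append(f"service {service}: unmonitored server {server}")

    return problems


if __name__ == "__main__":
    for problem in check_config(MAXSCALE_CNF):
        print(problem)
```

For the fragment as posted, the sketch reports no problems, so the listener, service and monitor sections of this cluster at least refer to each other consistently.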

Not sure if it is involved, but here are some of the sysctl parameters used on the servers:
net.ipv4.tcp_keepalive_time = 7200
net.netfilter.nf_conntrack_tcp_timeout_time_wait = 10
net.netfilter.nf_conntrack_tcp_timeout_established = 432000
net.netfilter.nf_conntrack_generic_timeout = 300
net.ipv4.tcp_max_tw_buckets = 2000000
net.ipv4.tcp_fin_timeout = 10
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_keepalive_intvl = 15
net.ipv4.tcp_keepalive_probes = 9

Could it be linked to some sort of timeout on the keep-alive connections?
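To put numbers on the keep-alive question, here is a small sketch of the arithmetic implied by the sysctl values listed above. It assumes the usual Linux behaviour: on a socket with SO_KEEPALIVE enabled, the first probe is sent after tcp_keepalive_time seconds of idleness, and the connection is dropped after tcp_keepalive_probes unanswered probes spaced tcp_keepalive_intvl seconds apart. Whether MaxScale or the PHP client actually enables SO_KEEPALIVE is not known here, and the function names are hypothetical.

```python
# Values copied from the sysctl listing above (all in seconds, except PROBES).
TCP_KEEPALIVE_TIME = 7200
TCP_KEEPALIVE_INTVL = 15
TCP_KEEPALIVE_PROBES = 9
CONNTRACK_TCP_TIMEOUT_ESTABLISHED = 432000


def dead_peer_detection_seconds(idle_time, interval, probes):
    """Idle time before probing starts, plus the time for all probes to fail."""
    return idle_time + interval * probes


def keepalive_refreshes_conntrack(idle_time, conntrack_timeout):
    """True if the first probe is sent before the conntrack entry would expire."""
    return idle_time < conntrack_timeout


if __name__ == "__main__":
    total = dead_peer_detection_seconds(
        TCP_KEEPALIVE_TIME, TCP_KEEPALIVE_INTVL, TCP_KEEPALIVE_PROBES
    )
    print("dead peer detected after", total, "seconds of silence")
    print(
        "first probe precedes conntrack expiry:",
        keepalive_refreshes_conntrack(
            TCP_KEEPALIVE_TIME, CONNTRACK_TCP_TIMEOUT_ESTABLISHED
        ),
    )
```

With these values a dead peer would only be detected after 7335 seconds (a little over two hours), and the first probe would go out long before the 432000 second conntrack entry expires. Under these assumptions, the kernel keep-alive timers alone would not obviously explain errors that appear every 5 to 10 minutes.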

As I said, the error rate increases a lot if we activate log_trace, so we tried to reduce the number of requests handled by the MaxScale server. We managed to halve it, but the errors are still there.

We have around 280 req/s on each node, except one with 2500 req/s.

Last point: we are using an old application with MySQL/MyISAM, PHP and the old mysql extension. Not sure if it is linked; maybe mysqli could help?

Attachments

data.sql (0.5 kB, 2015-11-26 13:18)
old.php (0.6 kB, 2015-11-26 13:18)

Issue Links

relates to: MDEV-9195 Segmentation fault when using the embedded library (Closed)

People

Assignee: markus makela

Reporter: Stephane Q.

Votes: 0

Watchers: 2

Dates

Created: 2015-11-24 11:07

Updated: 2016-02-10 19:01

Resolved: 2016-02-10 19:00