MariaDB MaxScale / MXS-487

lost connection to backend server

Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.2.1
    • Fix Version/s: 1.3.0
    • Component/s: readwritesplit
    • Labels: None
    • Environment: CentOS 6.7 (kernel 2.6.32-573.el6.x86_64), MariaDB 5.5.41-1.el6 (yum repo), MaxScale 1.2.1

    Description

      Hello,

      I'm using MaxScale 1.2.1 with MariaDB 5.5 in a readwritesplit environment, and I notice that at random intervals, roughly every 5 to 10 minutes, our PHP application receives errors:
      (2003) Lost connection to backend server (the most frequent one)
      (2013) Lost connection to MySQL server during query

      It is very similar to what someone else describes here:
      http://stackoverflow.com/questions/33416078/maxscale-lost-connection

      The output of maxadmin show session lists several sessions in 'invalid state'.

      Strangely, if I activate log_trace, the number of errors jumps considerably: instead of 3 or 4 per hour, I get one error every minute.

      We have MaxScale configured in front of 4 different groups of MariaDB servers;
      MaxScale listens on 4 different ports.

      [maxscale]
      threads=8
      auth_connect_timeout=20
      auth_read_timeout=20
      auth_write_timeout=20
      log_trace=0

      Example for 1 of the 4 clusters:
      [MySQL Monitor VHD]
      type=monitor
      module=mysqlmon
      servers=dbvhd1,dbvhd2
      user=max
      passwd=hidden
      monitor_interval=10000
      disable_master_failback=1
      detect_replication_lag=1

      [fetch]
      type=filter
      module=regexfilter
      match=fetch
      replace=select

      [hint]
      type=filter
      module=hintfilter

      [VHDLISTENER]
      type=listener
      service=RWVHD
      protocol=MySQLClient
      port=3310
      socket=/tmp/ClusterMaster1

      [RWVHD]
      type=service
      router=readwritesplit
      servers=dbvhd1, dbvhd2
      user=max
      passwd=hidden
      max_slave_replication_lag=0

      (etc...)

      Not sure if it is relevant, but here are some of the sysctl parameters used on the servers:
      net.ipv4.tcp_keepalive_time = 7200
      net.netfilter.nf_conntrack_tcp_timeout_time_wait = 10
      net.netfilter.nf_conntrack_tcp_timeout_established = 432000
      net.netfilter.nf_conntrack_generic_timeout = 300
      net.ipv4.tcp_max_tw_buckets = 2000000
      net.ipv4.tcp_fin_timeout = 10
      net.ipv4.tcp_tw_reuse = 1
      net.ipv4.tcp_keepalive_intvl = 15
      net.ipv4.tcp_keepalive_probes = 9

      Could it be linked to some sort of timeout on the keepalive connections?
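
      As a rough sanity check on the keepalive question, here is a minimal Python sketch that works out how long the Linux kernel would take to declare an idle, unresponsive peer dead with the sysctl values listed above. The helper name keepalive_detection_seconds is made up for illustration, and the calculation assumes the sockets involved actually enable SO_KEEPALIVE, which is not confirmed here.

```python
def keepalive_detection_seconds(idle_time: int, interval: int, probes: int) -> int:
    """Worst-case seconds before an idle connection to a dead peer is dropped.

    The kernel waits `idle_time` seconds of inactivity before the first
    keepalive probe, then sends up to `probes` probes `interval` seconds apart.
    Hypothetical helper, not part of MaxScale or any library.
    """
    return idle_time + interval * probes


# Values copied from the sysctl listing in this report.
TCP_KEEPALIVE_TIME = 7200    # net.ipv4.tcp_keepalive_time
TCP_KEEPALIVE_INTVL = 15     # net.ipv4.tcp_keepalive_intvl
TCP_KEEPALIVE_PROBES = 9     # net.ipv4.tcp_keepalive_probes

total = keepalive_detection_seconds(
    TCP_KEEPALIVE_TIME, TCP_KEEPALIVE_INTVL, TCP_KEEPALIVE_PROBES
)
print(f"first probe after {TCP_KEEPALIVE_TIME} s idle; dead peer detected after {total} s")
```

      With these values the first probe is only sent after two hours of idle time, so keepalive by itself seems unlikely to account for errors that show up every few minutes, though that is only an assumption based on the numbers above.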

      As I said, the error rate increases a lot when we activate log_trace, so we tried to reduce the number of requests handled by the MaxScale server. We managed to halve it, but the errors are still there.

      We have around 280 req/s on each node, except one with 2500 req/s.

      Last point: we are using an old application with MySQL/MyISAM, PHP and the old mysql extension. Not sure if it is linked; maybe mysqli could help?

      Attachments

        1. data.sql
          0.5 kB
          markus makela
        2. old.php
          0.6 kB
          markus makela

            People

              Assignee: markus makela
              Reporter: Stephane Q.
              Votes: 0
              Watchers: 2
