[MXS-827] ReadWriteSplit only keeps used connection alive, query crashes after unused connection times out Created: 2016-08-10  Updated: 2020-08-25  Resolved: 2017-03-29

Status: Closed
Project: MariaDB MaxScale
Component/s: readwritesplit
Affects Version/s: 1.4.3
Fix Version/s: 2.2.0

Type: New Feature Priority: Major
Reporter: Ludwig Gramberg Assignee: markus makela
Resolution: Fixed Votes: 0
Labels: None
Environment:

centos 7
wait_timeout=60


Attachments: Text File maxscale-query-server-gone-away.log     File maxscale.cnf     File my.cnf    
Issue Links:
Duplicate
is duplicated by MXS-364 Idle Client receives an error mysql h... Closed
is duplicated by MXS-1397 ReadWriteSplit's master connection ca... Closed

 Description   

Problem

after 60s maxscale sends "MySQL server has gone away" to the client

Setup

  • wait_timeout is 60s
  • master + slave
  • client connects to maxscale with hintfilter and read-write-split
  • worker opens connection to maxscale on startup
  • worker is reading every 1s from maxscale
  • worker never writes (or hints) anything to the master

Files

  • my.cnf of slave
  • maxscale.cnf
  • maxscale.log with "enable log-priority debug"

Speculation

I'm guessing that the connection to the master times out after 60s because it is not being used, this is somehow causing maxscale to close the connection to the client...

the read-write split should be able to also handle that only one connection is being used



 Comments   
Comment by markus makela [ 2016-08-11 ]

Strictly speaking, this is not a bug. The failure of the master connection, in this case caused by a low wait_timeout, is something which the current readwritesplit cannot handle. It is recommended to set wait_timeout to a value which is greater than the interval between writes, this guarantees that the master connection is not closed by the backend database. Most of the time it's not needed when MaxScale is used and the connection_timeout parameter for services should be used instead.

There are ways to treat the symptoms of this problem. One of them is to periodically send a COM_PING from the client through MaxScale. This will be sent to all servers and is a common way to keep connections alive. Connection pooling mechanisms tend to do this to keep the pooled connections alive. Another option is to modify the session state in some way e.g. by doing a SET @a=1. This modification is sent to all open connections for the session so that the session state is consistent. This also has the side-effect of resetting the wait_timeout timer.

It is possible that in the future MaxScale could detect a modification in the wait_timeout and automatically send keepalive queries to connections.

Comment by Ludwig Gramberg [ 2016-08-11 ]

thank you for clarifying.

we have an application which has scenarios where it does use both connections on the read-write-split and read-only scenarios where it doesn't. since maxscale is supposed to be transparent (is it?) it doesn't make sense that it would not be able to handle that.

two suggestions:
1. make this very clear in the known limitations section of the documentation
2. feature-request to change this, for instance maxscale could not open a connection until it is used for the first time or just be able to reopen it if it times out

currently we opted to send a "use <dbname>" every 10s as a keep alive.

Comment by Ludwig Gramberg [ 2016-08-11 ]

another thought i had is this:

what if you have multiple slaves which are being read according to which one is the least behind? if some of them are always behind they will eventually timeout, as soon as they would qualify to be used though the query would crash due to the closed connection.

Comment by markus makela [ 2016-08-11 ]

1. make this very clear in the known limitations section of the documentation

Yes, we will clarify this in the limitations section.

2. feature-request to change this, for instance maxscale could not open a connection until it is used for the first time or just be able to reopen it if it times out

This could be an interesting improvement. Although it would slow down the connection creation due to the possible synchronization of the session state, it could allow for a lower performance impact on the cluster as a whole. By delaying the creation of the connections, we could only use the connections that are needed. Please feel free to create a new feature request detailing this functionality.

Comment by Ludwig Gramberg [ 2016-08-23 ]

seams like something was done in mariadb 2:

Read Continuation upon Master Down

The readwritesplit routing module now supports a high availability read mode
where read queries are allowed even if the master server goes down. The new
functionality supports three modes: disconnection on master failure, disconnection
on first write after master failure and error on write after master failure.

The MySQL monitor module, mysqlmon, now supports stale states for both the master
and slave servers. This means that when a slave loses its master, it will retain
the slave state as long as it is running.

For more details about these new modes, please read the ReadWriteSplit
and MySQL Monitor documentation.

https://mariadb.com/kb/en/mariadb-enterprise/mariadb-maxscale/mariadb-maxscale-200-release-notes/

Comment by Johan Wikman [ 2016-08-30 ]

Changing into an improvement and downgrading to major.

Limitations document updated for 2.0.

Comment by markus makela [ 2017-03-24 ]

Added connection keepalive to readwritesplit.

Generated at Thu Feb 08 04:02:13 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.