[MXS-3653] Lost connection to master server while waiting for a result. Connection has been idle for 0.0 seconds Created: 2021-07-01  Updated: 2021-07-23  Resolved: 2021-07-23

Status: Closed
Project: MariaDB MaxScale
Component/s: Core
Affects Version/s: 2.5.13
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Maiavria Assignee: markus makela
Resolution: Won't Fix Votes: 0
Labels: None
Environment:

Debian 10.10, 10.5.11-MariaDB


Sprint: MXS-SPRINT-136

 Description   

Hello,

I'm seeing these warnings in MaxScale's log every second.

I tried disabling the persistent connection pool, adding backend timeouts, and setting a higher wait_timeout on MariaDB, all without success.
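For reference, raising the server-side idle timeout looks like the following; the exact value is illustrative, not the one actually used in this report:

```sql
-- Assumption: illustrative value only. Raising wait_timeout rules out
-- server-side idle disconnects as the cause of the dropped connections.
SET GLOBAL wait_timeout = 28800;  -- seconds; 28800 (8 h) is the MariaDB default
SHOW GLOBAL VARIABLES LIKE 'wait_timeout';
```

Note that SET GLOBAL only affects connections opened after the change; existing sessions keep the value they started with.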

My setup is a readwritesplit router in front of a three-node 10.5.11 MariaDB Galera cluster.

With log_info enabled I get the following:

2021-07-01 12:45:27   info   : Stopped RW2 client session [227]
2021-07-01 12:45:27   info   : (228) (RW2) > Autocommit: [enabled], trx is [not open], cmd: (0x03) COM_QUERY, plen: 13, type: QUERY_TYPE_READ, stmt: SELECT 1
2021-07-01 12:45:27   info   : (228) [readwritesplit] (RW2) Route query to slave: marianode3 <
2021-07-01 12:45:27   info   : (228) [readwritesplit] (RW2) Reply complete, last reply from marianode3
2021-07-01 12:45:27   info   : (228) [qc_sqlite] (RW2) Parsing the query failed, cannot report query type: KILL CONNECTION_ID();
2021-07-01 12:45:27   info   : (228) [qc_sqlite] (RW2) Parsing the query failed, cannot report query operation: KILL CONNECTION_ID();
2021-07-01 12:45:27   info   : (228) [qc_sqlite] (RW2) Parsing the query failed, cannot report query operation: KILL CONNECTION_ID();
2021-07-01 12:45:27   info   : (228) (RW2) > Autocommit: [enabled], trx is [not open], cmd: (0x03) COM_QUERY, plen: 26, type: , stmt: KILL CONNECTION_ID();
2021-07-01 12:45:27   info   : (228) [readwritesplit] (RW2) Route query to master: marianode1 <
2021-07-01 12:45:27   info   : (228) [readwritesplit] (RW2) Master 'marianode1' failed: #HY000: Lost connection to backend server: connection closed by peer (marianode1)
2021-07-01 12:45:27   error  : (228) [readwritesplit] (RW2) Lost connection to the master server, closing session. Lost connection to master server while waiting for a result. Connection has been idle for 0 seconds. Error caused by: #HY000: Lost connection to backend server: connection closed by peer (marianode1). Last close reason: <none>. Last error:

My MaxScale configuration:

[maxscale]
threads=auto
#log_notice=1
#log_debug=1
#log_info=1
syslog=false
auth_connect_timeout=3s
auth_read_timeout=3s
auth_write_timeout=3s
 
[MySQLMonitorLocal]
type=monitor
module=galeramon
servers=marianode1,marianode2,marianode3
user=maxscale
password=hidden
monitor_interval=1000
backend_connect_timeout=3s
backend_write_timeout=3s
backend_read_timeout=3s
 
 
# =============== Services ================ #
 
[RW2]
type=service
router=readwritesplit
# deprecated
#localhost_match_wildcard_host=1
servers=marianode1,marianode2,marianode3
max_slave_connections=2
user=maxscale
password=hidden
max_connections=350
strip_db_esc=true
 
# ===============  Listeners ================ #
 
[RWListenerLocal]
type=listener
service=RW2
protocol=MariaDBClient
port=3310
address=127.0.0.1
 
[RWListenerLocalSocket]
type=listener
service=RW2
protocol=MariaDBClient
socket=/var/lib/maxscale/rwsplit.sock
 
# =============== Servers ================ #
[marianode1]
type=server
address=10.100.20.50
port=3306
protocol=MariaDBBackend
#persistpoolmax=100
#persistmaxtime=60s
#rank=primary
 
[marianode2]
type=server
address=10.100.20.51
port=3306
protocol=MariaDBBackend
#persistpoolmax=100
#persistmaxtime=60s
#rank=secondary
 
[marianode3]
type=server
address=10.100.20.52
port=3306
protocol=MariaDBBackend
#persistpoolmax=100
#persistmaxtime=60s
#rank=primary

Do you have any clues why this is happening with the latest version?



 Comments   
Comment by markus makela [ 2021-07-11 ]

The error is logged because the client closes the connection with KILL CONNECTION_ID(). This causes an unexpected interruption of the connection which MaxScale proceeds to log. Currently this seems to be expected behavior but we could suppress the error for that particular error type.

Comment by markus makela [ 2021-07-14 ]

It turns out this isn't something that's easy to fix: if you kill your own connection, I believe you aren't guaranteed to receive the error for it before the network hangup arrives. For this reason it's possible that MaxScale thinks it lost the connection to the master and decides to log the message.

In order for readwritesplit to know that it shouldn't log this, it would need to detect that the KILL targets the connection itself. Since this isn't a very practical way of closing a connection, I don't think it's worth it to fix this just for this one query that, at least to my knowledge, isn't useful.
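The sequence visible in the log above can be reproduced with two statements; a minimal sketch, assuming the client connects through the MaxScale listener (port 3310 in this configuration):

```sql
-- CONNECTION_ID() returns the id of the current session, so this KILL
-- terminates the very connection that issues it. The backend hangs up
-- before any result can be delivered, which readwritesplit reports as a
-- lost master connection.
SELECT 1;               -- routed to a slave, completes normally
KILL CONNECTION_ID();   -- routed to the master, kills this same session
```

The clean way for an application to end a session is a normal client-side disconnect (COM_QUIT), which MaxScale handles without logging an error.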

Comment by markus makela [ 2021-07-18 ]

jom, why is the KILL CONNECTION_ID() executed, and who issues it? Can you remove it, or explain why the connection needs to KILL itself?

Comment by markus makela [ 2021-07-23 ]

Since this seems to be something that's caused by the application itself, I'll close this as Won't Fix until we get more feedback.
