[MXS-361] crash on backend restart if persistent connections are in use Created: 2015-09-10  Updated: 2015-11-27  Resolved: 2015-11-27

Status: Closed
Project: MariaDB MaxScale
Component/s: galeramon
Affects Version/s: None
Fix Version/s: 1.3.0

Type: Bug Priority: Major
Reporter: Timofey Turenko Assignee: Unassigned
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Relates
relates to MXS-472 Monitors update status in multiple steps Closed

 Description   

1. Create 75 connections to each service: RW split, ReadConn master, ReadConn slave with Master/Slave backend, RW split with Galera backend
2. Close all connections
3. Stop and restart backends

4. Check logs and cores
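
Steps 1 and 2 can be sketched roughly as below. This is only an illustration, not the actual test: `open_and_close`, the `connect` callable and the listener ports in the usage note are all hypothetical names and values that do not come from this report.

```python
from typing import Callable, Iterable, List, Protocol


class Closable(Protocol):
    def close(self) -> None: ...


def open_and_close(connect: Callable[[int], Closable],
                   ports: Iterable[int],
                   per_service: int = 75) -> int:
    """Open `per_service` connections to each listener port, then close them all.

    `connect` is a caller-supplied (hypothetical) factory that takes a port
    and returns an object with a close() method. Connection attempts that
    raise are skipped. Returns the number of connections that were opened.
    """
    opened: List[Closable] = []
    for port in ports:
        for _ in range(per_service):
            try:
                opened.append(connect(port))
            except Exception:
                # A failed attempt is ignored; the remaining ones still run.
                continue
    # Step 2: close everything only after all connections were created.
    for conn in opened:
        conn.close()
    return len(opened)
```

With a real client library, `connect` could for instance wrap `pymysql.connect(host=..., port=port, user=..., password=...)`, and `ports` would be the four listener ports of the services named in step 1 (assumed values, taken from the MaxScale configuration in use).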

Expected results: no crash

Actual result: crash during MaxScale restart

(gdb) bt
#0 0x00007f8a59354ced in router_handle_state_switch (dcb=0x1f504a0, reason=DCB_REASON_NOT_RESPONDING, data=0x1f4af50)
at /home/ec2-user/workspace/server/modules/routing/readwritesplit/readwritesplit.c:5310
#1 0x0000000000557cd0 in dcb_call_callback (dcb=0x1f504a0, reason=DCB_REASON_NOT_RESPONDING) at /home/ec2-user/workspace/server/core/dcb.c:2629
#2 0x0000000000557f19 in dcb_call_foreach (server=0x1f3bba0, reason=DCB_REASON_NOT_RESPONDING) at /home/ec2-user/workspace/server/core/dcb.c:2743
#3 0x00007f8a53de9b25 in monitorMain (arg=0x1f4fcf0) at /home/ec2-user/workspace/server/modules/monitor/mysql_mon.c:864
#4 0x00007f8a733a8df3 in start_thread () from /lib64/libpthread.so.0
#5 0x00007f8a71c7901d in clone () from /lib64/libc.so.6
(gdb)



 Comments   
Comment by Dipti Joshi (Inactive) [ 2015-09-10 ]

tturenko, please attach the config file and log files. Also state which branch build you used.

Comment by Timofey Turenko [ 2015-09-10 ]

There is no need to restart MaxScale; it is enough to stop and restart the backend nodes.

Configuration of the backends:

[server1]
type=server
address=###repl_server_IP_1###
port=###repl_server_port_1###
protocol=MySQLBackend
persistpoolmax=1
persistmaxtime=3660

[server2]
type=server
address=###repl_server_IP_2###
port=###repl_server_port_2###
protocol=MySQLBackend
persistpoolmax=5
persistmaxtime=60

[server3]
type=server
address=###repl_server_IP_3###
port=###repl_server_port_3###
protocol=MySQLBackend
persistpoolmax=10
persistmaxtime=60

[server4]
type=server
address=###repl_server_IP_4###
port=###repl_server_port_4###
protocol=MySQLBackend
persistpoolmax=30
persistmaxtime=30
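
As an illustration of what the two persistent-connection parameters above mean, here is a toy model. It assumes the documented semantics (`persistpoolmax` caps the number of idle connections kept per server, `persistmaxtime` is the number of seconds a connection may stay in the pool). The `PersistentPool` class is hypothetical and is not MaxScale's implementation.

```python
import time
from collections import deque


class PersistentPool:
    """Toy model of a per-server pool of idle backend connections."""

    def __init__(self, persistpoolmax, persistmaxtime, clock=time.monotonic):
        self.persistpoolmax = persistpoolmax   # max idle connections kept
        self.persistmaxtime = persistmaxtime   # max seconds spent in the pool
        self._clock = clock
        self._idle = deque()                   # (connection, time parked), oldest first

    def _expire(self):
        # Drop connections that have been parked longer than persistmaxtime.
        now = self._clock()
        while self._idle and now - self._idle[0][1] > self.persistmaxtime:
            self._idle.popleft()

    def release(self, conn):
        """Park a connection; return False if the pool is full (caller closes it)."""
        self._expire()
        if len(self._idle) >= self.persistpoolmax:
            return False
        self._idle.append((conn, self._clock()))
        return True

    def acquire(self):
        """Return a parked connection, or None if a new one has to be opened."""
        self._expire()
        return self._idle.pop()[0] if self._idle else None

    def __len__(self):
        self._expire()
        return len(self._idle)
```

Under this model, closing 75 client sessions against server2 (`persistpoolmax=5`, `persistmaxtime=60`) would leave at most 5 backend connections parked, and none after 60 seconds of idling.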

Comment by Timofey Turenko [ 2015-09-10 ]

The number "75 connections" is slightly more than MaxScale can process:

The last two connections generate errors:

2015-09-10 17:08:27 Error : Invalid authentication message from backend. Error code: 1040, Msg : Too many connections [gw_read_backend_handshake]
2015-09-10 17:08:27 Error : Unable to write to backend due to authentication failure. [gw_MySQLWrite_backend]
2015-09-10 17:08:27 Error : Invalid authentication message from backend. Error code: 1040, Msg : Too many connections [gw_read_backend_handshake]
2015-09-10 17:08:27 Error : Unable to write to backend due to authentication failure. [gw_MySQLWrite_backend]

and queries won't go through these connections:

Error: can't execute SQL-query: select 1;
MySQL server has gone away

Query failed!
Error: can't execute SQL-query: select 1;
MySQL server has gone away

Comment by Dipti Joshi (Inactive) [ 2015-09-11 ]

tturenko, Massimiliano Pinto: is this in progress?

Comment by Johan Wikman [ 2015-09-14 ]

Was this on MXS-329 or develop?

Comment by Timofey Turenko [ 2015-09-14 ]

Reproducible only if the persistent connections feature is enabled.

Comment by Timofey Turenko [ 2015-09-14 ]

Tested with the 'develop' branch.

Comment by martin brampton (Inactive) [ 2015-09-16 ]

Persistent connections development was carried further in MXS-329 branch. Little to be gained by analysing problems on the development branch unless the same test shows a fault against MXS-329.

Comment by Dipti Joshi (Inactive) [ 2015-11-24 ]

tturenko, martin brampton: since MXS-329 has been merged into develop, have we retested this issue? If we have, has it been fixed?

Comment by Johan Wikman [ 2015-11-24 ]

According to Timofey this is dependent upon the galera monitor fix.

Comment by Timofey Turenko [ 2015-11-27 ]

Test PASSED, closing.

Generated at Thu Feb 08 03:58:43 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.