[MXS-812] Number of conns not matching number of operations Created: 2016-07-27  Updated: 2016-09-15  Resolved: 2016-09-09

Status: Closed
Project: MariaDB MaxScale
Component/s: readwritesplit
Affects Version/s: None
Fix Version/s: 2.0.1

Type: Bug
Priority: Major
Reporter: Yorick Terweijden
Assignee: markus makela
Resolution: Fixed
Votes: 0
Labels: None


 Description   

Output from $ maxadmin show servers

type=service
router=readwritesplit
router_options=slave_selection_criteria=LEAST_CURRENT_OPERATIONS

Server 0x1aab680 (node1)
	Number of connections:               14173580
	Current no. of conns:                314
	Current no. of operations:   1661
Server 0x1aab4e0 (node2)
	Number of connections:               24773379
	Current no. of conns:                321
	Current no. of operations:   1719
Server 0x1aab340 (node3)
	Number of connections:               24478708
	Current no. of conns:                321
	Current no. of operations:   1137
Server 0x1aab1a0 (node4)
	Number of connections:               24474162
	Current no. of conns:                321
	Current no. of operations:   3
Server 0x1aa8b80 (node5)
	Number of connections:               24475171
	Current no. of conns:                321
	Current no. of operations:   11
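
The mismatch in the output above can be checked mechanically. The following is a minimal sketch, not part of MaxScale: it parses text shaped like the `maxadmin show servers` excerpt in this report and lists servers whose active operation count exceeds their active connection count. The function names `parse_servers` and `suspicious` are hypothetical, and treating "operations greater than connections" as the sign of counter drift is an assumption based on this report.

```python
import re

# Two servers copied from the report; other fields are omitted for brevity.
SAMPLE = """\
Server 0x1aab680 (node1)
    Current no. of conns:                314
    Current no. of operations:   1661
Server 0x1aab1a0 (node4)
    Current no. of conns:                321
    Current no. of operations:   3
"""

def parse_servers(text):
    """Return {server_name: {"conns": int, "ops": int}} from the output text."""
    servers = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        header = re.match(r"Server\s+\S+\s+\((.+)\)", line)
        if header:
            current = servers.setdefault(header.group(1), {})
            continue
        if current is None:
            continue
        field = re.match(r"Current no\. of (conns|operations):\s*(\d+)", line)
        if field:
            key = "conns" if field.group(1) == "conns" else "ops"
            current[key] = int(field.group(2))
    return servers

def suspicious(servers):
    """Names of servers reporting more active operations than connections
    (assumed here to indicate counter drift)."""
    return [name for name, c in servers.items()
            if c.get("ops", 0) > c.get("conns", 0)]

stats = parse_servers(SAMPLE)
print(suspicious(stats))
```

With the two sample servers, only node1 is flagged (1661 operations against 314 connections); node4 is not.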



 Comments   
Comment by Guillaume Lefranc [ 2016-09-08 ]

I also encounter this bug in production. It seems the counters stop updating when the connection is closed due to an inconsistent state.
For example:
2016-09-08 03:05:34 error : Slave 'db18' (10.10.44.38:3306) failed to execute session command.
2016-09-08 03:05:34 error : Failed to execute session command in 10.10.44.38:3306. Error was: 08S01 WSREP has not yet prepared node for application use
2016-09-08 03:05:34 error : Slave server 'db18': response differs from master's response. Closing connection due to inconsistent session state.
2016-09-08 03:05:35 notice : Server changed state: db18[10.10.44.38:3306]: lost_slave
2016-09-08 03:05:45 notice : Server changed state: db18[10.10.44.38:3306]: new_slave

Comment by Guillaume Lefranc [ 2016-09-08 ]

To put this in context, the backend (db18) is part of a Galera cluster. It left the cluster because of a network split with the other nodes, then rejoined.
It should be easy to reproduce by adding an iptables rule that rejects the traffic from other nodes, but still accepts connections from MaxScale.

Comment by markus makela [ 2016-09-09 ]

The active operation counters were not updated properly when multiple client packets were received. The counters were also incremented for each received packet instead of each started operation.

A good way to test this is to stream BLOB data via the C API and block the master connection mid-stream. This will cause an immediate mismatch in active connection and operation counts.
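
The counting error described in this comment can be illustrated with a small simulation. This is a sketch under stated assumptions, not the MaxScale code or the actual fix: `Server`, `BuggySession`, `FixedSession`, and `stream_then_block` are hypothetical names, and releasing the operation when the connection closes is an assumption drawn from the earlier comments in this issue.

```python
class Server:
    """Holds the per-server 'current no. of operations' counter."""
    def __init__(self):
        self.current_ops = 0

class BuggySession:
    """Counts every received client packet as a new operation."""
    def __init__(self, server):
        self.server = server
    def on_client_packet(self):
        self.server.current_ops += 1
    def on_reply_complete(self):
        self.server.current_ops -= 1
    def on_close(self):
        pass  # nothing is released when the connection goes away

class FixedSession:
    """Counts once per started operation and releases it exactly once."""
    def __init__(self, server):
        self.server = server
        self.in_progress = False
    def on_client_packet(self):
        if not self.in_progress:  # only the first packet starts an operation
            self.in_progress = True
            self.server.current_ops += 1
    def on_reply_complete(self):
        if self.in_progress:
            self.in_progress = False
            self.server.current_ops -= 1
    def on_close(self):
        # Assumption: an unfinished operation is released on close.
        self.on_reply_complete()

def stream_then_block(session_cls, packets=3):
    """Send a multi-packet operation, then close before any reply arrives."""
    server = Server()
    session = session_cls(server)
    for _ in range(packets):
        session.on_client_packet()
    session.on_close()
    return server.current_ops

print(stream_then_block(BuggySession))  # leaked operations remain counted
print(stream_then_block(FixedSession))  # counter returns to zero
```

In this model, a three-packet operation that is cut off mid-stream leaves the buggy counter at 3 and the fixed counter at 0, which mirrors the mismatch between connection and operation counts reported above.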
