Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
1.2.1
None
Environment: CentOS 6.5, but probably any
Description
Ran two tests in parallel:
1. Connect to the database, run a simple query, and disconnect.
2. Request a page from each of two web servers running a CMS that makes a number of queries per page; one server connects through the read-write split router, the other through the read-connection router.
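The first test can be reproduced with a loop along these lines. This is a minimal sketch, not the harness actually used: it assumes a PEP 249 (DB-API 2.0) driver passed in as a zero-argument `connect` callable, and the function name `run_connect_query_disconnect` and the default query `SELECT 1` are hypothetical.

```python
from typing import Any, Callable, Tuple


def run_connect_query_disconnect(
    connect: Callable[[], Any],
    iterations: int,
    query: str = "SELECT 1",
) -> Tuple[int, int]:
    """Repeat the connect / query / disconnect cycle and count outcomes.

    `connect` is assumed to return a PEP 249 connection object.
    Returns (successes, failures).
    """
    successes = failures = 0
    for _ in range(iterations):
        conn = None
        try:
            conn = connect()
            cur = conn.cursor()
            cur.execute(query)
            cur.fetchall()
            cur.close()
            successes += 1
        except Exception:
            # With the network faults injected, errors are expected to surface here.
            failures += 1
        finally:
            # Always try to disconnect, even after a failed query.
            if conn is not None:
                try:
                    conn.close()
                except Exception:
                    pass
    return successes, failures
```

Catching `Exception` keeps the sketch driver-agnostic; a real harness would more likely catch the driver's own `Error` class and report the running counter.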
During these tests, the connection to the master was up for one minute and down the next, alternating throughout. The connection to the single slave failed at arbitrary times that did not match the master's behaviour. As a result, more than half of all requests involved a network-induced failure that MaxScale had to handle. The scenario is (hopefully) unrealistic, but it has been valuable for identifying faults in MaxScale.
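A rough model shows why the share of failing requests ends up above one half. It rests on assumptions that are not stated in the report: each request is taken to need both the master and the slave, the master is modelled as down during odd minutes, and slave outages are treated as independent of the master schedule. The function names are hypothetical.

```python
def master_reachable(t_seconds: float) -> bool:
    # Hypothetical schedule: master up during even minutes, down during odd ones.
    return int(t_seconds // 60) % 2 == 0


def master_up_fraction(duration_s: int = 3600) -> float:
    # Sample the schedule once per second over the run.
    up = sum(1 for t in range(duration_s) if master_reachable(t))
    return up / duration_s


def failure_fraction(slave_down: float, master_up: float = 0.5) -> float:
    # A request that needs both backends succeeds only if both are up;
    # with independent outages that happens with probability master_up * (1 - slave_down).
    return 1.0 - master_up * (1.0 - slave_down)


print(master_up_fraction())   # 0.5
print(failure_fraction(0.0))  # 0.5: master outages alone
print(failure_fraction(0.2))  # ~0.6 if the slave were down 20% of the time
```

Under these assumptions the master schedule alone fails half the requests, so any independent slave outage pushes the total past one half.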
The first test keeps a counter, and after about 150,000 iterations (approximately an hour and a half) MaxScale stopped responding to requests, although MaxAdmin still worked. The second test runs about one cycle per second.
At that point, there were just over 1,000 sessions in the STOPPING state, each with a refcount of 1, along with a corresponding number of DCBs.
The time spent examining MaxScale in its unresponsive state was not enough to determine why it had stopped responding. Further tests may be needed.
This fault could be regarded as minor, given that it appears to be triggered only in extreme circumstances. On the other hand, it does appear that, in an error situation under steady load, perhaps one in a hundred connections fails to clear. This indicates a fault in the logic somewhere.
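The "one in a hundred" estimate can be sanity-checked against the first test's figures. This rough sketch assumes that all stuck sessions came from the first test, uses 1,000 for "just over 1,000", and takes exactly half of the connections as failing; the helper `leak_rate` is hypothetical.

```python
def leak_rate(stuck: int, sessions: int, failing_share: float = 1.0) -> float:
    # Stuck sessions divided by the number of sessions considered.
    return stuck / (sessions * failing_share)


# Inputs from the report's first test only; the second test is ignored.
STUCK = 1_000         # "just over 1,000" sessions left in STOPPING
ITERATIONS = 150_000  # connect/query/disconnect cycles in the first test

all_sessions = leak_rate(STUCK, ITERATIONS)       # about 1 in 150
failing_only = leak_rate(STUCK, ITERATIONS, 0.5)  # about 1 in 75
print(f"{all_sessions:.4f} {failing_only:.4f}")   # 0.0067 0.0133
```

Both estimates are of the same order of magnitude as one in a hundred; counting the second test's sessions would lower them somewhat.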