[MXS-416] Orphan sessions appear after many network errors Created: 2015-10-19 Updated: 2015-12-01 Resolved: 2015-12-01 |
|
| Status: | Closed |
| Project: | MariaDB MaxScale |
| Component/s: | Core |
| Affects Version/s: | 1.2.1 |
| Fix Version/s: | 1.3.0 |
| Type: | Bug | Priority: | Major |
| Reporter: | martin brampton (Inactive) | Assignee: | martin brampton (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Environment: |
CentOS 6.5, but probably any |
||
| Description |
|
Ran two tests in parallel. During these tests, the connection to the master works and fails on alternate minutes. The connection to the single slave works or fails at arbitrary times, independently of the master connection. This means that more than half of all requests involve a failure forced on MaxScale by the network. The scenario is (hopefully) unrealistic, but it has been valuable for identifying faults in MaxScale.

The first test keeps an iteration counter; the second runs about one cycle per second. After about 150,000 iterations of the first test (approximately an hour and a half), MaxScale stopped responding to requests, although MaxAdmin still worked. At this point there were just over 1,000 sessions in the STOPPING state, each with a refcount of 1, and a corresponding number of DCBs. In the time spent examining MaxScale in its unresponsive state, it was not possible to determine why it was not responding. Further tests may be needed.

This fault could be regarded as minor, given that it appears to be triggered only in extreme circumstances. On the other hand, under steady load with errors occurring, maybe one connection in a hundred fails to clear. This indicates a fault somewhere in the logic. |
| Comments |
| Comment by Dipti Joshi (Inactive) [ 2015-10-30 ] |
|
johan.wikman Is anyone working on this? |
| Comment by martin brampton (Inactive) [ 2015-11-03 ] |
|
The main problem is fixed; it was introduced by … . Now that long tests on this issue can run to completion with the corrections in place, they reveal a memory leak of about 100 MB per million iterations. Checks will be carried out to determine the cause. |
| Comment by martin brampton (Inactive) [ 2015-11-03 ] |
|
Correction: the residual sessions are not orphans; they are just waiting for something to happen. The remaining problem in this area is therefore the memory leak. |
| Comment by martin brampton (Inactive) [ 2015-12-01 ] |
|
Fixed. |