[MXS-388] maxscale proxy hang Created: 2015-09-28 Updated: 2015-12-08 Resolved: 2015-12-08 |
|
| Status: | Closed |
| Project: | MariaDB MaxScale |
| Component/s: | maxadmin |
| Affects Version/s: | 1.2.0, 1.2.1 |
| Fix Version/s: | 1.3.0 |
| Type: | Bug | Priority: | Blocker |
| Reporter: | cai sunny | Assignee: | Johan Wikman |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | None | ||
| Environment: |
Linux 2.6.32-504.23.4.el6.x86_64 #1 SMP |
||
| Attachments: |
|
||||||||
| Issue Links: |
|
||||||||
| Description |
|
After the server has been running for several days, the MaxScale proxy may hang. |
| Comments |
| Comment by markus makela [ 2015-09-29 ] |
|
Can you provide the maxscale.cnf you used and the backend server types? |
| Comment by cai sunny [ 2015-10-07 ] |
|
Please check maxscale.cnf. |
| Comment by cai sunny [ 2015-10-07 ] |
|
The backend servers are Galera MySQL servers. |
| Comment by cai sunny [ 2015-10-13 ] |
|
I checked: the database can still be reached through the proxy. |
| Comment by cai sunny [ 2015-10-13 ] |
|
Using netstat -at, a lot of connections show "CLOSE_WAIT". |
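To put a number on "a lot", the netstat output can be tallied by TCP state. The following is only a minimal sketch: the function name `count_tcp_states` and the sample text are hypothetical, and it assumes the usual Linux `netstat -at` column layout, where the state is the sixth column and the local address is the fourth.

```python
from collections import Counter

# Hypothetical sample in the shape of `netstat -ant` output.
# These rows are illustrative only and are not data from this report.
SAMPLE = """\
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:6603            0.0.0.0:*               LISTEN
tcp        1      0 10.0.0.5:6603           10.0.0.9:51234          CLOSE_WAIT
tcp        1      0 10.0.0.5:6603           10.0.0.9:51240          CLOSE_WAIT
tcp        0      0 10.0.0.5:6603           10.0.0.7:40022          ESTABLISHED
"""


def count_tcp_states(netstat_output, port=None):
    """Count TCP connections per state, optionally for one local port."""
    states = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers and anything that is not a TCP row with a state column.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        # Assumed layout: fields[3] is the local address, fields[5] the state.
        if port is not None and not fields[3].endswith(":%d" % port):
            continue
        states[fields[5]] += 1
    return states


# Tally the hypothetical sample, restricted to local port 6603.
for state, count in sorted(count_tcp_states(SAMPLE, port=6603).items()):
    print(state, count)
```

A connection sits in CLOSE_WAIT when the remote side has closed it but the local process has not yet closed its own socket, so a count that keeps growing for the proxy's port suggests the proxy is not releasing sockets. Filtering by port number assumes numeric output (`netstat -ant`); without `-n`, well-known ports may be shown as service names.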
| Comment by cai sunny [ 2015-10-13 ] |
|
What should I do if maxadmin hangs? |
| Comment by markus makela [ 2015-10-13 ] |
|
Is the process using 100% of CPU when it hangs? Are there any error messages in the error log? When maxadmin hangs, are you able to connect via a MySQL client, or does that also hang? If there are no messages in the error log, it would be a good idea to pinpoint which of the services is causing the hang. So if possible, test with each combination of services. This way we will know if some of the services work and if the hanging problem only happens with a certain combination of modules. Usually when maxadmin hangs there is something wrong with spinlocks and how they are released. The only situation I've encountered where maxadmin hangs is when MaxScale is consuming 100% of CPU and there is a deadlock. Also, if the maxscale process is hanging, attaching a debugger to it would show where it is hanging. To attach a debugger to MaxScale, install GDB and issue the following command: |
| Comment by cai sunny [ 2015-10-13 ] |
|
1. top:
2. No other error messages in the log except:
3. I can connect to the MySQL database through this MaxScale proxy. |
| Comment by cai sunny [ 2015-10-13 ] |
|
This is a production environment, so I cannot install gdb right now. |
| Comment by cai sunny [ 2015-10-13 ] |
|
netstat -at|grep 6603 |
| Comment by cai sunny [ 2015-10-13 ] |
|
netstat -at|wc |
| Comment by cai sunny [ 2015-10-13 ] |
|
ulimit -a |
| Comment by markus makela [ 2015-10-16 ] |
|
If you can connect to the database through MaxScale, this seems to be a bug in the MaxAdmin client or the module that it connects to. |
| Comment by markus makela [ 2015-10-16 ] |
|
I've found a bug ( |
| Comment by cai sunny [ 2015-10-17 ] |
|
It hangs when running "show services". |
| Comment by Johan Wikman [ 2015-10-20 ] |
|
caisunny Please try with version 1.2.1. |
| Comment by Alex Vladulescu [ 2015-10-22 ] |
|
Hello, I got extremely lucky to be in the same situation as cai sunny (but on 1.2.1). I have a Debian 7.9 setup (described in the Logged into the VM, I checked the CPU: all 8 cores were at 100% usage, and under htop only maxscale was using the CPU (~800%), with no other services running on the server besides keepalived. The connection to the maxadmin console works (for a while), so I could type a few commands into the console before I needed to urgently restart the process (after which everything gets back to normal). I should add that if I leave the server in this state (tested) and do not react, hoping the load will return to normal, queries to the DB via maxscale visibly take longer to complete and eventually come to a full stop. In the hope of being more useful to you, I have put this log on the web at: http://www.bfproject.ro/maxscale-hanged-lsof.txt, which I managed to collect from lsof before the restart, while the service was 100% unresponsive. The output of the commands I managed to run from the console is: MaxScale> list servers And from the Linux shell after the process restart: Is there a way we could identify what's causing these errors? Thanks |
| Comment by Johan Wikman [ 2015-11-22 ] |
|
A couple of locking issues have been discovered.
Both of these have been fixed in develop, although it is not certain they were the cause of the lockups described here. |
| Comment by markus makela [ 2015-12-08 ] |
|
We're closing this since we haven't been able to reproduce this and because of no new information from the reporter. If this problem still persists in 1.3.0, this bug can be reopened. |