Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Incomplete
Version: 2.4.12
Sprint: MXS-SPRINT-139
Description
The customer reported that their MaxScale node ran out of memory (OOM) because its memory usage kept increasing.
Below is what the customer tested; their configuration and logs are attached.
To debug the memory usage issue, I went through the following steps.
[root@rnqmax401 ~]# date
Thu Jan 7 08:41:52 PST 2021
[root@rnqmax401 ~]#
Using top, I captured the PID of the process that is taking up almost all of the memory.
top - 07:57:24 up 2 days, 9:33, 1 user, load average: 0.19, 0.14, 0.11
Tasks: 156 total, 1 running, 155 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.9 us, 0.8 sy, 0.0 ni, 98.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 98.2/16247560 [|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| ]
KiB Swap: 54.2/4194300 [|||||||||||||||||||||||||||||||||||||||||||||||||||||| ]

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
54343 maxscale 20 0 15.7g 14.0g 376 S 0.7 90.7 14:05.80 maxscale
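top reports memory for the whole process, and swap is also 54.2% used, so part of MaxScale's memory may have been swapped out. As a minimal sketch, assuming a Linux host, the resident (VmRSS) and swapped-out (VmSwap) memory of a single PID can be read directly from /proc/<pid>/status; the function name mem_status is hypothetical and not part of MaxScale.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of MaxScale): print the VmRSS (resident) and
# VmSwap (swapped-out) lines for one PID from the Linux /proc filesystem.
mem_status() {
  local status="/proc/$1/status"
  if [ ! -r "$status" ]; then
    echo "no such process: $1" >&2
    return 1
  fi
  grep -E '^(VmRSS|VmSwap):' "$status"
}
```

For example, mem_status 54343 would show how much of the MaxScale process is in RAM and how much is in swap.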
Check which process is running with PID 54343. It is /usr/bin/maxscale, the MaxScale service managed by systemd.
[root@rnqmax401 ~]# ps -ef | grep 54343
root 43013 68600 0 08:01 pts/2 00:00:00 grep --color=auto 54343
maxscale 54343 1 0 Jan05 ? 00:14:07 /usr/bin/maxscale
[root@rnqmax401 ~]#
Check which port the maxadmin listener of this MaxScale instance uses. It is 6111; admin_port = 8991 is the REST API port, not the port maxadmin connects to.
[root@rnqmax401 ~]# grep port /etc/maxscale.cnf
admin_port = 8991
port = 3111
port = 3111
port = 3111
port = 6111
port = 3111
#port=4442
port = 3111
port = 3111
port = 3111
port = 9994
# These listeners represent the ports the

[root@rnqmax401 ~]#
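A plain grep for "port" also matches comments and repeats port = 3111 for several sections, so it does not show which section each port belongs to. A minimal sketch, assuming the INI layout of /etc/maxscale.cnf (key = value lines under [section] headers), that prints each uncommented port setting together with its section name; ports_by_section is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: for an INI-style MaxScale config, print
# "<section> <key> <value>" for every uncommented port or admin_port line.
ports_by_section() {
  awk '
    # A "[name]" header starts a new section; remember its name.
    /^[ \t]*\[/ {
      section = $0
      sub(/^[ \t]*\[/, "", section)
      sub(/].*$/, "", section)
      next
    }
    # Lines starting with "#" never match, so commented-out ports are skipped.
    /^[ \t]*(admin_)?port[ \t]*=/ {
      split($0, kv, "=")
      gsub(/[ \t]/, "", kv[1])
      gsub(/[ \t]/, "", kv[2])
      print section, kv[1], kv[2]
    }
  ' "$1"
}
```

Running ports_by_section /etc/maxscale.cnf would then show which listener or server section owns port 6111.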
Before starting this debugging, I redirected the application connections to a different MaxScale server. As the output below shows, there were no active connections while I collected these stats; however, the memory allocated to MaxScale was not released back to the OS. This was captured on Thu Jan 7 08:41:52 PST 2021.
[root@rnqmax401 ~]# maxadmin -pmariadb -P6111 list servers
Servers.
-------------------+-----------------+-------+-------------+--------------------
Server             | Address         | Port  | Connections | Status
-------------------+-----------------+-------+-------------+--------------------
server1            | 10.142.108.141  |  3111 |           0 | Master, Synced, Running
server2            | 10.142.108.142  |  3111 |           0 | Slave, Synced, Running
server3            | 10.142.108.143  |  3111 |           0 | Slave, Synced, Running
server1AD          | 10.142.108.141  |  3111 |           0 | Master, Synced, Running
server2AD          | 10.142.108.142  |  3111 |           0 | Slave, Synced, Running
server3AD          | 10.142.108.143  |  3111 |           0 | Slave, Synced, Running
-------------------+-----------------+-------+-------------+--------------------
[root@rnqmax401 ~]#
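To double-check that no traffic remained, the Connections column of the list servers output can be totalled. A minimal sketch, assuming the pipe-separated layout shown above with Connections as the fourth field; total_connections is a hypothetical helper that reads the table on stdin.

```shell
#!/usr/bin/env bash
# Hypothetical helper: sum the 4th "|"-separated field of `maxadmin list servers`
# output read from stdin. The header and the dashed separator lines are skipped
# because their 4th field is not a plain integer (or does not exist).
total_connections() {
  awk -F'|' '
    NF >= 5 && $4 ~ /^[ \t]*[0-9]+[ \t]*$/ { sum += $4 }
    END { print sum + 0 }
  '
}
```

Piping maxadmin -pmariadb -P6111 list servers into total_connections gives 0 for the Jan 7 snapshot above, while the Jan 5 snapshot totals 2898 (3 × 966).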
MaxScale usage at Tue Jan 5 16:17:49 PST 2021, captured before this debug test:
-------------------+-----------------+-------+-------------+--------------------
Server             | Address         | Port  | Connections | Status
-------------------+-----------------+-------+-------------+--------------------
server1            | 10.142.108.141  |  3111 |         966 | Master, Synced, Running
server2            | 10.142.108.142  |  3111 |         966 | Slave, Synced, Running
server3            | 10.142.108.143  |  3111 |         966 | Slave, Synced, Running
server1AD          | 10.142.108.141  |  3111 |           0 | Master, Synced, Running
server2AD          | 10.142.108.142  |  3111 |           0 | Slave, Synced, Running
server3AD          | 10.142.108.143  |  3111 |           0 | Slave, Synced, Running
-------------------+-----------------+-------+-------------+--------------------
I restarted MaxScale on 2021-01-05 15:02:06, and its memory usage stayed low until Tue Jan 5 16:00:35 PST 2021. Between 16:00 and 16:04, its RAM usage rose from 890 MB to 15327 MB.
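Because the growth happened within about four minutes, occasional snapshots can easily miss when it starts. A minimal sketch for recording a timestamped RSS sample of one PID at a fixed interval, so the samples can later be lined up with the MaxScale log; rss_mib and log_rss are hypothetical helpers, and the sketch assumes a Linux host where /proc/<pid>/status reports VmRSS in kB.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the resident set size (RSS) of a PID in MiB,
# read from the VmRSS line (in kB) of /proc/<pid>/status.
rss_mib() {
  local status="/proc/$1/status"
  [ -r "$status" ] || return 1
  awk '/^VmRSS:/ { print int($2 / 1024); found = 1 } END { exit !found }' "$status"
}

# Hypothetical helper: print "<date> <time> <RSS> MiB" every INTERVAL seconds.
# COUNT limits the number of samples; 0 (the default) means keep sampling
# until the process exits.
log_rss() {
  local pid=$1 interval=${2:-60} count=${3:-0} i=0 mib
  while [ "$count" -eq 0 ] || [ "$i" -lt "$count" ]; do
    mib=$(rss_mib "$pid") || { echo "PID $pid is gone" >&2; return 1; }
    printf '%s %s MiB\n' "$(date '+%F %T')" "$mib"
    i=$(( i + 1 ))
    sleep "$interval"
  done
}
```

For example, log_rss "$(pgrep -x maxscale)" 10 >> maxscale_rss.log & would append a sample every 10 seconds, assuming exactly one maxscale process is running.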
- The MaxScale log is too large to attach; please check the support case.