[MXS-1977] Maxscale 2.2.6 memory leak Created: 2018-07-13  Updated: 2018-12-18  Resolved: 2018-07-17

Status: Closed
Project: MariaDB MaxScale
Component/s: mariadbbackend, readwritesplit
Affects Version/s: 2.2.6
Fix Version/s: 2.2.12

Type: Bug Priority: Critical
Reporter: Daniel Snow Assignee: markus makela
Resolution: Fixed Votes: 0
Labels: None
Environment:

Debian GNU/Linux 8 \n \l
Linux kernel 3.16.0-4-amd64


Attachments: PNG File Screen Shot 2018-07-14 at 9.55.36 PM.png     PNG File Screen Shot 2018-07-15 at 1.03.51 PM.png    
Issue Links:
Relates
relates to MXS-2224 MaxScale 2.3.2 Memory leaks causes OOM Closed

 Description   

Good day! After an upgrade from 2.1 to 2.2.6, our MaxScale started suffering from extremely high memory consumption that, after some time, makes the system unusable.

Using limits.conf (systemd resource control: https://www.freedesktop.org/software/systemd/man/systemd.resource-control.html) to cap the process's memory usage only causes the process to backtrace and restart.
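For reference, a minimal sketch of the kind of systemd drop-in we mean (the unit name maxscale.service and the 2 GB cap are illustrative assumptions; older systemd versions, such as the one shipped with Debian 8, use MemoryLimit= instead of MemoryMax=):

```ini
# /etc/systemd/system/maxscale.service.d/memory.conf (hypothetical path)
[Service]
# Hard cap: the kernel OOM-kills the cgroup if usage exceeds this.
MemoryMax=2G
# On systemd older than v231, use the legacy directive instead:
# MemoryLimit=2G
```

With a leaking process, a cap like this does not fix anything; it just converts unbounded growth into the abort-and-restart cycle described above.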

Here is our MaxScale config:

[maxscale]
threads=4
 
[server0]
type=server
address=192.168.34.34
port=3306
protocol=MySQLBackend
serversize=0
priority=6
 
[server1]
type=server
address=192.168.34.35
port=3306
protocol=MySQLBackend
serversize=100
priority=1
 
[server2]
type=server
address=192.168.34.36
port=3306
protocol=MySQLBackend
serversize=100
priority=2
 
[server3]
type=server
address=192.168.34.37
port=3306
protocol=MySQLBackend
serversize=100
priority=5
 
[server4]
type=server
address=192.168.34.38
port=3306
protocol=MySQLBackend
serversize=100
priority=3
 
[MySQL Monitor]
type=monitor
module=galeramon
#module=mmmon
servers=server0, server1, server2, server3, server4
user=maxscale
passwd=maxscale
monitor_interval=5000
use_priority=true
 
[Read-Write-Master Service]
type=service
router=readconnroute
servers=server0, server1, server2, server3, server4
user=maxscale
passwd=maxscale
router_options=master
 
[Read-Write Service]
type=service
router=readwritesplit
servers=server0, server1, server2, server3, server4
user=maxscale
passwd=maxscale
max_slave_connections=1
router_options=slave_selection_criteria=LEAST_GLOBAL_CONNECTIONS,master_accept_reads=true,strict_multi_stmt=true
 
[Read-Only Service]
type=service
router=readconnroute
servers=server0, server1, server2, server3, server4
user=maxscale
passwd=maxscale
router_options=slave
weightby=serversize
 
[MaxAdmin Service]
type=service
router=cli
 
[Read-Write-Master Listener]
type=listener
service=Read-Write-Master Service
protocol=MySQLClient
port=4007
 
[Read-Write Listener]
type=listener
service=Read-Write Service
protocol=MySQLClient
port=4006
 
[Read-Only Listener]
type=listener
service=Read-Only Service
protocol=MySQLClient
port=4005
 
[MaxAdmin Listener]
type=listener
service=MaxAdmin Service
protocol=maxscaled
socket=default

Here is the backtrace of the out-of-memory MaxScale:

2018-07-13 20:19:49 alert : (1049629) Fatal: MaxScale 2.2.6 received fatal signal 6. Attempting backtrace.
2018-07-13 20:19:49 alert : (1049629) Commit ID: b72a7a89262eeacac3f3fda3041b6607a57655b7 System name: Linux Release string: undefined
2018-07-13 20:19:49 alert : (1049629) /usr/bin/maxscale() [0x407a60]: ??:0
2018-07-13 20:19:49 alert : (1049629) /lib/x86_64-linux-gnu/libpthread.so.0(+0xf890) [0x7f5ed56bd890]: ??:?
2018-07-13 20:19:49 alert : (1049629) /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x37) [0x7f5ed35f2067]: ??:0
2018-07-13 20:19:49 alert : (1049629) /lib/x86_64-linux-gnu/libc.so.6(abort+0x148) [0x7f5ed35f3448]: ??:0
2018-07-13 20:19:49 alert : (1049629) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(ZN9gnu_cxx27_verbose_terminate_handlerEv+0x15d) [0x7f5ed46e3b3d]: ??:0
2018-07-13 20:19:49 alert : (1049629) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x5ebb6) [0x7f5ed46e1bb6]: ??:?
2018-07-13 20:19:49 alert : (1049629) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x5ec01) [0x7f5ed46e1c01]: ??:?
2018-07-13 20:19:49 alert : (1049629) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(__cxa_rethrow+0x49) [0x7f5ed46e1e69]: ??:0
2018-07-13 20:19:49 alert : (1049629) /usr/lib/x86_64-linux-gnu/maxscale/libreadwritesplit.so(ZNSt3tr110_HashtableIjSt4pairIKjjESaIS3_ESt10_Select1stIS3_ESt8equal_toIjENS_4hashIjEENS_8_detail18_Mod_range_hashingENSB_20_Default_ranged_hashENSB_20_Prime_rehash_policyELb0ELb0ELb1EE16_M_insert_bucketERKS3_mm+0x342) [0x7f5ed100fc02]: /usr/include/c++/4.9/tr1/hashtable.h:874
2018-07-13 20:19:49 alert : (1049629) /usr/lib/x86_64-linux-gnu/maxscale/libreadwritesplit.so(_ZN9PSManager5storeEP5gwbufj+0x129) [0x7f5ed100f679]: /usr/include/c++/4.9/tr1/hashtable_policy.h:531
2018-07-13 20:19:49 alert : (1049629) /usr/lib/x86_64-linux-gnu/maxscale/libreadwritesplit.so(_Z19route_session_writeP14RWSplitSessionP5gwbufhj+0x322) [0x7f5ed100b2c2]: /home/vagrant/MaxScale/server/modules/routing/readwritesplit/rwsplit_route_stmt.cc:345
2018-07-13 20:19:49 alert : (1049629) /usr/lib/x86_64-linux-gnu/maxscale/libreadwritesplit.so(_Z20handle_target_is_all14route_target_tP7RWSplitP14RWSplitSessionP5gwbufij+0x105) [0x7f5ed1006bf5]: /home/vagrant/MaxScale/server/modules/routing/readwritesplit/rwsplit_mysql.cc:252
2018-07-13 20:19:49 alert : (1049629) /usr/lib/x86_64-linux-gnu/maxscale/libreadwritesplit.so(_Z17route_single_stmtP7RWSplitP14RWSplitSessionP5gwbufRK9RouteInfo+0x111) [0x7f5ed100a3e1]: /home/vagrant/MaxScale/server/modules/routing/readwritesplit/rwsplit_route_stmt.cc:283
2018-07-13 20:19:49 alert : (1049629) /usr/lib/x86_64-linux-gnu/maxscale/libreadwritesplit.so(+0x815c) [0x7f5ed100215c]: /home/vagrant/MaxScale/server/modules/routing/readwritesplit/readwritesplit.cc:946
2018-07-13 20:19:49 alert : (1049629) /usr/lib/x86_64-linux-gnu/maxscale/libmariadbclient.so(+0x41b3) [0x7f5ecffcc1b3]: /home/vagrant/MaxScale/server/modules/protocol/MySQL/mariadbclient/mysql_client.cc:1711
2018-07-13 20:19:49 alert : (1049629) /usr/lib/x86_64-linux-gnu/maxscale/libmaxscale-common.so.1.0.0(+0x5ae02) [0x7f5ed5b86e02]: /home/vagrant/MaxScale/server/core/dcb.cc:3108
2018-07-13 20:19:49 alert : (1049629) /usr/lib/x86_64-linux-gnu/maxscale/libmaxscale-common.so.1.0.0(+0x5afd8) [0x7f5ed5b86fd8]: /home/vagrant/MaxScale/server/core/dcb.cc:3200
2018-07-13 20:19:49 alert : (1049629) /usr/lib/x86_64-linux-gnu/maxscale/libmaxscale-common.so.1.0.0(_ZN8maxscale6Worker15poll_waiteventsEv+0x1c1) [0x7f5ed5bc0611]: /home/vagrant/MaxScale/server/core/worker.cc:1207
2018-07-13 20:19:50 alert : (1049629) /usr/lib/x86_64-linux-gnu/maxscale/libmaxscale-common.so.1.0.0(_ZN8maxscale6Worker3runEv+0x10f) [0x7f5ed5bc091f]: /home/vagrant/MaxScale/include/maxscale/log_manager.h:116
2018-07-13 20:19:50 alert : (1049629) /usr/bin/maxscale(main+0x1d92) [0x406112]: /home/vagrant/MaxScale/server/core/gateway.cc:2254
2018-07-13 20:19:50 alert : (1049629) /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5) [0x7f5ed35deb45]: ??:0
2018-07-13 20:19:50 alert : (1049629) /usr/bin/maxscale() [0x406a85]: ??:0

I've read the release notes of the subsequent 2.2.x minor releases and found no mention of fixes for such leaks.

I hope someone can assist or help dig into the root cause of this memory leak.

Here are the memory usage statistics from one of our less busy production servers. 11 GB of memory for a proxy is very strange.

root@wifidb3:~# cat /proc/7481/status
Name: maxscale
State: S (sleeping)
...
VmPeak: 25622988 kB
VmSize: 25561088 kB
VmLck: 0 kB
VmPin: 0 kB
VmHWM: 11341768 kB
VmRSS: 11341768 kB
VmData: 25465648 kB
VmStk: 2180 kB
VmExe: 52 kB
VmLib: 14736 kB
VmPTE: 49332 kB
VmPMD: 108 kB
VmSwap: 13789792 kB
...
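For reference, a small sketch (Python, with a hypothetical helper name) that parses such a /proc/&lt;pid&gt;/status dump and converts the kB fields; the VmRSS value above works out to roughly 10.8 GiB resident:

```python
def parse_vm_stats(status_text):
    """Extract the Vm* fields (values in kB) from a /proc/<pid>/status dump."""
    stats = {}
    for line in status_text.splitlines():
        if line.startswith("Vm"):
            key, rest = line.split(":", 1)
            stats[key] = int(rest.split()[0])  # /proc reports these in kB
    return stats

sample = """VmPeak: 25622988 kB
VmSize: 25561088 kB
VmRSS: 11341768 kB
VmSwap: 13789792 kB"""

stats = parse_vm_stats(sample)
print(round(stats["VmRSS"] / 1024 / 1024, 1))  # → 10.8 (GiB resident)
```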



 Comments   
Comment by markus makela [ 2018-07-13 ]

I'd recommend upgrading to MaxScale 2.2.11 even if no explicit fix for your problem is found. By upgrading to the latest GA version, we'll have a known-good starting point and can verify that no known bugs interfere with the problem resolution process. Once we know whether 2.2.11 suffers from the same problem, we can proceed by attempting to create a minimal, verifiable test case.

I suspect that the main culprit in this case is the readwritesplit router; at least, that's where the stacktrace points. By the looks of it, the code in question is the prepared statement code for binary protocol prepared statements, which is new in 2.2. Checking the following points would be of immense help:

  1. Does your application use prepared statements?
  2. What type of connector is in use?
  3. Is connection pooling in use?
  4. How does memory usage increase over time (gradual growth or sudden spikes)?
Comment by Daniel Snow [ 2018-07-13 ]

1. Yes. http://php.net/manual/en/pdo.prepare.php
2. php pdo
3. No. Each user query creates a new connection.
4. It's gradual growth.

> I'm suspecting that the main culprit in this case is the readwritesplit router or at least that's where the stacktrace points to. By the looks of it, the code in question is the new-in-2.2 prepared statement code for binary protocol prepared statements.

I think so too. I've noticed a difference in the default STMT-handling behavior compared to MaxScale 2.1, and I set the readwritesplit parameter strict_multi_stmt back to "true", but that didn't help.

I'll update to 2.2.11 and report whether the memory consumption changes.

Comment by Daniel Snow [ 2018-07-14 ]

I've tried 2.2.11; unfortunately, the problem still exists. Take a look at the gradual growth of RSS consumption over a short period of time. 1 GB per day is too much, too fast.
Please ignore the small drop in measurements at 14:00 in the attachment: it's just monitoring script lag.

Comment by Daniel Snow [ 2018-07-15 ]

I rolled back to 2.1.17 and there is no memory leak in this version. Memory consumption stays in the 16 MB range.

Comment by markus makela [ 2018-07-15 ]

A small memory leak was found in the code that processes session commands. A fix to this leak is available on the 2.2-markusjm branch on GitHub.

Comment by Daniel Snow [ 2018-07-15 ]

Thank you, I'll check.

Could you also make sure that strict_multi_stmt=true is not broken in 2.2.x?

In the logs, when a STMT with strict_multi_stmt=true goes through MaxScale 2.1.x, the logger says:
2018-07-15 19:16:33 info : (181) [readwritesplit] > Autocommit: [enabled], trx is [not open], cmd: (0x19) MYSQL_COM_STMT_CLOSE, type: QUERY_TYPE_WRITE, stmt:
2018-07-15 19:16:33 info : (181) [readwritesplit] Route query to master [192.168.34.35]:3306 <
2018-07-15 19:16:33 info : (181) [readwritesplit] > Autocommit: [enabled], trx is [not open], cmd: (0x16) MYSQL_COM_STMT_PREPARE, type: QUERY_TYPE_READ|QUERY_TYPE_PREPARE_STMT, stmt:

And in MaxScale 2.2.x the logger says:
2018-06-20 21:41:47 info : (3309) [readwritesplit] Session write, routing to all servers.
2018-06-20 21:41:47 info : (3309) [readwritesplit] Route query to slave [192.168.34.34]:3306
2018-06-20 21:41:47 info : (3309) [readwritesplit] Route query to master [192.168.34.35]:3306
2018-06-20 21:41:47 info : (3309) [readwritesplit] Route query to slave [192.168.34.36]:3306
2018-06-20 21:41:47 info : (3309) [readwritesplit] Route query to slave [192.168.34.37]:3306
2018-06-20 21:41:47 info : (3309) [readwritesplit] Route query to slave [192.168.34.38]:3306

This seems like a bug too.

Comment by markus makela [ 2018-07-16 ]

strict_multi_stmt controls what happens after a multi-statement query is executed.

For example, the following query has two SQL statements in it:

$handle->exec("UPDATE t1 SET a = 1; SELECT * FROM t1 WHERE a = 1;");

The whole UPDATE t1 SET a = 1; SELECT * FROM t1 WHERE a = 1; query is sent at once, without waiting for the first statement to return a result.

Due to a few technical limitations and practical complexity, readwritesplit will route all queries that contain more than one statement to the master. With strict_multi_stmt=false (default), normal operation continues after the statement has been executed on the master. With strict_multi_stmt=true, the router assumes that a multi-statement query causes changes in the session state and proceeds to route all queries afterwards to only the master. In this mode the router session is effectively locked to the master server.
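To make the rule concrete, here is a minimal sketch of the routing decision described above (Python with hypothetical names, not MaxScale's actual C++ implementation):

```python
def route_target(is_multi_stmt, locked_to_master, strict_multi_stmt):
    """Return (target, locked) for one query under the described rule.

    Multi-statement queries always go to the master; with
    strict_multi_stmt=True the session is then locked to the master
    for all subsequent queries.
    """
    if is_multi_stmt:
        return "master", locked_to_master or strict_multi_stmt
    if locked_to_master:
        return "master", True
    return "any", locked_to_master

# With strict_multi_stmt=True, a multi-statement query locks the session:
target, locked = route_target(True, False, True)    # ("master", True)
target, locked = route_target(False, locked, True)  # ("master", True) - stays locked
```

With strict_multi_stmt=false, later single-statement queries can again be load balanced; with true, every subsequent query stays on the master.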

The routing of COM_STMT_PREPARE changed between 2.1 and 2.2 from being routed to the master to being routed to all servers. This was done to allow load balancing of prepared statements, so the behavior you observed is expected.

Can you clarify what behavior you use strict_multi_stmt for and what you'd expect to see with it?

Generated at Thu Feb 08 04:10:48 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.