[MXS-223] Possible memory leak in gwbuf_alloc Created: 2015-06-25  Updated: 2015-07-16  Resolved: 2015-07-16

Status: Closed
Project: MariaDB MaxScale
Component/s: readwritesplit
Affects Version/s: 1.1.1
Fix Version/s: 1.2.0

Type: Bug Priority: Major
Reporter: Krzysztof Książek Assignee: markus makela
Resolution: Fixed Votes: 0
Labels: None

Attachments: File MaxScale.cnf     File report.tar.gz    
Issue Links:
PartOf
includes MXS-266 pt-log-player fails with readwritesplit Closed

 Description   

Hello,

I've experienced behavior which points to the memory leak in MaxScale 1.1.1.

How to repeat:

Decompress the attached tarball, then download and install pt-log-player from Percona Toolkit 2.1.11
Open a screen with two sessions.
In the first session run:
while true ; do ps auxf | grep pt-log-player | awk '{print $2}' | xargs kill -9 ; sleep 4s ; done
In the second session run:
while true ; do /root/percona-toolkit-2.1.11/bin/pt-log-player --play ./sessions --base-dir ./results h=your_IP,P=your_RWsplit_port,u=youruser,p=yourpass ; done

After a while observe memory utilization growing:
root@ip-10-10-92-43:~/massif# ps auxf | grep maxscale
root 14056 0.6 48.3 2554104 1955360 ? Ssl 13:37 0:14 /usr/local/mariadb-maxscale/bin/maxscale

Session files contain:

root@ip-10-10-90-60:~/maxscale_debug# cat ./sessions/sessions-* | sort | uniq -c
488571 SET AUTOCOMMIT=0
486362 SET AUTOCOMMIT=1
1973024 SET NAMES utf8

MaxScale version:
root@ip-10-10-92-43:~/massif# /usr/local/mariadb-maxscale/bin/maxscale --version
1.1.1

I've attached a tarball with the session files and the massif report. For this test I used a single PXC instance behind a readwritesplit router. readconnroute doesn't seem to be affected.



 Comments   
Comment by Krzysztof Książek [ 2015-06-25 ]

And here's the relevant output from ms_print:

--------------------------------------------------------------------------------
  n        time(i)         total(B)   useful-heap(B) extra-heap(B)    stacks(B)
--------------------------------------------------------------------------------
 78 22,909,970,225    3,972,350,664    3,970,904,683     1,445,981            0
99.96% (3,970,904,683B) (heap allocation functions) malloc/new/new[], --alloc-fns, etc.
->90.57% (3,597,785,085B) 0x546663: gwbuf_alloc (buffer.c:92)
| ->90.40% (3,590,928,779B) 0x1DA53EE5: gw_MySQL_get_next_packet (mysql_common.c:1725)
| | ->90.40% (3,590,928,779B) 0x1DA50E17: route_by_statement (mysql_client.c:1574)
| |   ->90.40% (3,590,928,779B) 0x1DA4FB6C: gw_read_client_event (mysql_client.c:867)
| |     ->90.40% (3,590,928,779B) 0x55A003: process_pollq (poll.c:869)
| |       ->90.40% (3,590,928,779B) 0x5598B4: poll_waitevents (poll.c:609)
| |         ->90.40% (3,590,928,779B) 0x54A35D: main (gateway.c:1861)
| |
| ->00.17% (6,856,306B) in 1+ places, all below ms_print's threshold (01.00%)
|
->08.54% (339,284,817B) 0x7A68B0: my_malloc (my_malloc.c:41)

Comment by markus makela [ 2015-06-26 ]

Are the client sessions to MaxScale ever closed? If not, then this is a case of "session commands" being stored inside MaxScale.

You can disable this behavior by adding disable_sescmd_history=true to the router_options parameter of the service. This will disable the storing of queries that modify the state of the session. Here is the relevant documentation: https://github.com/mariadb-corporation/MaxScale/blob/master/Documentation/Getting-Started/Configuration-Guide.md#optional-parameters

If possible, please retest this with the session command history disabled.

Comment by Krzysztof Książek [ 2015-06-26 ]

I'll check, but I'm pretty sure the app's not handling session termination correctly. Like, I'd say, 90% of the apps out there... The connection is terminated, though, and I'd expect the proxy to clean up the mess the app created. I assume you can track which connections/sessions are open and which were closed. Maybe there is a configurable timeout to force a cleanup? Another option would be to change the default of max_sescmd_history (if I understand correctly, it should keep memory utilization within limits).

Comment by markus makela [ 2015-06-26 ]

Yes, this seems to be the case with most applications. Many of them use pooled connections, which MaxScale sees as one long session.

MaxScale can be configured to work better with pooled connections. The max_sescmd_history option closes sessions that exceed the set limit. Using it forces sessions to be recreated, which causes a slight performance drop due to the closing and reopening of each session. This is why the disable_sescmd_history option exists.

Another way to enforce limits on sessions is to set the connection_timeout=<seconds> parameter for the service. This closes all sessions that have been idle for too long.
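For reference, a service section combining these limits might look like the sketch below. This is an assumption-laden example, not from this report: the server name, user, and password are placeholders, and the specific limit values (50 commands, 300 seconds) are arbitrary choices for illustration.

```ini
[RW-Split]
type=service
router=readwritesplit
servers=server1
user=maxuser
passwd=maxpwd
# Cap the stored session command history; sessions exceeding it are closed
# and recreated, bounding per-session memory use.
router_options=max_sescmd_history=50
# Close any session that has been idle for more than 300 seconds.
connection_timeout=300
```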

Comment by Krzysztof Książek [ 2015-07-03 ]

Markus, just to confirm I'm setting the option correctly. Does this setup of a rwsplit router look fine to you?

[RW]
type=service
router=readwritesplit
servers=10.10.92.198
user=maxmonit
passwd=3590E771B21BC31B9112404385FDD66A
max_slave_connections=100%
router_options=disable_sescmd_history=true

Actually, I've also tested setting disable_sescmd_history=true outside of router_options. Neither prevented MaxScale from increasing its memory footprint.

Comment by markus makela [ 2015-07-03 ]

The config looks good; it should prevent the storing of session commands.

Did you install MaxScale from a package or build it from source?

Comment by Krzysztof Książek [ 2015-07-03 ]

I've installed it from the repo:

root@ip-10-10-92-43:~# cat /etc/apt/sources.list.d/maxscale.list
deb http://code.mariadb.com/mariadb-maxscale/1.1/repo/ubuntu trusty main

Comment by markus makela [ 2015-07-07 ]

I've reproduced one memory leak, but it was triggered when pt-log-player failed to execute a query:

Found 2 session files.
--threads is greater than the number of session files.  Only 2 concurrent process will be ran at /usr/bin/pt-log-player line 2774.
Error getting the current SQL_MODE: DBD::mysql::db selectrow_array failed: fetch() without execute() [for Statement "SELECT @@SQL_MODE"] at /usr/bin/pt-log-player line 2278.

This has been fixed, but I'll continue testing for other possible memory leaks.

Generated at Thu Feb 08 03:57:42 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.