[MXS-17] bugzillaId-736: Memory leak while doing read/write splitting Created: 2015-01-19 Updated: 2015-06-17 Resolved: 2015-05-05 |
|
| Status: | Closed |
| Project: | MariaDB MaxScale |
| Component/s: | readwritesplit |
| Affects Version/s: | 1.0.4 |
| Fix Version/s: | 1.0.6, 1.1.0, 1.1.1 |
| Type: | Bug | Priority: | Major |
| Reporter: | Yves Trudeau | Assignee: | markus makela |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Environment: |
Linux |
||
| Attachments: |
|
| Description |
|
This is imported from bugzilla item: http://bugs.mariadb.com/show_bug.cgi?id=736 Yves Trudeau 2015-02-19 16:33:04 UTC yves@yves-desktop:~/src/$ ( echo 'drop table if exists simpleinsert; create table simpleinsert (id int not null auto_increment, val tinyint not null, primary key (id)) engine=innodb;'; while true; do echo 'rollback;set autocommit=1;set autocommit=0;insert into simpleinsert (val) values (0);commit;'; sleep 0.1; done ) | mysql -h 10.3.1.120 -u tpcc -ptpcc test is sufficient. Below is the rate of increase of the RSS. root@radian6-maxscale-1:~/git/MaxScale/build# while true; do ps faxu | grep maxscale |grep -v grep; sleep 15; done Going from about 106MB to 134MB in 1min. The relevant parts of my config:
Something interesting is that the memory is reused, if I restart the insert script, the memory for about the time it took originally to reach the current memory usage and then it will rise again. Valgrind shows many entries like: ==8131== 25,906,560 bytes in 6,288 blocks are still reachable in loss record 160 of 162 Regards, Yves |
| Comments |
| Comment by Dipti Joshi (Inactive) [ 2015-03-10 ] | ||||||||||||||||||||||||||||||||||||||||||||
|
This is the comment history imported from bugzilla: Comment 1 Yves Trudeau 2015-02-20 14:25:16 UTC So, this is not a bug although it should be documented in some way that it shouldn't be used with long running database sessions. The solutions will be to set a max connection age with the application. --Yves Comment 2 Markus Mäkelä 2015-02-20 14:42:26 UTC Comment 3 Timofey Turenko 2015-02-25 13:08:17 UTC documentation changed, closing Comment 4 Yves Trudeau 2015-02-28 20:51:18 UTC Comment 5 Markus Mäkelä 2015-03-01 00:31:41 UTC The parameter can be enabled by adding the 'max_sescmd_history' to the service's router options: router_options=max_sescmd_history=2500 Comment 6 Yves Trudeau 2015-03-03 15:59:09 UTC
It all seems to come from the embedded thread in the query_classifier, maybe some unfreed gwbuf_clone missing that the sbuf->refcount would prevent from being actually freed. | ||||||||||||||||||||||||||||||||||||||||||||
| Comment by markus makela [ 2015-03-19 ] | ||||||||||||||||||||||||||||||||||||||||||||
|
I added a Valgrind log of MaxScale with a Readwritesplit router limited to 100 session commands. I tested it by executing the following command:
The connection was closed after 100 session commands were issued and MaxScale was shut down through maxadmin. From this log I can't see a memory leak so I will mark bug as fixed. | ||||||||||||||||||||||||||||||||||||||||||||
| Comment by Yves Trudeau [ 2015-03-27 ] | ||||||||||||||||||||||||||||||||||||||||||||
|
From your vg.log, what is this if not a leak... Loop over 200 and check, it will be twice as large. Closing the session has no impacts. You can also verify without valgrind, let it run for a few minutes, you'll see the RSS rising. --Yves ==30167== 32,736 bytes in 4 blocks are possibly lost in loss record 113 of 115 | ||||||||||||||||||||||||||||||||||||||||||||
| Comment by markus makela [ 2015-03-29 ] | ||||||||||||||||||||||||||||||||||||||||||||
|
There was a memory leak in the client protocol when an error message was improperly free'd. This has been fixed in the develop branch. Attached is a log from valgrind with 200 queries of set @var1 = if(@var1 = NULL,0,@var1 + 1); in the same session. The call to mysql_real_connect and mysql_init is done by the monitors and since the monitors currently do not disconnect when they are stopped, they do not explicitly free their memory. I've been running valgrind and I've yet to pinpoint a reason for the increased memory consumption of MaxScale. I did a test with 20000 commands and 100 iterations and MaxScale ended up with a memory usage of about 1200MB. The memory does not seem to grow linearly with the amount of queries executed. Increasing the amount of session commands for a session increases MaxScale's memory consumption. The amount does not drop down to normal levels after the sessions close so I think it might somehow relate to the embedded library. I'll continue the investigation and reopen the bug. | ||||||||||||||||||||||||||||||||||||||||||||
| Comment by markus makela [ 2015-03-30 ] | ||||||||||||||||||||||||||||||||||||||||||||
|
I've noticed some interesting behavior, MaxScale seems to use large amounts of memory after large amounts of session commands are issued but the memory usage drops down when further sessions connect and issue smaller sets of session commands. For example, I ran the previous comment's query 20000 times and ended up with the following results: UID PID PPID C SZ RSS PSR STIME TTY TIME CMD Next I ran 15 sessions of 200 queries and got the following results: UID PID PPID C SZ RSS PSR STIME TTY TIME CMD This is indeed curious behavior but it would suggest that memory might not actually leak but the embedded library somehow cached data that is not immediately purged. | ||||||||||||||||||||||||||||||||||||||||||||
| Comment by markus makela [ 2015-04-01 ] | ||||||||||||||||||||||||||||||||||||||||||||
|
It seems I misunderstood the output and memory usage includes paged memory. So far I have not been able to find a memory leak while continuously issuing session commands. The overall memory usage stays around 800 Mb if I keep executing a loop of sessions issuing 20 queries of set @var1 = if(@var1 = NULL,0,@var1 + 1); forever. I will keep investigating for possible leaks. | ||||||||||||||||||||||||||||||||||||||||||||
| Comment by Yves Trudeau [ 2015-05-05 ] | ||||||||||||||||||||||||||||||||||||||||||||
|
The latest code release the memory correctly when the session close but, there is still an issue with long running connection that cannot be ignored. Here's an example using the latest code from the develop branch: root@maxscale-1:/usr/local/skysql/maxscale# LD_PRELOAD=/usr/lib/libjemalloc.so.1 ./bin/maxscale -c `pwd` -f ./etc/MaxScale.cnf -lfile 8430 root 20 0 478m 43m 4000 S 1 5.9 0:00.17 maxscale The RSS is going 43M just after start to 328M just before the session exit and then goes back to 44M (if using jemalloc). The database session was: yves@yves-desktop:/opt/src/MaxScale$ ( for i in `seq 1 10000`; do echo "set @test=$i;"; done ) | mysql -h 10.3.1.110 -utpcc -ptpcc tpcc All the set variables are in the same connection. That represents (328M-44M)/10000 ~ 30KB per set variable, not small. If I loop more, eventually memory can no longer be allocated and MaxScale crashes. 8446 root 20 0 1240m 588m 4344 R 79 79.3 1:10.35 maxscale Based on the message, that smells like the memory usage is coming from libmysqld. my router configuration is: [RW Split Router] | ||||||||||||||||||||||||||||||||||||||||||||
| Comment by Yves Trudeau [ 2015-05-05 ] | ||||||||||||||||||||||||||||||||||||||||||||
|
Standby on this... found something wrong on my side. | ||||||||||||||||||||||||||||||||||||||||||||
| Comment by markus makela [ 2015-05-05 ] | ||||||||||||||||||||||||||||||||||||||||||||
|
The disable_sescmd_history and max_sescmd_history are router_option values and should be formatted like so: Sadly, this does not provide a smooth way to configure it. I'll look into adding them as regular parameters. | ||||||||||||||||||||||||||||||||||||||||||||
| Comment by Yves Trudeau [ 2015-05-05 ] | ||||||||||||||||||||||||||||||||||||||||||||
|
All is good, the behavior is correct, this can be closed. I suggest you add an example of chaining router_options in the doc like: router_options=slave_selection_criteria=LEAST_CURRENT_OPERATIONS,disable_sescmd_history=true I happened to have 2 route_options lines and the disable_sescmd_history wasn't set. | ||||||||||||||||||||||||||||||||||||||||||||
| Comment by Timofey Turenko [ 2015-05-12 ] | ||||||||||||||||||||||||||||||||||||||||||||
|
closing is confirmed by bug reporter. | ||||||||||||||||||||||||||||||||||||||||||||
| Comment by Dipti Joshi (Inactive) [ 2015-05-12 ] | ||||||||||||||||||||||||||||||||||||||||||||
|
tturenko, markus makela | ||||||||||||||||||||||||||||||||||||||||||||
| Comment by Timofey Turenko [ 2015-05-12 ] | ||||||||||||||||||||||||||||||||||||||||||||
|
this bug is about memory leak and high memory consumption. disable_sescmd_history option was introduced as well as several code improvements were done. Now things are better. What kind of document do you need? | ||||||||||||||||||||||||||||||||||||||||||||
| Comment by Dipti Joshi (Inactive) [ 2015-05-12 ] | ||||||||||||||||||||||||||||||||||||||||||||
|
When we add new router option, we need to have example of how to use it in Router User Guide or Tutorial document. We also need to have our test case that uses this option and validates that it takes care of memory leak during regression. Dipti | ||||||||||||||||||||||||||||||||||||||||||||
| Comment by Timofey Turenko [ 2015-05-13 ] | ||||||||||||||||||||||||||||||||||||||||||||
|
Options are described in Configuration guide https://github.com/mariadb-corporation/MaxScale/blob/develop/Documentation/Getting-Started/Configuration-Guide.md#optional-parameters | ||||||||||||||||||||||||||||||||||||||||||||
| Comment by Dipti Joshi (Inactive) [ 2015-05-13 ] | ||||||||||||||||||||||||||||||||||||||||||||
|
tturenko Where is the test cases that tests these options ? | ||||||||||||||||||||||||||||||||||||||||||||
| Comment by Timofey Turenko [ 2015-05-15 ] | ||||||||||||||||||||||||||||||||||||||||||||
|
test for 'max_sescmd_history' and 'connection_timeout' options: https://github.com/mariadb-corporation/maxscale-system-test/blob/master/session_limits.cpp |