[MDEV-12205] MariaDB 10.1.21 Per-query timeout prefixes leak memory (confirmed) Created: 2017-03-08 Updated: 2017-06-01 |
|
| Status: | Open |
| Project: | MariaDB Server |
| Component/s: | None |
| Affects Version/s: | 10.1.21 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major |
| Reporter: | Andrew Gabbitas | Assignee: | Elena Stepanova |
| Resolution: | Unresolved | Votes: | 2 |
| Labels: | None | ||
| Environment: |
OS: CentOS 7 |
||
| Attachments: |
|
| Description |
|
I have a 720GB database, an 808GB InnoDB buffer pool, 1TB of system memory, and 10 tables. As far as I can tell we shouldn’t be running out of memory; however, the mysqld process steadily increases its memory consumption until the OOM killer rightfully claims it (we now restart it before that happens). After finding some relevant memory-leak fixes released in 10.1.21, we upgraded to that version from 10.1.13, but unfortunately it didn’t help. Running mysqltuner.pl (https://github.com/major/MySQLTuner-perl) seems to agree that we shouldn’t be using more than 809GB.
At the time this data was gathered, the mysqld process was at 980GB (measured with `ps -eo pid,comm,rss`), leaving 17GB of available system memory. I can consistently reproduce this with a similar workload on a scaled-down test instance. I have attached a binlog from my test instance, show variables output, and a few grafana graphs. Thanks for taking a look! |
| Comments |
| Comment by Daniel Black [ 2017-03-09 ] | |
|
Disclaimer: this is a rough concept only that obviously requires testing in a non-production environment. I'd be looking at something like this to use gdb to attach to the running process and print out information: http://www.outofcore.com/2011/06/scripted-debug-using-gdb/. The trap points would be (from source mysys/my_malloc.c):
Ideally, log the returned pointer, flags, and size (using ``print pointer, my_flags, size``) from the malloc/realloc calls, the oldptr argument from the realloc call, and the ptr argument from the free call. At each of these breakpoints, log the backtrace of the current thread. With a bit of analysis, the output should hopefully identify the backtrace where memory is allocated but not freed. gdb can attach to a running process (using -p), and there is a batch mode for running a script. I feel your pain about hard-to-debug memory leaks (mine was my second bug report). PS. If you do write the gdb script, please attach it here for someone else to use. | |
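A minimal sketch of such a gdb command file, assuming the `my_malloc`/`my_realloc`/`my_free` entry points in `mysys/my_malloc.c` (parameter names here are assumptions based on the 10.1 source; verify them against your build before use):

```
# Hypothetical trace.gdb -- attach with: gdb -p $(pidof mysqld) -batch -x trace.gdb
set pagination off
set logging file my_malloc_trace.log
set logging on

break my_malloc
commands
  silent
  printf "my_malloc size=%lu flags=%u\n", size, my_flags
  bt 8
  continue
end

break my_realloc
commands
  silent
  printf "my_realloc oldptr=%p size=%lu\n", oldpoint, size
  bt 8
  continue
end

break my_free
commands
  silent
  printf "my_free ptr=%p\n", ptr
  bt 8
  continue
end

continue
```

Note that breaking on every allocation will slow the server down considerably, so this is only practical on a test instance; pairing the logged pointers from malloc/realloc against those from free should expose the backtraces that allocate without freeing.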
| Comment by Sergei Golubchik [ 2017-03-09 ] | |
|
Did you look at `MEMORY_USED` values in `INFORMATION_SCHEMA.PROCESSLIST` and `SHOW STATUS`? Anything notable? | |
| Comment by Andrew Gabbitas [ 2017-03-10 ] | |
|
Thanks for the replies and info! `select MEMORY_USED from INFORMATION_SCHEMA.PROCESSLIST;` looks fine. I took it a step further and shut off the application in my test environment, and confirmed there were no connections to the db other than my own session (MEMORY_USED confirmed those processes are gone). The memory usage stops getting worse with the db idle (expected), but no memory is released either. The mysqld process still shows it is using much more memory than it should when taking into account the relevant cache and buffer pool values (innodb_buffer_pool_size + innodb_additional_mem_pool_size + innodb_log_buffer_size + query_cache_size). I have left my test instance idle for several hours now to confirm that mysqld memory use stays the same. | |
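The expected-footprint sum above can be sketched as a single query (a rough illustration only; `INFORMATION_SCHEMA.GLOBAL_VARIABLES` reports these values in bytes, and this deliberately ignores per-connection buffers):

```sql
-- Rough sketch: sum the buffers listed above and compare against mysqld RSS.
SELECT SUM(CAST(VARIABLE_VALUE AS UNSIGNED)) / 1024 / 1024 / 1024 AS expected_gb
FROM INFORMATION_SCHEMA.GLOBAL_VARIABLES
WHERE VARIABLE_NAME IN ('INNODB_BUFFER_POOL_SIZE',
                        'INNODB_ADDITIONAL_MEM_POOL_SIZE',
                        'INNODB_LOG_BUFFER_SIZE',
                        'QUERY_CACHE_SIZE');
```

Any large, steadily growing gap between this figure and the process RSS is what points at a leak outside the configured buffers.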
| Comment by Daniel Black [ 2017-03-12 ] | |
|
Could this be the same as | |
| Comment by Elena Stepanova [ 2017-04-20 ] | |
|
Is it a master or a slave? File names in attachment suggest that it could be a master, but it has rpl-do-db and a big gtid_slave_pos, so it's unclear. | |
| Comment by Andrew Gabbitas [ 2017-05-01 ] | |
|
It is only the master that has the problem. We have been able to narrow down where the problem resides: we are using per-query timeouts. Here is an example from the attached binlog.
We made a modification to our app to remove the prefix as a test, and we could no longer reproduce the memory leak. Changing the timeouts from per-query to per-session dramatically slows the memory leak, but doesn't appear to resolve it completely. |
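For context, a per-query timeout prefix in MariaDB 10.1 uses the `SET STATEMENT` syntax; the sketch below is illustrative only (the reporter's actual statement from the binlog is not reproduced here):

```sql
-- Per-query form: max_statement_time applies only to the prefixed statement.
SET STATEMENT max_statement_time=5 FOR
  SELECT * FROM some_table WHERE id = 42;

-- Per-session alternative mentioned above: applies to all subsequent statements.
SET SESSION max_statement_time=5;
```

The reported behavior suggests memory allocated while parsing or applying the `SET STATEMENT ... FOR` prefix is not being fully released after each statement.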