[MDEV-12205] MariaDB 10.1.21 Per-query timeout prefixes leak memory (confirmed) Created: 2017-03-08  Updated: 2017-06-01

Status: Open
Project: MariaDB Server
Component/s: None
Affects Version/s: 10.1.21
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Andrew Gabbitas Assignee: Elena Stepanova
Resolution: Unresolved Votes: 2
Labels: None
Environment:

OS: CentOS 7
Physical/Virtual: Can reproduce on both vm and hardware, but primarily on physical. 1TB memory, dual 12x core CPU.


Attachments: File binlog.000099.gz     PNG File master_db_memory.png     PNG File master_db_mysqld_memory.png     File show_variables.csv    

 Description   

I have a 720GB database, an 808GB InnoDB buffer pool, 1TB of system memory and 10 tables. As far as I can figure we shouldn’t be running out of memory, however, the mysqld process steadily increases in memory consumption until the OOM killer rightfully claims it (we now restart it before that happens). After finding some relevant memory-leak fixes released in 10.1.21 we upgraded to that from 10.1.13, but unfortunately it didn’t help.

Running mysqltuner.pl (https://github.com/major/MySQLTuner-perl) seems to agree that we shouldn’t be using more than 809 GB.

mysqltuner snippet

[--] Total buffers: 806.4G global + 2.9M per thread (1000 max threads)
[OK] Maximum reached memory usage: 808.2G (80.21% of installed RAM)
[OK] Maximum possible memory usage: 809.2G (80.31% of installed RAM)

At the time this data was gathered, the mysqld process was at 980GB (gathered with `ps -eo pid,comm,rss`), leaving 17GB of available system memory.

I can consistently reproduce this with a similar workload on a scaled-down test instance.

I have attached a binlog from my test instance, the output of SHOW VARIABLES, and a few Grafana graphs.

Thanks for taking a look!



 Comments   
Comment by Daniel Black [ 2017-03-09 ]

Disclaimer: this is a rough concept only that obviously requires testing in a non-production environment.
Disclaimer 2: this is general advice from a community user; I don't have much time either, but I feel your pain enough to offer this on an as-is basis.

I'd be looking at something like this, using gdb to attach to the running process and print out information: http://www.outofcore.com/2011/06/scripted-debug-using-gdb/.

So the trapped points would be (from the source, mysys/my_malloc.c):
https://github.com/MariaDB/server/blob/a14638581b4c8ef175e68dccff07967d819b3b7e/mysys/my_malloc.c#L130

  1. my_free
  2. my_realloc
  3. my_malloc

Ideally, log the returned pointer, flags and size (using ``print pointer, my_flags, size``) from the malloc/realloc calls, the oldptr argument of the realloc call, and the ptr argument of the free call. At each of these breakpoints, also log the backtrace of the current thread. Hopefully the output, with a bit of analysis, can identify the backtrace where memory is allocated but not freed.

gdb can attach to a process (using -p) and there is a batch mode for the script.
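A minimal sketch of such a batch script, assuming mysqld debug symbols are installed and that the 10.1 argument names from mysys/my_malloc.c (size, my_flags, oldpoint, ptr) are unchanged; the PID is a placeholder, and this logs arguments at function entry only (capturing the *returned* pointer would need a finish-based variant):

```gdb
# trace-mallocs.gdb -- illustrative only; run against a *test* instance:
#   gdb -p <mysqld_pid> -batch -x trace-mallocs.gdb > malloc-trace.log
set pagination off

break my_malloc
commands
  silent
  printf "my_malloc size=%lu flags=%d\n", (unsigned long) size, (int) my_flags
  bt 5
  continue
end

break my_realloc
commands
  silent
  printf "my_realloc oldpoint=%p size=%lu\n", oldpoint, (unsigned long) size
  bt 5
  continue
end

break my_free
commands
  silent
  printf "my_free ptr=%p\n", ptr
  bt 5
  continue
end

continue
```

Note that breakpoints on every allocation will slow the server down considerably, so this is only practical on a scaled-down test instance.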

I feel your pain about hard-to-debug memory leaks (mine was my second bug report, MDEV-4974). I wish I'd thought of this at the time (but I didn't know enough then).

PS. If you do write the gdb script, please include it here for someone else.

Comment by Sergei Golubchik [ 2017-03-09 ]

Did you look at `MEMORY_USED` values in `INFORMATION_SCHEMA.PROCESSLIST` and `SHOW STATUS`? Anything notable?
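For reference, those checks can be run as follows (the MEMORY_USED column and the Memory_used status variable exist as of MariaDB 10.1):

```sql
-- Per-connection memory as tracked by the server:
SELECT ID, USER, MEMORY_USED
FROM INFORMATION_SCHEMA.PROCESSLIST
ORDER BY MEMORY_USED DESC
LIMIT 10;

-- Server-wide counter of memory it believes it has allocated:
SHOW GLOBAL STATUS LIKE 'Memory_used';
```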

Comment by Andrew Gabbitas [ 2017-03-10 ]

Thanks for the replies and info!

`select MEMORY_USED from INFORMATION_SCHEMA.PROCESSLIST;` looks fine. I took that a step further and shut off the application in my test environment, confirming no connections to the db other than my own session (MEMORY_USED confirmed those processes were gone). The memory usage stops getting worse with the db idle (expected), but no memory is released either. The mysqld process still shows much more memory in use than it should when taking into account the relevant cache and buffer pool values (innodb_buffer_pool_size + innodb_additional_mem_pool_size + innodb_log_buffer_size + query_cache_size). I have left my test app off for several hours now to confirm that mysqld memory use stays the same.
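A back-of-the-envelope check of that accounting, summing the buffers named above (a sketch only; it deliberately ignores per-thread buffers, so it is a lower bound on expected usage):

```sql
SELECT (@@innodb_buffer_pool_size
      + @@innodb_additional_mem_pool_size
      + @@innodb_log_buffer_size
      + @@query_cache_size) / POW(1024, 3) AS expected_gb;
```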

Comment by Daniel Black [ 2017-03-12 ]

Could this be the same as MDEV-11883?
Edit: never mind - it's probably not; that one has to do with read-only filesystems.

Comment by Elena Stepanova [ 2017-04-20 ]

Is it a master or a slave? File names in the attachment suggest that it could be a master, but it has rpl-do-db set and a big gtid_slave_pos, so it's unclear.

Comment by Andrew Gabbitas [ 2017-05-01 ]

It is only the master that has the problem.

We have been able to narrow down where the problem resides.

We are using per query timeouts. Here is an example from the attached binlog.

SET STATEMENT max_statement_time=4.655, long_query_time=3.9200000000000004, sql_mode='STRICT_ALL_TABLES' FOR ....

We made a modification to our app to remove the prefix as a test, and we could no longer reproduce the memory leak. Changing the timeouts from per-query to per-session does dramatically slow the memory leak, but doesn't look to resolve it completely.
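For clarity, the two forms being compared look like this (timeout values taken from the binlog example above; the actual queries are elided):

```sql
-- Per-query form (triggers the leak): the settings are applied only
-- for the single statement after FOR and then restored.
SET STATEMENT max_statement_time=4.655,
              long_query_time=3.9200000000000004,
              sql_mode='STRICT_ALL_TABLES'
FOR SELECT ...;

-- Per-session form (reported to leak much more slowly): the settings
-- persist for the connection, so there is no per-statement
-- save/restore of system variables.
SET SESSION max_statement_time=4.655,
            long_query_time=3.9200000000000004,
            sql_mode='STRICT_ALL_TABLES';
SELECT ...;
```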

Generated at Thu Feb 08 07:55:55 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.