MDEV-12205: MariaDB 10.1.21 Per-query timeout prefixes leak memory (confirmed)

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version: 10.1.21
    • Environment: OS: CentOS 7
      Physical/Virtual: Reproducible on both VM and hardware, but primarily on physical. 1 TB memory, dual 12-core CPUs.

    Description

      I have a 720 GB database, an 808 GB InnoDB buffer pool, 1 TB of system memory, and 10 tables. As far as I can tell we shouldn’t be running out of memory; however, the mysqld process steadily grows in memory consumption until the OOM killer rightfully claims it (we now restart it before that happens). After finding some relevant memory leak fixes released in 10.1.21, we upgraded to that from 10.1.13, but unfortunately it didn’t help.

      Running mysqltuner.pl (https://github.com/major/MySQLTuner-perl) seems to agree that we shouldn’t be using more than 809 GB.

      mysqltuner snippet

      [--] Total buffers: 806.4G global + 2.9M per thread (1000 max threads)
      [OK] Maximum reached memory usage: 808.2G (80.21% of installed RAM)
      [OK] Maximum possible memory usage: 809.2G (80.31% of installed RAM)
      

      At the time this data was gathered, the mysqld process was at 980 GB (measured with `ps -eo pid,comm,rss`), leaving only 17 GB of system memory available.
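      A minimal sampling loop along these lines can track the growth over time (the `ps` invocation is the one above; the one-minute interval and log file name are arbitrary choices):

      # Append a timestamped RSS sample for mysqld to a log file once a minute.
      while true; do
          echo "$(date -u +%FT%TZ) $(ps -eo pid,comm,rss | grep '[m]ysqld')" >> mysqld_rss.log
          sleep 60
      done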

      I can consistently reproduce this with a similar workload on a scaled-down test instance.

      I have attached a binlog from my test instance, the SHOW VARIABLES output, and a few Grafana graphs.

      Thanks for taking a look!

      Attachments

        1. binlog.000099.gz (6.30 MB)
        2. master_db_memory.png (42 kB)
        3. master_db_mysqld_memory.png (48 kB)
        4. show_variables.csv (17 kB)

        Activity

          danblack Daniel Black added a comment -

          Disclaimer: this is a rough concept only that obviously requires testing in a non-production environment.
          Disclaimer 2: this is general advice from a community user, and I don't have much time, but I feel your pain enough to offer this on an as-is basis.

          I'd be looking at something like this: use gdb to attach to the running process and print out information, as described at http://www.outofcore.com/2011/06/scripted-debug-using-gdb/.

          The trapped points would be (from the source, mysys/my_malloc.c):
          https://github.com/MariaDB/server/blob/a14638581b4c8ef175e68dccff07967d819b3b7e/mysys/my_malloc.c#L130

          1. my_free
          2. my_realloc
          3. my_malloc

          Ideally, log the pointer, flags, and size returned from the malloc/realloc calls (using `print pointer, my_flags, size`), the oldptr argument from the realloc call, and the ptr argument from the free call, and at each of these breakpoints log the backtrace of the current thread. With a bit of analysis, the output should identify the backtrace where memory is allocated but not freed.

          gdb can attach to a running process (using -p), and there is a batch mode for the script; a rough sketch of such a script follows.
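          A minimal sketch, assuming the 10.1-era signatures my_malloc(size, my_flags), my_realloc(oldpoint, size, my_flags), and my_free(ptr) from my_malloc.c. Note that gdb breakpoint command lists can't easily capture return values (a resuming command like finish ends the list), so this logs call arguments and backtraces only:

          # trace-my-malloc.gdb -- run as: gdb -batch -x trace-my-malloc.gdb -p $(pidof mysqld)
          set pagination off
          set logging file my_malloc_trace.log
          set logging on

          break my_malloc
          commands
            silent
            printf "my_malloc size=%lu flags=%lu\n", (unsigned long)size, (unsigned long)my_flags
            bt
            continue
          end

          break my_realloc
          commands
            silent
            printf "my_realloc oldpoint=0x%lx size=%lu\n", (unsigned long)oldpoint, (unsigned long)size
            bt
            continue
          end

          break my_free
          commands
            silent
            printf "my_free ptr=0x%lx\n", (unsigned long)ptr
            bt
            continue
          end

          continue

          Allocation backtraces that keep dominating the log while the process grows would be the prime suspects.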

          I feel your pain about hard-to-debug memory leaks (mine was my second bug report, MDEV-4974). I wish I'd thought of this at that time (but I didn't know enough then).

          PS. If you do write the gdb script, please attach it here for someone else.

          serg Sergei Golubchik added a comment - edited

          Did you look at `MEMORY_USED` values in `INFORMATION_SCHEMA.PROCESSLIST` and `SHOW STATUS`? Anything notable?


          andygabby Andrew Gabbitas added a comment -

          Thanks for the replies and info!

          `select MEMORY_USED from INFORMATION_SCHEMA.PROCESSLIST;` looks fine. I took that a step further and shut off the application in my test environment, confirming there were no connections to the db other than my session (MEMORY_USED no longer shows those processes). With the db idle, memory usage stops getting worse (expected), but no memory is released either. The mysqld process still uses much more memory than it should when taking into account the relevant cache and buffer pool values (innodb_buffer_pool_size + innodb_additional_mem_pool_size + innodb_log_buffer_size + query_cache_size). I have now left the application off for several hours to confirm that mysqld memory use stays the same.
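          As a back-of-the-envelope sketch of that accounting (all four variables exist in 10.1; per-thread buffers and smaller allocations are deliberately ignored):

          -- Expected baseline server memory from the big global buffers, in GB.
          SELECT (@@innodb_buffer_pool_size
                + @@innodb_additional_mem_pool_size
                + @@innodb_log_buffer_size
                + @@query_cache_size) / 1024 / 1024 / 1024 AS expected_base_gb;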

          danblack Daniel Black added a comment - edited

          Could this be the same as MDEV-11883?
          Edit: never mind, it's probably not; that one has to do with read-only filesystems.


          elenst Elena Stepanova added a comment -

          Is it a master or a slave? File names in the attachments suggest that it could be a master, but it has rpl-do-db and a big gtid_slave_pos, so it's unclear.

          andygabby Andrew Gabbitas added a comment - edited

          It is only the master that has the problem.

          We have been able to narrow down where the problem resides.

          We are using per-query timeouts. Here is an example from the attached binlog.

          SET STATEMENT max_statement_time=4.655, long_query_time=3.9200000000000004, sql_mode='STRICT_ALL_TABLES' FOR ....
          

          We made a modification to our app to remove the prefix as a test, and we could no longer reproduce the memory leak. Changing the timeouts from per-query to per-session does dramatically slow the memory leak, but doesn't appear to resolve it completely. Both forms are sketched below.
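          For anyone reproducing this, the two forms look roughly like this (a trivial SELECT 1 stands in for the real query; the timeout values are the ones from our binlog):

          -- Per-query form (reproduces the leak):
          SET STATEMENT max_statement_time=4.655, long_query_time=3.92,
              sql_mode='STRICT_ALL_TABLES' FOR SELECT 1;

          -- Per-session form (leak much slower, but seemingly still present):
          SET SESSION max_statement_time=4.655;
          SET SESSION long_query_time=3.92;
          SELECT 1;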


          People

            Assignee: Elena Stepanova (elenst)
            Reporter: Andrew Gabbitas (andygabby)
            Votes: 2
            Watchers: 7

