[MDEV-20455] memory leak in 10.4 series Created: 2019-08-30  Updated: 2021-10-07  Resolved: 2020-11-29

Status: Closed
Project: MariaDB Server
Component/s: Galera, Server
Affects Version/s: 10.4.6, 10.4.7, 10.4.8
Fix Version/s: 10.4.14

Type: Bug Priority: Critical
Reporter: Matthias Merz Assignee: Oleksandr Byelkin
Resolution: Fixed Votes: 6
Labels: Memory_leak, wsrep
Environment:

Docker image from Docker Hub, 10.4.6-bionic; 3-node Galera setup on a Debian 10 host.
Only one instance receives live queries; the other two are passive nodes (or used read-only for backups).
256GB host memory
InnoDB tables only


Attachments: PNG File Screen Shot 2021-03-29 at 4.59.41 PM.png     PNG File Screen Shot 2021-03-29 at 5.13.40 PM.png     PNG File chart2.png     PNG File galera-4months-swap.png     File jacob.cnf     PNG File swap-leak-10-4-13-1.png     PNG File swap-zoom-10-4-13-1.png    
Issue Links:
Relates
relates to MDEV-16431 merge 5.7 P_S memory instrumentation ... Closed
relates to MDEV-20218 galera: crash on restart after failed... Closed
relates to MDEV-26712 row events never reset thd->mem_root Closed

 Description   

MariaDB is configured with a 64GB InnoDB buffer pool, which should lead to approx. 70-80GB of memory consumption.

Over time, this increases, sometimes in larger steps, sometimes gradually. After 47h of "uptime" we are currently at:

top - 15:25:11 up 1 day, 22:48,  1 user,  load average: 3.28, 3.71, 3.77
Tasks: 299 total,   1 running, 296 sleeping,   0 stopped,   2 zombie
%Cpu(s): 10.0 us, 10.0 sy,  0.0 ni, 80.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem : 257612.1 total,  84072.2 free, 163291.2 used,  10248.6 buff/cache
MiB Swap:   4768.0 total,   4768.0 free,      0.0 used.  92118.7 avail Mem 
 
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                                                                                                   
 2430 mysqld-docker  20   0  198.9g 163.5g   8.0g S 400.0  65.0  11415:05 mysqld 

This will eventually lead to an OOM condition within a few days. After the OOM kill, the Galera IST will not work, triggering MDEV-20218 and consequently breaking the whole cluster, because after some retries no donor is available and all nodes are DESYNCed and writing replication logs to disk.

Unfortunately, MDEV-16431 does not seem to be ready yet. How can I debug this?

MariaDB [my_db]> show global status like '%memory%';
+---------------------+-----------+
| Variable_name       | Value     |
+---------------------+-----------+
| Memory_used         | 906490496 |
| Memory_used_initial | 139670856 |
| Qcache_free_memory  | 0         |
+---------------------+-----------+
3 rows in set (0.002 sec)
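
(For reference: once the MDEV-16431 performance_schema memory instrumentation lands, a per-allocation-area breakdown should be queryable along these lines. A sketch, assuming the MySQL 5.7 table and column names are kept; this was not available in 10.4 at the time of this report:

-- Requires performance_schema=ON and the MDEV-16431 memory
-- instrumentation, which 10.4 does not have yet.
SELECT event_name,
       current_number_of_bytes_used AS bytes_used,
       current_count_used           AS live_allocations
FROM performance_schema.memory_summary_global_by_event_name
WHERE current_number_of_bytes_used > 0
ORDER BY current_number_of_bytes_used DESC
LIMIT 20;
)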

BUFFER POOL AND MEMORY
----------------------
Total large memory allocated 71135395840
Dictionary memory allocated 878832
Buffer pool size   4230336
Free buffers       1677346
Database pages     2430350
Old database pages 897500
Modified db pages  29090
Percent of dirty pages(LRU & free pages): 0.708
Max dirty pages percent: 75.000
Pending reads 0
Pending writes: LRU 0, flush list 0, single page 0
Pages made young 29835, not young 0
0.00 youngs/s, 0.00 non-youngs/s
Pages read 2305059, created 125291, written 47846699
0.00 reads/s, 0.00 creates/s, 218.39 writes/s
Buffer pool hit rate 1000 / 1000, young-making rate 0 / 1000 not 0 / 1000
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
LRU len: 2430350, unzip_LRU len: 0
I/O sum[0]:cur[0], unzip sum[0]:cur[0]

Please find the config file attached.

The memory consumption is probably triggered by client access, because if we redirect our load balancer to the next backend, memory grows there instead. OTOH, memory usage won't decrease when a node is not receiving queries, even after days. (We had to cut the experiment after 3 days, because node2/3 was threatening to break down.)



 Comments   
Comment by Matthias Merz [ 2019-10-07 ]

I was also able to reproduce this memory leak without any Galera plugin loaded, so the problem seems to come from MariaDB itself.
In the meantime, I have also tested version 10.4.7 and am currently running 10.4.8, without notable changes in the memory profile.

Comment by dbteam [ 2020-07-02 ]

Does this happen with version MariaDB 10.4.13?

Comment by Matthias Merz [ 2020-07-02 ]

Yes, it still happens with 10.4.13, but unfortunately I still cannot reproduce it on demand. It just happens under some rare access patterns, so it is probably some sort of concurrency issue (we have many short-lived connections, mixed with some larger transactions).

BUFFER POOL AND MEMORY
----------------------
Total large memory allocated 71135395840
Dictionary memory allocated 921840
Buffer pool size 4231920
Free buffers 24585
Database pages 4021115
Old database pages 1483867
Modified db pages 40955
Percent of dirty pages(LRU & free pages): 1.012
Max dirty pages percent: 75.000
Pending reads 0
Pending writes: LRU 0, flush list 0, single page 0
Pages made young 42401637, not young 73201887
135.22 youngs/s, 0.50 non-youngs/s
Pages read 4681517, created 1817539, written 812815043
0.25 reads/s, 1.25 creates/s, 3441.39 writes/s
Buffer pool hit rate 999 / 1000, young-making rate 0 / 1000 not 0 / 1000
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
LRU len: 4021115, unzip_LRU len: 0
I/O sum[2960712]:cur[0], unzip sum[0]:cur[0]

This does not seem to match an RSS of 140GB, or am I completely mistaken?

$ ps auxww | grep -e 29283 -e USER
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 26460 0.0 0.0 6408 892 pts/0 S+ 10:38 0:00 grep -e 29283 -e USER
docker-+ 29283 369 55.0 149067740 145335972 ? Ssl May28 184342:54 mysqld

Comment by Oleksandr Byelkin [ 2020-07-10 ]

Is there any more specific info about the queries and so on? Right now I do not see anything to work with (no way to repeat it).

Comment by Matthias Merz [ 2020-07-13 ]

That was my original question.

I know this is very hard to debug as long as we cannot reproduce it. It seems to be triggered by some access pattern in our live workload; there is nothing that reproduces it on a test system.

That's why I had mainly asked how to find out in which allocation area the memory might be consumed. As you can see, the buffer pool looks OK, but there is a huge amount of RAM not showing up in the status variables I checked. So where in the server status can that info be retrieved?
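
(The closest thing I have found in stock 10.4 is the per-connection accounting in information_schema.PROCESSLIST; a quick check along these lines, a sketch using the MEMORY_USED/MAX_MEMORY_USED columns:

-- Per-connection memory attribution available in stock 10.4:
SELECT id, user, command, memory_used, max_memory_used
FROM information_schema.PROCESSLIST
ORDER BY memory_used DESC
LIMIT 10;

But that only shows memory attributed to individual connections, not whatever is holding the rest of the RAM.)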

Comment by Mattias Bergvall [ 2020-07-29 ]

Hello. I have this issue with MariaDB-server-10.4.13-1.el7.centos.x86_64.
My setup is 5 VMs running CentOS 7.2 with a Galera cluster where all traffic is hitting one node.
I had been running MariaDB-Galera-server-10.0.22-1.el6.x86_64 for many years without any problems, but decided that it would be good to upgrade, which I did in May, two months ago, performing a rolling upgrade like this:

May 20 16:37:35 Updated: galera-25.3.29-1.rhel7.el7.centos.x86_64
May 20 16:37:38 Erased: MariaDB-Galera-server-10.0.22-1.el6.x86_64
May 20 16:37:43 Installed: jemalloc-3.6.0-1.el7.x86_64
May 20 16:38:02 Installed: MariaDB-server-10.1.45-1.el7.centos.x86_64
May 22 09:00:37 Installed: maxscale-2.4.9-1.x86_64
May 22 09:23:51 Erased: MariaDB-server-10.1.45-1.el7.centos.x86_64
May 22 09:24:17 Installed: MariaDB-server-10.2.32-1.el7.centos.x86_64
May 22 10:21:15 Erased: MariaDB-server-10.2.32-1.el7.centos.x86_64
May 22 10:21:37 Installed: MariaDB-server-10.3.23-1.el7.centos.x86_64
May 25 13:06:27 Erased: maxscale-2.4.9-1.x86_64
May 25 13:34:18 Erased: MariaDB-server-10.3.23-1.el7.centos.x86_64
May 25 13:34:27 Installed: galera-4-26.4.4-1.rhel7.el7.centos.x86_64
May 25 13:34:52 Installed: MariaDB-server-10.4.13-1.el7.centos.x86_64
May 25 13:34:52 Erased: galera-25.3.29-1.rhel7.el7.centos.x86_64

The history is the same on all 5 nodes, except for minor differences in the timings.

I'm not using MaxScale (the application didn't like it, since it sometimes did reads immediately after writes and thus found inconsistencies).

What I've observed is that since May 25th/26th, all machines but one have been increasing their swap usage until none is left. And it is the nodes that don't get any query traffic, the ones that are just replicating the data, that are "leaking".

Attaching a graph showing the swap usage in percent before and after the upgrade where the change in behavior is obvious:

And also supplying a 14-day view of the same metrics where it is obvious that the node that gets hit by queries (the green line) shows a different behavior:

This is the "top" output on the node that is "the master/active":

Tasks: 258 total,   4 running, 254 sleeping,   0 stopped,   0 zombie
%Cpu(s): 33.3 us, 28.7 sy,  0.0 ni, 33.7 id,  0.8 wa,  0.0 hi,  3.5 si,  0.0 st
KiB Mem :  5943044 total,   853060 free,  1777096 used,  3312888 buff/cache
KiB Swap:  6242300 total,  5289576 free,   952724 used.  3598580 avail Mem
 
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
13584 mysql     20   0 4965396 1.356g 172360 S  42.2 23.9  13734:56 mysqld

And this is the same from one of the others:

Tasks: 242 total,   4 running, 237 sleeping,   0 stopped,   1 zombie
%Cpu(s): 11.2 us, 15.5 sy,  0.0 ni, 69.0 id,  0.9 wa,  0.0 hi,  3.4 si,  0.0 st
KiB Mem :  5943044 total,  1235528 free,  3182500 used,  1525016 buff/cache
KiB Swap:  6242300 total,  2882904 free,  3359396 used.  2219660 avail Mem
 
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 3079 mysql     20   0 7652640 2.786g 138004 S  30.6 49.2   1129:47 mysqld

Hope this helps!

Comment by Oleksandr Byelkin [ 2020-07-29 ]

Sorry, but it does not help much. I have no doubt that the problem exists (most of the above tries to prove the existence of the problem); I need something to localize the problem.

What engine do you use? Do you use replication? What kind of load is there, and so on?

Status variables before, during, and at the end of the period may help...

Comment by Mattias Bergvall [ 2020-07-29 ]

This is a 4-month graph of free swap in my 5-node cluster. The upgrade to 10.4 was made on May 25th-26th. The spikes correspond to node reboots. The green line since most of June is the main node that has gotten all the traffic for the last 55+ days. All others have been rebooted regularly.

Comment by Mattias Bergvall [ 2020-07-29 ]

The storage engine in use is InnoDB only.
The load comes from an internal web application (99% SELECT) and a few other internal applications that perform perhaps 80% INSERT and 20% SELECT.

Snapshot from the "main" node:

MySQL uptime                            19 days, 02:47:43
MySQL update operations per second      4.4 qps
MySQL slow queries                      4374
MySQL select operations per second      229.94 qps
MySQL rollback operations per second    0.1322 qps
MySQL queries per second                244.69 qps
MySQL insert operations per second      1.5 qps
MySQL delete operations per second      2.04 qps
MySQL commit operations per second      0.68 qps
MySQL bytes sent per second             160.98 KBps
MySQL bytes received per second         47.27 KBps
MySQL begin operations per second       0.8135 qps

And from a "passive" node:

MySQL uptime                            5 days, 21:22:38
MySQL update operations per second      30.92 qps
MySQL slow queries                      11
MySQL select operations per second      0.7344 qps
MySQL rollback operations per second    0 qps
MySQL queries per second                4.73 qps
MySQL insert operations per second      3.29 qps
MySQL delete operations per second      3.63 qps
MySQL commit operations per second      0.1328 qps
MySQL bytes sent per second             468.78 Bps
MySQL bytes received per second         496.72 Bps
MySQL begin operations per second       0.1341 qps

/etc/my.cnf.d/server (empty lines and comments removed. Names and stuff have been replaced by "-----"):

[server]
tmpdir=/var/tmp
[mysqld]
bind-address=0.0.0.0
port=3306
[galera]
wsrep_provider=/usr/lib64/galera-4/libgalera_smm.so
wsrep_cluster_address="gcomm://------.20,------.21,------.22,------.23,------.24"
binlog_format=row
default_storage_engine=InnoDB
innodb_autoinc_lock_mode=2
bind-address=0.0.0.0
wsrep_on=ON
wsrep_cluster_name='------'
wsrep_node_address='---------------'
wsrep_node_name='beaker'
wsrep_sst_method=rsync
wsrep_sst_auth=sst_user:-----------------------
wsrep_provider_options="gcache.size=2G"
wsrep_slave_threads=1
innodb_flush_log_at_trx_commit=2
[embedded]
[mariadb]
[mariadb-10.0]

Comment by LuborJ [ 2020-09-07 ]

I have a similar setup with the same problem. We have only a 10G innodb_buffer_pool_size on 128GB-RAM hardware, and MariaDB uses up all of the memory every few days.

Comment by Mattias Bergvall [ 2020-11-25 ]

This issue seems to have been resolved in MariaDB-server-10.4.14-1.el7.centos.x86_64

Comment by Sergei Golubchik [ 2020-11-29 ]

Thanks. I'll close it then. Given that we don't have enough information to reproduce the issue reliably, let's just assume it was fixed, as you no longer experience it.

If anyone still has this, please do comment and we'll reopen the issue.

Comment by Andrew Bierbaum [ 2021-03-30 ]

We are seeing this same issue on 10.4.15 after a recent upgrade from an older version.

These hosts are running MariaDB 10.4.15 on bionic, installed via apt-get using Puppet.

We have two large hosts that are nearly identical in their setup. One runs a heavy automated load and the other is used for more ad-hoc queries, with both set to innodb_buffer_pool_size=22G and innodb_buffer_pool_instances=22. The one with light ad-hoc queries is steady at 24 GB of RAM usage. The one with heavy automated load is currently using 49 GB of RAM and slowly growing; see the attached memory usage screenshots.

InnoDB does not seem to account for much of the RAM usage.

----------------------
BUFFER POOL AND MEMORY
----------------------
Total large memory allocated 23655874560
Dictionary memory allocated 5751648
Buffer pool size   1412048
Free buffers       26
Database pages     1289404
Old database pages 475523
Modified db pages  2888
Percent of dirty pages(LRU & free pages): 0.224
Max dirty pages percent: 75.000
Pending reads 0
Pending writes: LRU 0, flush list 0, single page 0
Pages made young 596448458, not young 252946942881
314.17 youngs/s, 384.60 non-youngs/s
Pages read 2518392152, created 22370165, written 632581348
40.14 reads/s, 4.21 creates/s, 1191.78 writes/s
Buffer pool hit rate 999 / 1000, young-making rate 0 / 1000 not 1 / 1000
Pages read ahead 0.00/s, evicted without access 0.02/s, Random read ahead 0.00/s
LRU len: 1289404, unzip_LRU len: 112198
I/O sum[1357598]:cur[15136], unzip sum[35310]:cur[616]

Digging through variables and status has also been unfruitful in diagnosing the memory usage.

mysql -e 'SHOW status like "%memory_used%"'
+---------------------+-----------+
| Variable_name       | Value     |
+---------------------+-----------+
| Memory_used         | 80032     |
| Memory_used_initial | 158008888 |
+---------------------+-----------+

Also, running "FLUSH TABLES" doesn't appear to affect memory usage.

We set innodb_buffer_pool_size to a fairly low 22G of the total 80 GB of memory on the host to try to slow down and stabilize the issue. Memory usage may have started to level out, but it is far beyond what was estimated from innodb_buffer_pool_size=22G. I can provide more details if that would be helpful, but I'm not sure what would help diagnose the issue.

We are seeing very similar behavior in our smaller and more easily changed test environments as well. On that host, clear memory steps occur when the timed, automated query load hits every 6 hours, with OOM reaping happening about every 2 days as currently configured. That host uses innodb_buffer_pool_size = 1456M with innodb_buffer_pool_instances = 2 on 8 GB of host memory.

All these hosts carry out fairly heavy replication across multiple replication channels.

Thanks

Comment by Maurice Gasco [ 2021-04-06 ]

I'm having the same issue on 10.4.18, CentOS 8.
I have 3 nodes with Galera replication. The one that receives client connections has its memory growing continuously.
With these parameters:
max-connections = 3000
innodb_log_file_size=1024M
innodb_adaptive_hash_index=0
innodb_buffer_pool_size=40G
table_open_cache=10000

LimitNOFILE=200000
LimitMEMLOCK=200000

With 1200 active connections, memory grows to 140GB in less than 24 hours.

Comment by Andrew Bierbaum [ 2021-04-30 ]

I'm currently testing in lower environments, but it appears that switching memory allocators may solve the issue for us. I found a few mentions of tcmalloc in other tickets on jira.mariadb.org as well as on a few other sites (see https://jira.mariadb.org/browse/MDEV-21372?focusedCommentId=144092&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-144092 and https://dba.stackexchange.com/a/273834).

sudo apt-get install google-perftools

Place the following in /etc/systemd/system/mariadb.service.d/override.conf:

[Service]
Environment="LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libtcmalloc_minimal.so.4"

then

sudo systemctl daemon-reload
sudo systemctl restart mysql

produces

MariaDB [(none)]> SHOW VARIABLES LIKE 'version_malloc_library';
+------------------------+-------------------------+
| Variable_name          | Value                   |
+------------------------+-------------------------+
| version_malloc_library | tcmalloc gperftools 2.5 |
+------------------------+-------------------------+
1 row in set (0.002 sec)

Memory use appears stable on Ubuntu 18.04 with both MariaDB 10.4.15 and 10.5.9.
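
(For completeness: jemalloc, which some commenters above already have installed, can be preloaded the same way. The library path below is only an assumption and varies by distribution and jemalloc version:

[Service]
# Path is an assumption; locate your copy, e.g. with: ldconfig -p | grep jemalloc
Environment="LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.1"
)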

Comment by Eugene [ 2021-05-03 ]

Hit the very same issue: MariaDB 10.4.15 with galera-26.4.5 on Gentoo, Linux kernel 5.10.31.

max_connections            = 2000
table_open_cache           = 120000
table_open_cache_instances = 16
innodb_adaptive_hash_index = 0
innodb_buffer_pool_size    = 64G
innodb_log_file_size       = 8G
 
max number of open files: 10240000

Used memory of the mysqld process grew from 160 to 240GB (within two weeks) with the system malloc library.

Comment by Roel Van de Paar [ 2021-09-29 ]

Can anyone seeing this issue confirm whether they are NOT using replication?
For replication-based setups, see MDEV-26712.

Comment by Matthias Merz [ 2021-09-29 ]

In my original report, I had also tried 10.4.6 without any replication and also observed memory consumption going up, so at least then this was not replication-related. Also, I see the memory growth only on the node(s) receiving SQL requests, not on the replicated nodes without interactive sessions. So I'd assume the original bug is not related to Galera and, at least in our case, (unfortunately) not to MDEV-26712 either.

OTOH it's practically impossible to reproduce this behaviour in our environment. It feels like it has gotten better (currently we are on 10.4.21), but it never went away completely. Another "single" instance on Debian buster (version 10.3.29) currently has a 196GB resident set with innodb_buffer_pool_size=64G (and no MyISAM or other table types in any mentionable amount).

Comment by Roel Van de Paar [ 2021-09-29 ]

Thank you mmerz for the input, much appreciated.

Comment by shield [ 2021-10-07 ]

Adding some information on the topic of NOT using replication.

Memory usage initially increases to what one expects, as assigned to the service. Then, over a time frame of 2 to 4 weeks, memory usage increases very slowly until memory is completely depleted.

Configuration:
Around 215 servers used exclusively as database servers, running MariaDB.
mariadb-server 1:10.4.21+maria~stretch
Kernel: linux-image-4.19-amd64 (will be upgrading to Buster within the next few months).
Mixture of InnoDB and MyISAM tables.
The servers typically host around 1800 web-based databases, mostly WordPress.

Most of these servers have 32G RAM with a swappiness of 1, with settings:
innodb_buffer_pool_size = 11159M
interactive_timeout = 1800
key_buffer_size = 3G
max_connections = 400
open_files_limit = 8000
query_cache_limit = 1M
query_cache_size = 0
query_cache_type = OFF
read_buffer_size = 2M
sort_buffer_size = 2M
table_open_cache = 2000
thread_cache_size = 800
thread_stack = 292K
wait_timeout = 1800

Calculating predicted RAM usage based on this query:
SELECT ROUND( ( @@GLOBAL.key_buffer_size + @@GLOBAL.query_cache_size + @@GLOBAL.tmp_table_size + @@GLOBAL.innodb_buffer_pool_size + @@GLOBAL.aria_pagecache_buffer_size + @@GLOBAL.innodb_log_buffer_size + @@GLOBAL.max_connections * ( @@GLOBAL.sort_buffer_size + @@GLOBAL.read_buffer_size + @@GLOBAL.read_rnd_buffer_size + @@GLOBAL.join_buffer_size + @@GLOBAL.thread_stack + @@GLOBAL.binlog_cache_size) ) / 1024 / 1024, 1) `total MB`;

...a server with 32G is configured to use 16422.6MB.
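
(As a rough sanity check, assuming stock 10.4 defaults for the variables not listed above (tmp_table_size=16M, aria_pagecache_buffer_size=128M, innodb_log_buffer_size=16M, read_rnd_buffer_size=256K, join_buffer_size=256K, binlog_cache_size=32K), that figure decomposes roughly as:

global buffers:  3072M (key_buffer) + 11159M (innodb_buffer_pool) + 128M (aria_pagecache)
                 + 16M (tmp_table) + 16M (innodb_log_buffer)  ≈ 14391M
per connection:  400 x (2M + 2M + 256K + 256K + 292K + 32K)  ≈ 400 x 4.8M ≈ 1927M
total:           ≈ 16.3G

which is close to the computed 16422.6MB; the exact figure depends on the real defaults in use. Either way, the servers end up consuming nearly twice that before memory is depleted.)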

If you need more information, I'll gladly provide.
