[MDEV-15344] Huge memory usage on Maria 10.2.x cPanel Created: 2018-02-18  Updated: 2021-01-11

Status: Open
Project: MariaDB Server
Component/s: Debug
Affects Version/s: 10.2.13
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Neso Assignee: Unassigned
Resolution: Unresolved Votes: 11
Labels: None
Environment:

Centos 7.4 with CloudLinux - cPanel normal setup


Attachments: Zip Archive before_after_oom_kill_maps_lsof_status.zip     PNG File mariadb10-2-memory.png     File my.cnf     Text File mysqld_oom_killer_messages.txt     PNG File screenshot-1.png    
Issue Links:
Relates
relates to MDEV-18866 Chewing Through Swap Open

 Description   

Hi, after we upgraded our servers to the latest 10.2 version, it started to use all RAM in a very short time on some servers. On the previous release, on the same hardware, there were never problems; even servers with 256GB of RAM have problems now, which we never expected. For example, we moved a server from 64GB to 128GB of RAM and now RAM usage is over 100GB, whereas on 64GB it was about 50%, with all the same sites and the same setup, only a different release of MariaDB.

The only similar issue I found is this: MDEV-13403

Any help or advice on what to do would be appreciated, because we need to restart MySQL on these servers every few days because of this.

The my.cnf file is attached.
cPanel says it is related to MariaDB and that they cannot help.

Again, this is a normal cPanel setup on CentOS 7.4, and all memory problems started after the upgrade to 10.2; we never had this kind of problem with any older version of MariaDB.

Any help would be great.



 Comments   
Comment by Elena Stepanova [ 2018-02-18 ]

Which version did you upgrade from?
You said that you are having the problem on some servers – what is the difference between servers which are affected and servers which are not?
What is "very short time" in "use all RAM in very short time" – is it seconds, minutes, hours, days?
Does it happen gradually from the server startup, or suddenly on a particular query?
What workflow is being executed on the server when the problem occurs?
Which engines are you using? You have pointed at a TokuDB bug report, are you using TokuDB? There are no signs of it in the config file. Also, the config file sets default engine to MyISAM, but it also has some InnoDB options. Do you use InnoDB at all, and if you do, then how heavily?
Does the server eventually get killed by OOM, or does it continue operating at a high level of memory usage?
Can you get all threads' stack trace from the running server when it starts using up the memory?
Would it be possible for one of MariaDB developers to get access to a machine where the problem occurs?
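For the stack-trace request above, one possible approach (a sketch, not a procedure from this ticket; it assumes gdb is installed and a single mysqld process) is to attach gdb and dump every thread's backtrace. The snippet only prints the command, so it can be reviewed before being run as root on the affected box:

```shell
# Build (but do not execute) a gdb command that dumps a backtrace of every
# thread of the running mysqld. Run the printed command as root and redirect
# its output to a file to share on this ticket.
PID="$(pidof mysqld 2>/dev/null | awk '{print $1}')"
CMD="gdb --batch -p ${PID:-<mysqld-pid>} -ex 'thread apply all bt'"
echo "$CMD"
```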

Comment by Neso [ 2018-02-18 ]

Which version did you upgrade from? - From a 10.1.x release.
You said that you are having the problem on some servers – what is the difference between servers which are affected and servers which are not? - All servers have the same setup; some have more sites, some fewer, but in general all servers show huge memory usage on the MariaDB 10.2 release.
What is "very short time" in "use all RAM in very short time" – is it seconds, minutes, hours, days? - Some of the servers, even with 256GB of RAM, go down within several days.
Does it happen gradually from the server startup, or suddenly on a particular query? - It constantly increases RAM usage, and after several days there is no free RAM left.
What workflow is being executed on the server when the problem occurs? - These are dedicated servers, mostly with about 150+ cPanel accounts; again, this was not a problem on the old 10.1.x release, regardless of the number of cPanel accounts.
Which engines are you using? You have pointed at a TokuDB bug report, are you using TokuDB? There are no signs of it in the config file. Also, the config file sets default engine to MyISAM, but it also has some InnoDB options. Do you use InnoDB at all, and if you do, then how heavily? - TokuDB was just a reference, as it looked like a similar problem; we do not use it. The databases are mixed: we have both InnoDB and MyISAM, and I do not know the relative usage.
Does the server eventually get killed by OOM, or does it continue operating at high level of memory usage? - The server keeps working as long as it can; it then starts using swap, and at some point we need to do a hard restart when we can no longer access any service.
Can you get all threads' stack trace from the running server when it starts using up the memory? - Yes, I think so.
Would it be possible for one of MariaDB developers to get access to a machine where the problem occurs? - Yes, we would provide access.

Comment by Elena Stepanova [ 2018-02-18 ]

Thanks for the answers.

I assume you have run mysql_upgrade after upgrading from 10.1 to 10.2?

By workflow on the server I meant MariaDB server – that is, what kind of statements are being executed, are those mostly writes or reads, InnoDB or MyISAM, direct or prepared statements or stored procedures etc. Usually when a regression occurs, it's bound to some particular kind(s) of SQL. But this question is mainly relevant when we are talking about really short periods of time, minutes, hours at most. Of course, if it's days, there can be anything.

svoj, would you be able to work with Neso to get access to a box and see what's happening there?

Comment by Neso [ 2018-02-18 ]

Yes, the upgrade was done via the standard cPanel upgrade process from 10.1.x to 10.2.x. We did try some tweaks to reduce memory usage on some servers and it did help; they work most of the time without needing a restart, but usage is still extremely high, regardless of how much memory the server has.

Here screenshot from Munin: https://s3.amazonaws.com/upload.screenshot.co/e14a4a767c

Thanks for the help.

Comment by Eri R Bastos [ 2018-04-17 ]

We are seeing something similar.

Here is an example of a single physical server running 6 instances of MariaDB. The first 5 are 10.2 and the last one is 10.1:

 PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
127083 mysql     20   0 12.819g 8.973g   4832 S   8.3 14.3   1760:35 mysqld
126517 mysql     20   0 12.737g 8.940g   6260 S  10.6 14.3   1708:11 mysqld
127645 mysql     20   0 12.687g 8.867g   5096 S   7.0 14.2   1695:44 mysqld
126800 mysql     20   0 12.427g 8.614g   5748 S   8.3 13.8   1648:09 mysqld
127364 mysql     20   0 12.332g 8.502g   5212 S  14.6 13.6   1625:10 mysqld
126230 mysql     20   0 3110396 527080   6032 S   2.0  0.8 384:57.61 mysqld

In our case we are using MyISAM as the engine, and we have about 2 million tables per instance. We need to restart the daemons every few days to keep them from being OOM-killed.

I have been trying to create a synthetic way to reproduce the problem for a few days now, but no luck yet.

Not sure if this is related, but looks like 10.2 moved away from jemalloc: MDEV-11059

Side note: We do not use cPanel

Comment by Tom Parrott [ 2018-07-02 ]

I too am experiencing the same issue on CentOS 6.9 with MariaDB 10.2.15 and 10.2.16.

Comment by Tom Parrott [ 2018-07-02 ]

Looks like it's worse on servers with lots of tables... perhaps an InnoDB data dictionary leak?

Comment by Tom Parrott [ 2018-07-02 ]

I'm trying it with jemalloc to see if it helps: MDEV-11059

Comment by Steven Irwin [ 2018-07-29 ]

Same problem here. Mine is a GoDaddy VPS on CentOS 6 with around 10 WordPress sites.
Over about a day the memory runs out and I can't access the sites.
A graceful reboot restores service.
This is a big problem because the VPS does not have swap, so it needs constant monitoring.
The server has 8GB of RAM.
When rebooted there is around 60% available, and after about 12 hours there is 5% available.

Comment by Tom Parrott [ 2018-07-29 ]

Installing jemalloc from the MariaDB repo and then adding this to server.cnf fixed it:

[mysqld_safe]
#More efficient memory allocator
malloc-lib=jemalloc

Comment by Steven Irwin [ 2018-07-29 ]

Unfortunately that didn't fix it for me.

Comment by Tom Parrott [ 2018-07-31 ]

Could you post the output of free -m both after initially restarting and just before it uses all the memory?

When I had problems before, I found that the cached memory was very low, whereas switching to jemalloc improved the situation.

Comment by Nikolas Hermann [ 2018-07-31 ]

I had the same problem after upgrading from 10.1.x to 10.2.x (Galera, InnoDB only).
Switching back to jemalloc totally fixed it for me.

@Steven Irwin: Could you verify that jemalloc is being used?

show variables like 'version_malloc_library';

Variable_name Value
version_malloc_library jemalloc 3.6.0-0-g46c0af68bd248b04df75e4f92d5fb804c3d75340

If it is not, there is probably an init system ignoring your [mysqld_safe] options (systemd...); check the mariadb-service-convert script.

Comment by Steven Irwin [ 2018-07-31 ]

Startup log:

Starting MariaDB.
180731 10:55:38 mysqld_safe Adding '/usr/lib64/libjemalloc.so.1' to LD_PRELOAD for mysqld
180731 10:55:38 mysqld_safe Logging to '/var/lib/mysql/new_error.log'.
180731 10:55:38 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql

This is a production web server with moderate traffic levels. The memory leak is slower since I removed the clamav and spamassassin services, as I noticed both were faulting as memory decreased.

Being a production server, it is not easy to debug. I did notice that log_warnings was set to 2 by default after the upgrade, and there were quite a few aborted connections. Setting the log level to 1 stopped these messages (as suggested by another post; I suspect they are still happening, but as this is a VPS with the database colocated, I am not sure why). I will keep looking in the logs and testing.

Comment by Steven Irwin [ 2018-07-31 ]

This is free -m after reboot:

                   total       used       free     shared    buffers     cached
Mem:                8192       1626       6565         95          0        472
-/+ buffers/cache:  1153       7038
Swap:                  0          0          0

I will post again when the memory is depleted.

Comment by Evgenij [ 2018-09-06 ]

Hello.

I ran into the same problem. I have a server based on CloudLinux 6 (which is CentOS 6 based); I switched from MySQL 5.6 to MariaDB 10.2.17 and then noticed increased RAM consumption.

I found a fairly simple way to reproduce the problem: if the size of the database exceeds the amount of RAM, it is enough to run the mysqltuner.pl script 3-5 times, and the amount of memory consumed becomes much larger than the limits specified in my.cnf.

For example, the output of mysqltuner.pl looks like this:
-------- Storage Engine Statistics -----------------------------------------------------------------
[-] Data in MyISAM tables: 284.6M (Tables: 8400)
[-] Data in InnoDB tables: 2.1G (Tables: 9416)
[-] Data in MEMORY tables: 0B (Tables: 62)
-------- Performance Metrics -----------------------------------------------------------------------
[-] Up for: 2m 34s (7K q [47.175 qps], 696 conn, TX: 23M, RX: 18M)
[-] Physical Memory: 7.7G
[-] Max MySQL memory: 562.8M
[-] Other process memory: 2.2G
[-] Total buffers: 184.0M global + 2.9M per thread (100 max threads)

However, if you run the script several times, RAM consumption grows to 1.1 gigabytes (against a limit of 562 megabytes).

If you set the malloc-lib parameter to jemalloc, memory consumption does not exceed 300-400 megabytes, even if I run the tests more than 10 times.

I hope that this information can help you. If you need any additional information, please let us know.
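The repro recipe above could be scripted roughly as follows; the mysqltuner.pl invocation matches the one quoted later in this ticket, the credentials are placeholders, and with the default DRY_RUN=1 the loop only prints what it would do:

```shell
# Sketch of the reproduction recipe: run mysqltuner.pl several times against
# a database larger than RAM while watching mysqld's resident set size.
DRY_RUN="${DRY_RUN:-1}"
for i in 1 2 3 4 5; do
  CMD="perl mysqltuner.pl -host localhost -user mysql -pass xxxx"
  if [ "$DRY_RUN" = "1" ]; then
    echo "run $i: $CMD"
  else
    $CMD >/dev/null
    ps -o rss= -C mysqld   # RSS in KiB; compare against the my.cnf limits
  fi
done
```

Set DRY_RUN=0 (and real credentials) to actually execute the runs.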

Comment by Dmitri [ 2018-12-11 ]

Same problem here. When will this be resolved?

Comment by YURII KANTONISTOV [ 2018-12-15 ]

Reproducible on a CentOS 7.4 + MariaDB 10.2.19 lab server, same recipe as in Evgenij's comment:

> I found a fairly simple way to reproduce the problem. If the size of the database exceeds the amount of RAM, then it is
> enough to run the mysqltuner.pl script 3-5 times and the amount of memory consumed becomes much larger than
> specified in the my.cnf limits.

/etc/my.cnf:
innodb_buffer_pool_size = 19327352832

After a few mysqltuner.pl runs like this:

-bash-4.2$ perl mysqltuner.pl -host localhost -user mysql -pass xxxx

mysqld virtual memory consumption is 3x bigger than the memory the MariaDB instance itself is aware of:

top:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
3190 mysql 20 0 59.0g 21.9g 1896 S 3.0 93.5 242:32.73 mysqld

-bash-4.2$ free -gh
total used free shared buff/cache available
Mem: 23G 22G 171M 44M 506M 149M
Swap: 23G 22G 853M

MariaDB [performance_schema]> SELECT ( @@key_buffer_size + @@query_cache_size + @@tmp_table_size + @@innodb_buffer_pool_size + @@innodb_log_buffer_size +
-> @@max_connections * (
-> @@read_buffer_size + @@read_rnd_buffer_size + @@sort_buffer_size + @@join_buffer_size + @@binlog_cache_size ) ) / (1024 * 1024 * 1024) AS MAX_MEMORY_GB;
+---------------+
| MAX_MEMORY_GB |
+---------------+
|       19.0323 |
+---------------+

More logs etc. can be provided if somebody from the MariaDB team investigates this now.
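The theoretical maximum from the query above can be recomputed outside the server. In this sketch only innodb_buffer_pool_size is taken from the my.cnf quoted in this comment; every other value is an illustrative assumption, so the resulting figure differs from the 19.0323 GB reported:

```shell
# Recompute MariaDB's theoretical peak memory with the same formula as the
# SELECT above. All values are in bytes; only innodb_buffer_pool_size comes
# from this report, the rest are assumed placeholders.
awk 'BEGIN {
  key_buffer_size         = 134217728    # 128M, assumed
  query_cache_size        = 0            # assumed
  tmp_table_size          = 16777216     # 16M, assumed
  innodb_buffer_pool_size = 19327352832  # from my.cnf above
  innodb_log_buffer_size  = 16777216     # 16M, assumed
  max_connections         = 100          # assumed
  # read_buffer + read_rnd_buffer + sort_buffer + join_buffer + binlog_cache
  per_thread = 131072 + 262144 + 2097152 + 262144 + 32768   # assumed
  total = innodb_buffer_pool_size + key_buffer_size + query_cache_size \
        + tmp_table_size + innodb_log_buffer_size \
        + max_connections * per_thread
  printf "MAX_MEMORY_GB: %.4f\n", total / (1024 ^ 3)
}'
```

With these placeholder values it prints MAX_MEMORY_GB: 18.4156; the point of the exercise is that the configured ceiling is nowhere near the 40+ GB of VIRT observed.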

Comment by tawool [ 2018-12-21 ]

I upgraded from 10.1 to 10.2 and saw this phenomenon; it was fixed by changing to jemalloc.
But I am worried about using jemalloc in 10.2. Is it safe to use?

vi /etc/systemd/system/mariadb.service.d/migrated-from-my.cnf-settings.conf
[Service]
Environment="LD_PRELOAD=/usr/lib64/libjemalloc.so.1"

systemctl stop mariadb
systemctl daemon-reload
systemctl start mariadb

Comment by Sergei Golubchik [ 2018-12-22 ]

tawool, you must've hit MDEV-13403. It was fixed in 10.2.16 and 10.3.8. Indeed, the correct fix is to LD_PRELOAD jemalloc. The problem is not in jemalloc itself, but in jemalloc being linked into ha_tokudb.so: TokuDB is a plugin, loaded at runtime, and jemalloc doesn't like that; it wants to handle all memory allocations, not to be loaded dynamically into a running executable. Basically, if you use TokuDB, you should LD_PRELOAD jemalloc, and that's what we implemented in the MDEV-13403 fix.

Comment by Frank [ 2018-12-22 ]

Same issue after upgrading a Galera cluster from 10.1.37 to 10.2.19 on CentOS 7.5.
Database with 500,000+ InnoDB-only tables; mysql_upgrade was executed.
The node with the highest load starts swapping after half a day in production; other nodes also showed increased memory usage.

Adding jemalloc 3.6.0 back on all nodes fixed the issue. It has been running stable for 2 weeks now.

/etc/systemd/system/mariadb.service.d/jemalloc.conf
[Service]
Environment="LD_PRELOAD=/usr/lib64/libjemalloc.so.1"

Comment by tawool [ 2018-12-24 ]

serg, we do not use TokuDB; only InnoDB is in use. It was okay before upgrading to 10.2.
Thank you for your kind reply.

Comment by Evgenij [ 2018-12-24 ]

serg, thanks for the info, but we don't use TokuDB either. All our servers use only InnoDB.

Comment by YURII KANTONISTOV [ 2018-12-24 ]

"InnoDB only" here as well.

Comment by Kris Shannon [ 2019-01-04 ]

No TokuDB.

Using MariaDB-server-10.2.21-1.el7.centos.x86_64 from http://yum.mariadb.org/10.2/centos7-amd64

Running under CloudLinux 7 with cPanel.

We added the systemd LD_PRELOAD=/usr/lib64/libjemalloc.so.1 snippet and the leak seems to have disappeared.

Comment by Sergei Golubchik [ 2019-01-04 ]

Two thoughts. "We don't use TokuDB" does not necessarily mean that no TokuDB is installed. Is it visible in SHOW PLUGINS? Is it present in SELECT * FROM mysql.plugin? Is the MariaDB-10.2.18-centos73-x86_64-tokudb-engine package installed?

If TokuDB is truly not present at all, then... it might be that just the pattern of allocations and deallocations that MariaDB uses (in your application) causes huge memory fragmentation with ptmalloc (which is what glibc uses), and jemalloc, simply by being a different memory allocator, doesn't exhibit this behavior. In that case the behavior will be very difficult to reproduce and may depend on the environment, OS, load, and whatnot.
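If ptmalloc fragmentation is indeed the cause, one glibc tunable sometimes tried besides switching allocators is capping the number of malloc arenas. This is an assumption on my part, not something suggested in this ticket; MALLOC_ARENA_MAX is a documented glibc environment variable, and the value 4 here is arbitrary:

```shell
# Hypothetical mitigation sketch: limit glibc malloc arenas for mysqld via a
# systemd drop-in. The fragment is written to stdout here, instead of the
# real path, so it can be reviewed before installing it.
cat <<'EOF'
# /etc/systemd/system/mariadb.service.d/malloc.conf
[Service]
Environment="MALLOC_ARENA_MAX=4"
EOF
```

After installing the drop-in, systemctl daemon-reload followed by a restart of mariadb would apply it; whether it actually helps with this particular leak is untested.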

Comment by Evgenij [ 2019-01-04 ]

I don't have TokuDB:

MariaDB [(none)]> SHOW PLUGINS;
+-------------------------------+----------+--------------------+---------+---------+
| Name                          | Status   | Type               | Library | License |
+-------------------------------+----------+--------------------+---------+---------+
| binlog                        | ACTIVE   | STORAGE ENGINE     | NULL    | GPL     |
| mysql_native_password         | ACTIVE   | AUTHENTICATION     | NULL    | GPL     |
| mysql_old_password            | ACTIVE   | AUTHENTICATION     | NULL    | GPL     |
| wsrep                         | ACTIVE   | STORAGE ENGINE     | NULL    | GPL     |
| MyISAM                        | ACTIVE   | STORAGE ENGINE     | NULL    | GPL     |
| CSV                           | ACTIVE   | STORAGE ENGINE     | NULL    | GPL     |
| MRG_MyISAM                    | ACTIVE   | STORAGE ENGINE     | NULL    | GPL     |
| MEMORY                        | ACTIVE   | STORAGE ENGINE     | NULL    | GPL     |
| CLIENT_STATISTICS             | ACTIVE   | INFORMATION SCHEMA | NULL    | GPL     |
| INDEX_STATISTICS              | ACTIVE   | INFORMATION SCHEMA | NULL    | GPL     |
| TABLE_STATISTICS              | ACTIVE   | INFORMATION SCHEMA | NULL    | GPL     |
| USER_STATISTICS               | ACTIVE   | INFORMATION SCHEMA | NULL    | GPL     |
| Aria                          | ACTIVE   | STORAGE ENGINE     | NULL    | GPL     |
| SEQUENCE                      | ACTIVE   | STORAGE ENGINE     | NULL    | GPL     |
| InnoDB                        | ACTIVE   | STORAGE ENGINE     | NULL    | GPL     |
| INNODB_TRX                    | ACTIVE   | INFORMATION SCHEMA | NULL    | GPL     |
| INNODB_LOCKS                  | ACTIVE   | INFORMATION SCHEMA | NULL    | GPL     |
| INNODB_LOCK_WAITS             | ACTIVE   | INFORMATION SCHEMA | NULL    | GPL     |
| INNODB_CMP                    | ACTIVE   | INFORMATION SCHEMA | NULL    | GPL     |
| INNODB_CMP_RESET              | ACTIVE   | INFORMATION SCHEMA | NULL    | GPL     |
| INNODB_CMPMEM                 | ACTIVE   | INFORMATION SCHEMA | NULL    | GPL     |
| INNODB_CMPMEM_RESET           | ACTIVE   | INFORMATION SCHEMA | NULL    | GPL     |
| INNODB_CMP_PER_INDEX          | ACTIVE   | INFORMATION SCHEMA | NULL    | GPL     |
| INNODB_CMP_PER_INDEX_RESET    | ACTIVE   | INFORMATION SCHEMA | NULL    | GPL     |
| INNODB_BUFFER_PAGE            | ACTIVE   | INFORMATION SCHEMA | NULL    | GPL     |
| INNODB_BUFFER_PAGE_LRU        | ACTIVE   | INFORMATION SCHEMA | NULL    | GPL     |
| INNODB_BUFFER_POOL_STATS      | ACTIVE   | INFORMATION SCHEMA | NULL    | GPL     |
| INNODB_METRICS                | ACTIVE   | INFORMATION SCHEMA | NULL    | GPL     |
| INNODB_FT_DEFAULT_STOPWORD    | ACTIVE   | INFORMATION SCHEMA | NULL    | GPL     |
| INNODB_FT_DELETED             | ACTIVE   | INFORMATION SCHEMA | NULL    | GPL     |
| INNODB_FT_BEING_DELETED       | ACTIVE   | INFORMATION SCHEMA | NULL    | GPL     |
| INNODB_FT_CONFIG              | ACTIVE   | INFORMATION SCHEMA | NULL    | GPL     |
| INNODB_FT_INDEX_CACHE         | ACTIVE   | INFORMATION SCHEMA | NULL    | GPL     |
| INNODB_FT_INDEX_TABLE         | ACTIVE   | INFORMATION SCHEMA | NULL    | GPL     |
| INNODB_SYS_TABLES             | ACTIVE   | INFORMATION SCHEMA | NULL    | GPL     |
| INNODB_SYS_TABLESTATS         | ACTIVE   | INFORMATION SCHEMA | NULL    | GPL     |
| INNODB_SYS_INDEXES            | ACTIVE   | INFORMATION SCHEMA | NULL    | GPL     |
| INNODB_SYS_COLUMNS            | ACTIVE   | INFORMATION SCHEMA | NULL    | GPL     |
| INNODB_SYS_FIELDS             | ACTIVE   | INFORMATION SCHEMA | NULL    | GPL     |
| INNODB_SYS_FOREIGN            | ACTIVE   | INFORMATION SCHEMA | NULL    | GPL     |
| INNODB_SYS_FOREIGN_COLS       | ACTIVE   | INFORMATION SCHEMA | NULL    | GPL     |
| INNODB_SYS_TABLESPACES        | ACTIVE   | INFORMATION SCHEMA | NULL    | GPL     |
| INNODB_SYS_DATAFILES          | ACTIVE   | INFORMATION SCHEMA | NULL    | GPL     |
| INNODB_SYS_VIRTUAL            | ACTIVE   | INFORMATION SCHEMA | NULL    | GPL     |
| INNODB_MUTEXES                | ACTIVE   | INFORMATION SCHEMA | NULL    | GPL     |
| INNODB_SYS_SEMAPHORE_WAITS    | ACTIVE   | INFORMATION SCHEMA | NULL    | GPL     |
| INNODB_TABLESPACES_ENCRYPTION | ACTIVE   | INFORMATION SCHEMA | NULL    | BSD     |
| INNODB_TABLESPACES_SCRUBBING  | ACTIVE   | INFORMATION SCHEMA | NULL    | BSD     |
| PERFORMANCE_SCHEMA            | ACTIVE   | STORAGE ENGINE     | NULL    | GPL     |
| user_variables                | ACTIVE   | INFORMATION SCHEMA | NULL    | GPL     |
| FEEDBACK                      | DISABLED | INFORMATION SCHEMA | NULL    | GPL     |
| partition                     | ACTIVE   | STORAGE ENGINE     | NULL    | GPL     |
+-------------------------------+----------+--------------------+---------+---------+
52 rows in set (0.00 sec)

The server runs CloudLinux 6 (based on CentOS 6).

Comment by YURII KANTONISTOV [ 2019-01-04 ]

No TokuDB in the list of plugins.

The worst part in our case is that /usr/lib64/libjemalloc.so.1 does not help: VIRT memory grew over two weeks from 20GB to 44GB.
RES keeps close to 20GB, which is OK.
But as the OOM killer keeps an eye on the VM size, after another couple of weeks mysqld will be killed again:

MariaDB [(none)]> SELECT ( @@key_buffer_size + @@query_cache_size + @@tmp_table_size + @@innodb_buffer_pool_size +
    -> @@innodb_log_buffer_size + @@max_connections * ( @@read_buffer_size + @@read_rnd_buffer_size + @@sort_buffer_size +
    -> @@join_buffer_size + @@binlog_cache_size ) ) / (1024 * 1024 * 1024) AS MAX_MEMORY_GB;
+---------------+
| MAX_MEMORY_GB |
+---------------+
|       19.0323 |
+---------------+
1 row in set (0.00 sec)
 
-bash-4.2$ lsof -p 16130|grep malloc
mysqld  16130 mysql  mem    REG              253,0     212096 134248941 /usr/lib64/libjemalloc.so.1
 
Tasks:   1 total,   0 running,   1 sleeping,   0 stopped,   0 zombie
%Cpu(s):  1.6 us,  0.3 sy,  0.0 ni, 98.1 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 24509684 total,  1619504 free, 22086560 used,   803620 buff/cache
KiB Swap: 24969208 total, 14980380 free,  9988828 used.  1795920 avail Mem
 
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
16130 mysql     20   0   44.5g  20.3g   6192 S  53.3 87.1   1476:14 mysqld

Comment by tawool [ 2019-01-07 ]

"show plugins;" shows no TokuDB, and "yum list *tokudb*" shows only the installable (not installed) packages.

Comment by tawool [ 2019-01-07 ]

ykantoni, does show variables like 'version_malloc_library'; give a result like the one below?
We changed from "system" to "jemalloc".

version_malloc_library : jemalloc 3.6.0.0-xxxxxxxxxxxxxxxxxxxxxxxxxxxxx

Comment by YURII KANTONISTOV [ 2019-01-07 ]

MariaDB [(none)]> show variables like 'version_malloc_library';
+------------------------+------------------------------------------------------------+
| Variable_name          | Value                                                      |
+------------------------+------------------------------------------------------------+
| version_malloc_library | jemalloc 3.6.0-0-g46c0af68bd248b04df75e4f92d5fb804c3d75340 |
+------------------------+------------------------------------------------------------+
1 row in set (0.01 sec)

Comment by YURII KANTONISTOV [ 2019-02-14 ]

Any idea how to fix this and avoid unexpected OOM kill events?

****************************************************************
Currently MariaDB server 10.2.21-MariaDB is using jemalloc:

jemalloc 3.6.0-0-g46c0af68bd248b04df75e4f92d5fb804c3d75340

on the machine with 24GB of RAM and 23GB swap:

# free -gh
              total        used        free      shared  buff/cache   available
Mem:            23G         22G        297M         60M        937M        625M
Swap:           23G        7.1G         16G

and 19G given to innodb in my.cnf, no TokuDB:

innodb_buffer_pool_size = 19327352832

But the mysqld process (pid=1952) has eaten 44+ GB of virtual memory:

# top -n1 -p 1952
. . .
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 1952 mysql     20   0   44.9g  19.9g   6956 S  12.5 85.2   2858:58 mysqld
 
 
# cat /proc/1952/status
Name:   mysqld
Umask:  0006
State:  S (sleeping)
Tgid:   1952
Ngid:   11835
Pid:    1952
PPid:   1
TracerPid:      0
Uid:    1001    1001    1001    1001
Gid:    1001    1001    1001    1001
FDSize: 65536
Groups: 1001
VmPeak: 47186156 kB
VmSize: 47119192 kB
VmLck:         0 kB
VmPin:         0 kB
VmHWM:  23105024 kB
VmRSS:  20887656 kB
RssAnon:        20880712 kB
RssFile:            6944 kB
RssShmem:              0 kB
VmData: 47016968 kB
VmStk:       132 kB
VmExe:     17520 kB
VmLib:     10640 kB
VmPTE:     91268 kB
VmSwap:  6093812 kB
Threads:        148
SigQ:   0/95594
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: 0000000000087007
SigIgn: 0000000000001000
SigCgt: 00000001800066e9
CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000
CapBnd: 0000000000004000
CapAmb: 0000000000000000
Seccomp:        0
Speculation_Store_Bypass:       vulnerable
Cpus_allowed:   ffffffff
Cpus_allowed_list:      0-31
Mems_allowed:   00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000003
Mems_allowed_list:      0-1
voluntary_ctxt_switches:        547614
nonvoluntary_ctxt_switches:     3770

Comment by Michael Caplan [ 2019-03-09 ]

Running into a similar (maybe identical?) issue: MDEV-18866. No TokuDB. Hesitant to try jemalloc (results seem to vary). The issue became much more aggressive when adding fulltext search to numerous tables. Should my issue be closed in favour of this one?

Comment by Sergey Vojtovich [ 2019-03-14 ]

michaelcaplan, your issue is most probably different: here we have complaints mostly about CentOS, sometimes even with no InnoDB, while what you had was an uneven distribution of allocations between NUMA nodes and a potential memory leak in InnoDB FTS.

Comment by Evgenij [ 2019-04-04 ]

Hey.

Has anyone found a solution other than using jemalloc?

Also, could you tell me whether you are using QEMU-based virtual machines? Thank you in advance.

Comment by Sergei Golubchik [ 2020-12-17 ]

Everybody here only complained about 10.2. Could it be that 10.3 is not affected?

Comment by YURII KANTONISTOV [ 2021-01-11 ]

> Could it be that 10.3 is not affected?

Our DB server was upgraded from 10.2.21 to 10.4.17; to me the memory consumption pattern looks very much the same.
There have been no OOM kills recently, but for the last few months the load on this server has been significantly lower than before.

Server version: 10.4.17-MariaDB MariaDB Server
jemalloc 3.6.0-0-g46c0af68bd248b04df75e4f92d5fb804c3d75340

# free -gh
              total        used        free      shared  buff/cache   available
Mem:            23G         22G        229M         19M        1.0G        879M
Swap:           23G        2.4G         21G

One customer particularly struggles with this issue, on MariaDB 10.2.26.
It seems that the OOM event occurs on file-to-memory map calls; could it be the way InnoDB reads big tables from disk?

They collected a set of mysqld process maps, status, and the list of open files a few hours before the OOM kill and soon after the auto-restart; see the attached before_after_oom_kill_maps_lsof_status.zip if that is of any help.

Comment by YURII KANTONISTOV [ 2021-01-11 ]

Oops, I forgot to attach a system log with the OOM kill event.
Looking at the stack, it is most probably a trivial swap file mapping, not a data file. Anyway: mysqld_oom_killer_messages.txt

Generated at Thu Feb 08 08:20:34 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.