[MDEV-19287] Memory leak issue in systemD on mariaDB cluster with remote ssh Created: 2019-04-19  Updated: 2021-12-06  Resolved: 2021-12-06

Status: Closed
Project: MariaDB Server
Component/s: OTHER
Affects Version/s: 10.3.12, 10.3.13, 10.3.14, 10.2
Fix Version/s: N/A

Type: Bug Priority: Critical
Reporter: Levieux stéphane Assignee: Eugene Kosov (Inactive)
Resolution: Not a Bug Votes: 6
Labels: None
Environment:

(VMware) Debian 9, Galera Cluster 10.3.14, 18 GB memory


Attachments: PNG File Capture.PNG     PNG File Capture2.PNG     PNG File Screenshot 2019-08-30 at 10.22.55.png     PNG File leak-7d.png     PNG File m102-u16.png     File mariadb_memleak.log     Text File reports-memory-leak.txt     PNG File screenshot-1.png     PNG File screenshot-2.png     PNG File screenshot-3.png    
Issue Links:
Relates
relates to MDEV-20698 Master slowly running out of memory a... Open
relates to MDEV-21447 Uncontrollable memory allocation with... Stalled

 Description   

Hello,
For about a month I have had a big problem on my servers: free memory slowly decreases and is never released. Fortunately, a reboot occurs every week when the server starts to swap...
It started when I upgraded my MariaDB Galera cluster (3 servers) from Debian 8 / MariaDB 10.2 to Debian 9 / MariaDB 10.3.12. I upgraded the OS and MariaDB (now 10.3.14) without result.
The most significant example is the automysqlbackup script (mysqldump) at 5:00 AM: it uses several GB of memory that is never released (Zabbix graph).

The first capture shows the backup at 5:00 AM and a severe memory leak (server 1).
The second capture shows a reboot on 13/04 at 3:00 AM, after which memory slowly decreases; there is no backup on this server (mariadb 2).

I am not 100% sure this is a MariaDB problem, but the mysqldump run seems the most significant example, so...

Here is my my.cnf:

# MariaDB database server configuration file.
#
# You can copy this file to one of:
# - "/etc/mysql/my.cnf" to set global options,
# - "~/.my.cnf" to set user-specific options.
#
# One can use all long options that the program supports.
# Run program with --help to get a list of available options and with
# --print-defaults to see which it would actually understand and use.
#
# For explanations see
# http://dev.mysql.com/doc/mysql/en/server-system-variables.html
 
# This will be passed to all mysql clients
# It has been reported that passwords should be enclosed with ticks/quotes
# especially if they contain "#" chars...
# Remember to edit /etc/mysql/debian.cnf when changing the socket location.
[client]
port            = 3306
socket          = /var/run/mysqld/mysqld.sock
 
# Here are entries for some specific programs
# The following values assume you have at least 32M ram
 
# This was formerly known as [safe_mysqld]. Both versions are currently parsed.
[mysqld_safe]
socket          = /var/run/mysqld/mysqld.sock
nice            = 0
 
[mysqld]
#
# * Basic Settings
#
sql_mode = 'ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION'
 
user            = mysql
pid-file        = /var/run/mysqld/mysqld.pid
socket          = /var/run/mysqld/mysqld.sock
port            = 3306
basedir         = /usr
datadir         = /var/lib/mysql
tmpdir          = /tmp
lc_messages_dir = /usr/share/mysql
lc_messages     = en_US
skip-external-locking
#
# Instead of skip-networking the default is now to listen only on
# localhost which is more compatible and is not less secure.
# bind-address = 127.0.0.1
#
# * Fine Tuning
#
thread_concurrency = 8
thread_cache_size       = 128
 
 
performance_schema=on
skip-name-resolve
max_connections         = 450
connect_timeout         = 5
wait_timeout            = 60
max_allowed_packet      = 256M
thread_cache_size       = 128
sort_buffer_size        = 4M
bulk_insert_buffer_size = 16M
tmp_table_size          = 32M
max_heap_table_size     = 32M
key_buffer_size         = 64K
#
# * MyISAM
#
# This replaces the startup script and checks MyISAM tables if needed
# the first time they are touched. On error, make copy and try a repair.
myisam_recover_options = BACKUP
# open-files-limit = 2000
table_open_cache        = 2048
concurrent_insert       = 2
#
# * Query Cache Configuration
#
# Cache only tiny result sets, so we can fit more in the query cache.
query_cache_limit               = 128K
query_cache_size                = 0
query_cache_type                = 0
# for more write intensive setups, set to DEMAND or OFF
#
# * Logging and Replication
#
# Both locations get rotated by the cronjob.
# Be aware that this log type is a performance killer.
# As of 5.1 you can enable the log at runtime!
# general_log_file = /var/log/mysql/mysql.log
# general_log = 1
#
# Error logging goes to syslog due to /etc/mysql/conf.d/mysqld_safe_syslog.cnf.
#
# we do want to know about network errors and such
log_warnings            = 2
#
# Enable the slow query log to see queries with especially long duration
#slow_query_log[={0|1}]
slow_query_log_file     = /var/log/mysql/mariadb-slow.log
long_query_time = 2
# log_slow_rate_limit = 1000
log_slow_verbosity      = query_plan
log_slow_admin_statements=ON
log_queries_not_using_indexes=OFF
log_throttle_queries_not_using_indexes=1
#
# The following can be used as easy to replay backup logs or for replication.
# note: if you are setting up a replication slave, see README.Debian about
#       other settings you may need to change.
# server-id = 1
# report_host = master1
# auto_increment_increment = 2
# auto_increment_offset = 1
# log_bin = /var/log/mysql/mariadb-bin
# log_bin_index = /var/log/mysql/mariadb-bin.index
# not fab for performance, but safer
# sync_binlog = 1
# slaves
# relay_log = /var/log/mysql/relay-bin
# relay_log_index = /var/log/mysql/relay-bin.index
# relay_log_info_file = /var/log/mysql/relay-bin.info
#log_slave_updates
#read_only
#
# If applications support it, this stricter sql_mode prevents some
# mistakes like inserting invalid dates etc.
# sql_mode = NO_ENGINE_SUBSTITUTION,TRADITIONAL
#
# * InnoDB
#
# InnoDB is enabled by default with a 10MB datafile in /var/lib/mysql/.
# Read the manual for more InnoDB related options. There are many!
default_storage_engine  = InnoDB
# you can't just change log file size, requires special procedure
# innodb_log_file_size = 50M
innodb_buffer_pool_size = 8G
innodb_log_buffer_size  = 8M
innodb_file_per_table   = 1
innodb_io_capacity      = 400
innodb_flush_method     = O_DIRECT
 
 
 
# InnoDB
default-storage-engine                   = 'InnoDB'
innodb-stats-on-metadata                 = 0
innodb-stats-sample-pages                = 32
table-definition-cache                   = 2048
table-open-cache                         = 2048
# transaction-isolation = READ-COMMITTED
# # To be set to 0 if not multi transactional storage engine
innodb-buffer-pool-instances             = 8
innodb_data_file_path = ibdata1:2000M;ibdata2:10M:autoextend
#
# # -- Set the following to maximum 60/70% of physical RAM.
# # innodb_max_dirty_pages_pct should be compute with redo log size in mind: It’s recommended to set the dirty pages smaller than redo log space. The more you have dirty page, the less io will be produced on disk.
innodb-max-dirty-pages-pct               = 50
innodb-file-per-table                    = 1
innodb-flush-log-at-trx-commit           = 1
# #Save and restore buffer pool to be transparent for user
# # innodb_flush_method = O_DIRECT
innodb_log_file_size = 1G
innodb_flush_log_at_trx_commit = 2
innodb_lock_wait_timeout = 50
innodb-log-buffer-size                   = 64M
innodb-log-files-in-group                = 2
# innodb-log-file-size = 1024M
# innodb_purge_threads = 1
# -- Raid1: 200. Raid10: 200 x write arrays. SSD: 5000. FusionIO: 20000.
# innodb_io_capacity = 200
# -- SSD & FusionIO can flush pages in random disk order, keep default for spinning disks
# innodb_flush_neighbors = 0
# -- Increase these numbers to increase global throughput on SAN (16,32,64)
innodb-read-io-threads                   = 8
innodb-write-io-threads                  = 8
# -- Set this to 2-4 times # of Cpus, maximum recommended is 64.
# -- iostat report io service time: io_capacity / service_time is a good starting point for innodb_write_io_threads
# -- innodb_read_io_threads starting point monitoring status # read ahead per second
innodb-thread-concurrency                = 8
#
innodb-open-files                       = 2048
#
 
 
#
# * Security Features
#
# Read the manual, too, if you want chroot!
# chroot = /var/lib/mysql/
#
# For generating SSL certificates I recommend the OpenSSL GUI "tinyca".
#
# ssl-ca = /etc/mysql/cacert.pem
# ssl-cert = /etc/mysql/server-cert.pem
# ssl-key = /etc/mysql/server-key.pem
 
#
# * Galera-related settings
#
 
log_error=/var/log/mysql/mariadb-error.log
[galera]
# Mandatory settings
# wsrep_on = ON
#wsrep_provider=
#wsrep_cluster_address=
# binlog_format = row
# default_storage_engine = InnoDB
# innodb_autoinc_lock_mode = 2
#
# Allow server to accept connections on all interfaces.
#
# bind-address = 0.0.0.0
binlog_format=ROW
 
expire-logs-days                         = 7
max-binlog-size                          = 1024M
sync-binlog                              = 1
binlog-stmt-cache-size                   = 128K
binlog-cache-size                        = 256K
 
 
 
 
default-storage-engine=innodb
innodb_autoinc_lock_mode=2
wsrep_on=ON
wsrep_cluster_name='mariadb_cluster'
wsrep_node_name=node4
wsrep_node_address=10.253.253.93
wsrep_provider = /usr/lib/galera/libgalera_smm.so
# wsrep_provider_options = ""
wsrep_retry_autocommit = 0
# # wsrep_sst_method = rsync
wsrep_sst_method = mariabackup
wsrep_sst_auth = root:extra27frz
wsrep_slave_threads=16
wsrep_certify_nonPK=1
wsrep_max_ws_rows=0
wsrep_max_ws_size=1073741824
wsrep_debug=0
wsrep_log_conflicts = 1
wsrep_provider_options="gcache.size = 1G; gcache.name = /var/lib/mysql/galera.cache"
wsrep_cluster_address=gcomm://10.253.253.93,10.253.253.94,10.253.253.95
# wsrep_cluster_address = gcomm://
wsrep_forced_binlog_format=ROW
enforce_storage_engine = "InnoDB"
#
server-id       = 4
#
 
 
 
 
[mysqldump]
quick
quote-names
max_allowed_packet      = 16M
 
[mysql]
#no-auto-rehash # faster start of mysql but no tab completion
 
[isamchk]
key_buffer              = 16M
 
#
# * IMPORTANT: Additional settings that can override those from this file!
#   The files must end with '.cnf', otherwise they'll be ignored.
#
!include /etc/mysql/mariadb.cnf
!includedir /etc/mysql/conf.d/
!include /etc/mysql/secrets-backup.cnf
 
[xtrabackup]
databases-exclude=lost+found
ssl=0



 Comments   
Comment by Eugene Kosov (Inactive) [ 2019-04-24 ]

Hello. It's totally unclear what exactly happens in your case. Could you collect some info for us?

I suggest using two tools.

The first, non-intrusive one is running perf top -p `pgrep mysqld` while backups are in progress. It's a profiler and it will show what code is executed during backups. Most probably that won't help us understand the issue, but let's try it anyway.

The second and more suitable tool is a heap profiler from Google's tcmalloc. It can be linked with LD_PRELOAD https://github.com/gperftools/gperftools
http://goog-perftools.sourceforge.net/doc/heap_profiler.html

I hope that --inuse_space will reveal the issue.

I don't think you can use the heap profiler on a production server, because I suspect a great slowdown with that tool. It could also require much more RAM.

Eventually we want to see a graph with information about allocated memory: ideally one in megabytes and one with allocation counts (--alloc_objects).

Feel free to ask if you need help with these tools.
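The two tools above can be sketched as follows (a hedged sketch only; the libtcmalloc path, the mysqld binary path, and the dump prefix are assumptions that vary by distribution):

```shell
# 1) Non-intrusive: profile the running daemon while a backup runs.
perf top -p "$(pgrep -x mysqld)"

# 2) tcmalloc heap profiler: preload the library and point HEAPPROFILE
#    at a dump prefix, then start mysqld under that environment.
#    (Library path is an assumption -- adjust for your distribution.)
export LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libtcmalloc.so.4
export HEAPPROFILE=/tmp/mysqld.hprof
# ... start mysqld in this environment; it writes /tmp/mysqld.hprof.0001.heap etc.

# 3) Inspect live allocations (bytes) and allocation counts.
pprof --inuse_space   /usr/sbin/mysqld /tmp/mysqld.hprof.0001.heap
pprof --alloc_objects /usr/sbin/mysqld /tmp/mysqld.hprof.0001.heap
```

These are command fragments that require a running server, so they are shown for orientation rather than copy-paste use.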

Comment by Eugene Kosov (Inactive) [ 2019-04-24 ]

BTW, I just googled this thing: https://github.com/iovisor/bcc/blob/master/tools/memleak_example.txt
I've never tried it before, but it seems you can use it on a running server. Could you try it too?

Comment by Dmitriy Vasilyev [ 2019-04-27 ]

Same problem.
Ubuntu 18.04 / MariaDB 10.3.14 / 8 GB RAM / 8 cores / all packages updated (VPS at Fozzy.com).
We migrated the database to a new server; before, it used MariaDB 10.1.38. In the configuration file only the InnoDB memory settings were changed.
Running a profiler on the production server is impossible.
https://www.percona.com/blog/2018/06/28/what-to-do-when-mysql-runs-out-of-memory-troubleshooting-guide/ - I followed this guide. The results look fine, and the server should not consume more than 4 GB of memory. Modifying the configs gives no results.
We use mysqldump every 30 minutes, but the frequency of the memory leaks does not coincide with the mysqldump runs.
Sorry for the bad English (native language: Russian).

Comment by Eugene Kosov (Inactive) [ 2019-04-30 ]

I understand that using tcmalloc is difficult, but what's wrong with perf top and memleak.py? They will attach to the running daemon by PID and collect some info.

Comment by Dmitriy Vasilyev [ 2019-04-30 ]

The KVM hypervisor does not allow installing the right kernel to run these utilities...

Comment by Levieux stéphane [ 2019-05-02 ]

I also have difficulties installing memleak.py on my production server (Python dependencies, and after that a bcc module is required; I can't install a lot of packages on my production server... perhaps I will ask for a clone of my VM). My mysql process is at 55% of memory, another process takes 15%, my server has no more free memory and starts to swap after 4 days (swappiness at 1). Where is my memory? I have another server under Debian 9 without this problem. Complex...

Comment by Dmitriy Vasilyev [ 2019-05-15 ]

I downgraded to the latest 10.2; the problem persists.
Could the problem be in the operating system (Debian 9 / Ubuntu 18.04)?

8 CPU / 8 GB RAM VPS
my.cnf

[client]
port		= 3306
socket		= /var/run/mysqld/mysqld.sock
 
[mysqld_safe]
socket		= /var/run/mysqld/mysqld.sock
nice		= 0
 
[mysqld]
performance_schema	= OFF
skip-name-resolve
optimizer_switch='derived_merge=off,derived_with_keys=off'
user		= mysql
pid-file	= /var/run/mysqld/mysqld.pid
socket		= /var/run/mysqld/mysqld.sock
port		= 3306
basedir		= /usr
datadir		= /var/lib/mysql
tmpdir		= /tmp
lc_messages_dir	= /usr/share/mysql
lc_messages	= en_US
skip-external-locking
bind-address		= *
max_connections		= 2048
connect_timeout		= 10
wait_timeout		= 28800
max_allowed_packet	= 268435456
thread_cache_size       = 20
sort_buffer_size	= 32M
bulk_insert_buffer_size	= 8M
tmp_table_size		= 16M
max_heap_table_size	= 16M
myisam_recover_options	= BACKUP
key_buffer_size		= 32M
table_open_cache	= 400
myisam_sort_buffer_size	= 32M
concurrent_insert	= 3
read_buffer_size	= 8M
read_rnd_buffer_size	= 2M
query_cache_limit	= 8M
query_cache_size	= 256M
#query-cache-type	= 1
join_buffer_size 	= 16M
log_warnings		= 3
log_error		= /var/log/mysql/mysql_error.log
slow_query_log		= ON
slow_query_log_file	= /var/log/mysql/mariadb-slow.log
long_query_time		= 30
log_slow_verbosity	= query_plan
expire_logs_days	= 7
default_storage_engine	= InnoDB
innodb_buffer_pool_size	= 4096M
innodb_log_buffer_size	= 64M
innodb_log_file_size	= 512M
innodb_file_per_table	= 1
innodb_open_files	= 1024
innodb_io_capacity	= 400
innodb_flush_method	= O_DIRECT
innodb_buffer_pool_instances=4
innodb_strict_mode	= 0
table_open_cache = 4096
 
[galera]
 
[mysqldump]
quick
quote-names
max_allowed_packet	= 16M
 
[mysql]
 
[isamchk]
key_buffer		= 16M
!include /etc/mysql/mariadb.cnf
!includedir /etc/mysql/conf.d/

Comment by Levieux stéphane [ 2019-05-15 ]

My feeling about this problem : I think it's an OS (debian 9 or librairy) problem that occurs with mariadb ( or compounded by mariaDB ).
I have a debian 9 with haproxy and no leak memory
i have a zabbix 4 server on debian 9 with mariadb and i think the same problem occurs than on my cluster mariadb ( 3 servers )
In same time, on forums etc... i didn't find clearly any thread on a such big problem.
On my server mariadb 2 , i attempt to install the library libjemalloc1 , no change...
I suspected also a vmware compatibility problem with debian 9 .
I plan to clone a mariaDB VM of my cluster to have a server to test and then stop mariadb service. Even if the problem disappears , it will not be a real proof this is a mariadb problem if there is no more use of memory allocation OS that cause this leak ...
No solution for the moment ....

Comment by Levieux stéphane [ 2019-05-29 ]

After several months, trying so many thinks (mariadb, os etc... ) , i think i finally found the cause.
it seems there is a memory leak in systemd remote login ssh .
https://github.com/systemd/systemd/issues/8015

For my MariaDB cluster , i use a monitoring software on another server witch does a lot of remote ssh access.
Since i disabled the software , the memory seems stable ( of course after several hours..... waiting mariadb use its memory and buffers in my.cnf )
I will wait now for several days to have a 100% confirmation this issue is solved and is not a mariadb issue .

Comment by Eugene Kosov (Inactive) [ 2019-05-30 ]

Thanks for your help and investigations!

Actually, it is correct to call your case a 'high memory usage issue'. A memory leak is basically when malloc() was called but free() was not, so the memory is never returned to the OS. But in your case memory is slowly returned to the OS.

So, you think that those memory peaks occur when some software makes a lot of SSL connections to the server? In that case it could be relatively easy to reproduce by doing connect + disconnect in a loop. I can easily imagine that every connection allocates something on the heap.

There is a big issue in progress, MDEV-19515. Part of it is done, reducing allocations during connection setup, but it is fixed only in 10.5.

So, it would be great if your hypothesis about SSL connections is correct. Could you try to check it with a connect + disconnect loop?
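A connect + disconnect loop of the kind suggested above could be sketched like this (host, user, and password are placeholders; the RSS sampling via ps is an assumption about how to observe the effect):

```shell
#!/bin/sh
# Hammer the server with short-lived connections and watch mysqld's RSS.
# HOST/USER/PASS are placeholders for a test server.
HOST=127.0.0.1; USER=test; PASS=secret
PID=$(pgrep -x mysqld)
i=0
while [ "$i" -lt 10000 ]; do
    mysql -h "$HOST" -u "$USER" -p"$PASS" -e 'SELECT 1' >/dev/null 2>&1
    i=$((i + 1))
    # every 1000 connections, log the resident set size in kB
    [ $((i % 1000)) -eq 0 ] && ps -o rss= -p "$PID"
done
```

If RSS grows monotonically across iterations and never drops after the loop ends, that would support the per-connection allocation hypothesis.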

Comment by Levieux stéphane [ 2019-06-03 ]

I already ran a script from a remote server (found in a thread on the original issue).
Here is the script I used:
while true; do ssh mariadbIP "whoami" 1>/dev/null; done

Effectively, I saw a little memory being consumed and never released.

After several days, having modified /etc/pam.d/systemd-user and enabled ssh-session-cleanup.service, once MariaDB had consumed all of its configured memory, the available memory stayed stable on all servers of my cluster.

Comment by Dmitriy Vasilyev [ 2019-06-05 ]

I installed another operating system, Ubuntu 16.04. Memory behaves much more stably. MariaDB is the latest 10.2.
Config from this post:
https://jira.mariadb.org/browse/MDEV-19287?focusedCommentId=127813&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-127813
Weekly graph attached.

Comment by Søren Kröger [ 2019-06-06 ]

We are seeing something similar:

  • Ubuntu 14
  • Just upgraded from 10.1.19 to 10.2.19
  • Mysqld memory usage is stable
  • swappiness is configured to 1
  • Disk buffer occupies all ram (>100GB) at some point - after that mysql performance is suffering badly
  • innodb_flush_method is set to O_DIRECT
  • Server has 256GB ram, Innodb_buffers is set to 128GB

Workarounds for now:

  • Free up some ram by lowering the innodb buffer pool
  • Restart mysql
  • Unconfirmed yet, but flushing the disk buffer frees up a lot of ram:

    sync; echo 1 > /proc/sys/vm/drop_caches

Comment by Levieux stéphane [ 2019-07-05 ]

I confirm the problem is solved (for me): not a MariaDB issue, but a remote SSH / systemd memory leak. Here is the graph for the last 3 months; no need to explain the difference since 28/05.

Comment by Eugene Kosov (Inactive) [ 2019-08-29 ]

OK. It seems that one memory leak was (hopefully) outside of MariaDB.

sbktrifork, as for your issue, right now I have no idea what it could be. Could you try to gather some information with one of the tools I mentioned earlier in this issue?

Comment by Søren Kröger [ 2019-08-30 ]

I can confirm that the memory usage of the mariadb process is stable - it's not growing over time.
The trouble in our case is caused by the binary logs, which are cached in the operating system's (Linux, Ubuntu 14 LTS) page buffer.
As usual, Linux uses as much RAM as possible for buffering and normally is able to free up the page buffer early enough so that other processes don't get into trouble.
This was also the case on our system with MariaDB 10.1.19, but since we upgraded to 10.2.19 the page buffer isn't released in time, and when the overall RAM usage of the server hits about 100%, all threads in MariaDB get queued up. We normally have about 5-20 active running threads - but when the combined RAM usage of the server hits 100%, we have about 300 active running threads which run for several seconds - some of them even minutes.
As soon as we drop the caches:

sync; echo 1 > /proc/sys/vm/drop_caches

everything is back to normal.

We have disabled binlog_annotate_row_events to lower the pressure on the page buffer, but it didn't fix the problem.

We have been running stable for several months now, just dropping the caches each hour via cron.
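That hourly workaround could look roughly like this in a cron file (a sketch of the workaround described above, not a recommendation -- dropping caches discards all warm page-cache data; the file path is an assumption):

```shell
# /etc/cron.d/drop-caches -- hourly page-cache flush (workaround only)
0 * * * * root sync; echo 1 > /proc/sys/vm/drop_caches
```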

Normally I would say that this is a problem of the operating system, not MariaDB - but we haven't patched the operating system at all (we are using the generic tar.gz MariaDB build). So it seems that MariaDB 10.2.19 somehow makes it harder or impossible for the operating system to clean up the page buffer.

I have attached a nice little screenshot showing the last outage. As you can see, Threads Running goes crazy at 11:04, iowait drops, memory usage hits about 100%, and system CPU goes crazy as well. The problem goes away when we flush the page buffer at 11:20, where you can also see memory usage drop from 100% to about 88%. As far as I can remember, we freed up the RAM by lowering the InnoDB buffer pool from 128GB to 100GB - so the 100% RAM (256GB) means more than 100GB sitting in page buffers (crazy, imo).

So, I know we have to upgrade the operating system - and maybe this will solve the problem.

But I don't understand:
1) Can changes in MariaDB 10.2.19's code influence the ability of the OS to free up the page buffer, or maybe even lock it?
2) Shouldn't MariaDB write binary logs directly to disk - or have an option to enable direct I/O for binary logs?

Comment by Anton Avramov [ 2019-09-09 ]

Recently we have been experiencing the same problem on a Debian 9 install with MariaDB 10.2.26.
I'm attaching a memleak output for the process.
In our case I don't see a correlation with dumping the database. The used memory just gradually increases over time until it runs out and the process gets killed by the OOM killer.
The bigger problem is that the consumed memory doesn't get freed, so the problem reappears faster.
I have killed all other processes and still the memory is not freed, so I'm pretty sure the problem is in the mariadb binary.
Let me know if I can gather more information. mariadb_memleak.log

Comment by Eugene Kosov (Inactive) [ 2019-10-02 ]

lukav
Nice! But I don't see anything criminal in memleak output. Biggest memory consumers are table triggers. Let's guess something is wrong with it.

Could you try to reproduce your problem with siplified environment? Specifically could you take your table with triggers alone and run your ordinary queries on it. Contents of a table is not relevant: you can take just some slice of your data. It would be great if you find at least a set of queries which cause memleak. The simpler you test case will be, the easier to me it would be to understand what happens in your case.

Or maybe you can run memleak checker for a longer time or with a bigger load? It may help if I see a real anomaly in log like some stacktrace with strangely big outstanding allocations.

Comment by azurit [ 2019-10-14 ]

We are having a similar problem with MariaDB 10.3. After the upgrade from 10.2, memory usage goes really high (about 80%) within one day; then a restart of MariaDB is needed, as the system starts to behave unstably. We are experiencing this on multiple servers (all where MariaDB was upgraded to 10.3), both physical and virtual; all are Debian (one is Buster, the rest are Stretch).

I tried to lower memory usage by lowering all caches etc., but it didn't help at all. Moreover, when I used diagnostic tools like mysqltuner, which can compute the maximum memory usage for the current configuration, they all showed that the maximum usage should be lower than what I was seeing in the OS (in other words, MariaDB was using MUCH more memory [GBs] than it should according to the configuration).

Comment by Eugene Kosov (Inactive) [ 2019-10-14 ]

azurit hi.
If you have a memory leak, then you have some query that leaks memory: you run it several times and memory consumption goes higher and higher. Could you try to find that query?

Comment by azurit [ 2019-10-14 ]

Eugene,

All of the servers I was talking about are quite big webhosting servers with hundreds of qps, users, and databases. There's no way to tell which query is causing it (if a memleak is really what is happening).

Comment by azurit [ 2019-10-14 ]

Sorry, I just remembered that the problem started after the upgrade from 10.1 to 10.2. Soon after, we upgraded to 10.3 to see if it would fix the problem I described above (it didn't).

Comment by Anton Avramov [ 2019-10-15 ]

The same story here.
Every server is under load when the problem manifests itself. When I try to recreate the problem in a controlled environment, I'm unsuccessful.
If you have any suggestions on how to pinpoint more accurately where the problem is, please let us know.

Comment by Todd Michael [ 2019-10-15 ]

I have the same problem but with 10.4 [Windows Server/10, 64 bit] and NOT with 10.3, which I have had to revert to.

It seems to show itself only with complex queries that have many subqueries.

The only relief that I have discovered is to reduce innodb_buffer_pool_instances to 1.

10.4 seems to act (for me) as though EACH buffer pool instance is trying to reserve as much memory as ALL instances previously reserved collectively: e.g. if innodb_buffer_pool_size=8G and innodb_buffer_pool_instances=4, you would expect each instance to reserve only 2G, but each instance seems to blow up to 8G, taking the entire memory allocation to 4x8G=32G. But this may be entirely specious and coincidental.

Also, the same query that runs fine on 10.3 but bloats 10.4 runs very, very slowly as the memory is gradually chewed up. My naive impression is that these two things together look like a memory leak as the system struggles through the long query.

I have not yet been able to scale down the query to make it intelligible to humans: it is an algorithmically generated query and it's rather esoteric. If I find a simplified example, I'll post it.

Comment by Fernando Mattera [ 2019-10-24 ]

Hmm...

performance_schema = ON

Did you try disabling it?

Comment by SirWill [ 2020-04-23 ]

We are currently experiencing the issue that memory is not being released after running mysqldump in a loop to back up all databases separately.
Around 30 minutes after the backup has finished, mysql gets killed by the oom-killer.
Is this the same issue which is reported here?
Is there any known workaround to this issue?

( Ubuntu 16.04.6, mysql Ver 15.1 Distrib 10.2.31-MariaDB)
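For reference, the "mysqldump in a loop" backup pattern that this and several later comments describe looks roughly like this (a sketch only; the backup path, the excluded schemas, and the option choices are assumptions, not the reporter's actual script):

```shell
#!/bin/sh
# Back up each database to its own compressed dump file.
# /backup is a placeholder destination directory.
for db in $(mysql -N -e 'SHOW DATABASES' |
            grep -Ev '^(information_schema|performance_schema|sys)$'); do
    mysqldump --single-transaction "$db" | gzip > "/backup/$db.sql.gz"
done
```

It is this kind of per-database loop that the reporters observe driving mysqld's memory up without it being released afterwards.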

Comment by azurit [ 2020-04-24 ]

SirWill, we have a very similar issue on multiple servers; it started after the upgrade to 10.2 (and upgrading to 10.3 didn't help).

Comment by Anton Avramov [ 2020-04-24 ]

The problem still exists in 10.4 too. The trouble is that it is just not reproducible enough to make a meaningful report.

Comment by Eugene Kosov (Inactive) [ 2020-04-24 ]

Some investigation was made as part of MDEV-21447.
It's known that already 10.1 requires more memory than 5.5, but it's unknown what exactly requires more memory. The relevant code makes heavy use of memory arenas, which makes it difficult to understand what part of the code does most of the allocations. This problem is widely known and advertised among the relevant developers. But it's not easy, and I can't say when it will be fixed.

Comment by Marco Jonas [ 2020-10-09 ]

I also had the same issue with MariaDB 10.4.14. Ubuntu 18.04.5, 28 GB RAM, 8 cores.
I tried to back up 2,800 databases with a script, running mysqldump for every single DB. Most databases are not very big.

The result was that the oom_reaper killed the mysqld service after 70%, with 14 GB of data written.

Oct 6 14:59:52 <servername> kernel: [741529.935764] Out of memory: Kill process 2114 (mysqld) score 398 or sacrifice child
Oct 6 14:59:52 <servername> kernel: [741529.939236] Killed process 2114 (mysqld) total-vm:19089924kB, anon-rss:9507492kB, file-rss:0kB, shmem-rss:0kB
Oct 6 14:59:53 <servername> kernel: [741530.819393] oom_reaper: reaped process 2114 (mysqld), now anon-rss:0kB, file-rss:0kB, shmem-rss:0k

Comment by Stefano Bovina [ 2021-04-12 ]

I think I have the same issue. MariaDB-server-10.3.28-1.el7.centos.x86_64, 16GB RAM

  • Peaks ---> mysqldump
  • Drops ---> mariadb service restarted

Comment by Lionel Enkaoua [ 2021-07-18 ]

I also have this memory leak issue, only when running mysqldump.
We have 200 databases to back up with around 400,000 tables. With MariaDB 10.1 it took 30 minutes; the best we have achieved with 10.5 is 2.5 hours, but now we are closer to 6 hours, not sure why, and we are forced to restart mysql at the end, if it doesn't crash before, because the server is swapping a lot. `sync; echo 1 > /proc/sys/vm/drop_caches` is not helping.

Part of my.cnf:
```
innodb_flush_log_at_trx_commit = 2
innodb_flush_method = O_DIRECT
innodb_buffer_pool_dump_at_shutdown = 1
innodb_buffer_pool_load_at_startup = 1
tmp_table_size = 32M
max_heap_table_size = 32M
table_definition_cache = 20000
table_open_cache = 4000

query_cache_size = 0
query_cache_type = 0
```
Engine-independent table statistics were generated with mysqlcheck -a -Z after the upgrade to 10.5.11.

mysqldump was run with these arguments: --host=XXXX --port=3306 --no-autocommit --single-transaction --opt -Q | gzip -f > dump.sql
We didn't have this issue before upgrading from MariaDB 10.1 to 10.5.11.
The OS is CentOS Linux release 7.3.1611 (Core).

Here is the memleak output (./memleak -p `pgrep mariadbd`) during the backup of one database (InnoDB only), just after restarting MySQL in our dev environment:
https://pastebin.com/raw/y7rnNWK4
According to htop, mariadb used around 300 MB after mysqldump.

Is it normal that mysqldump already used so much memory for only one small database, just after a restart?

Running it another time, it was only 16 MB, but this database is pretty small compared to the others (the exported .sql is around 100 MB):
https://pastebin.com/raw/qiG5HTFd

Comment by Lionel Enkaoua [ 2021-07-21 ]

Please find another memory-leak report, this time for 10 databases, with a FLUSH TABLES command executed at the end of the process before stopping the memory-leak tracer.

The backup was executed on our dev environment, with more or less the same data, the same OS, and MariaDB server 10.5.11. On this dev server there is no traffic.

mysqldump was executed with --no-data as requested by Daniel Black, which is helping me a lot to improve performance and figure this out.

reports-memory-leak.txt

The mariadb memory increased from 1.7GB to 3.24GB.

Here is the report of SHOW GLOBAL status
before dump: https://pastebin.com/dAVQuWup
after dump : https://pastebin.com/b89ztc5n

Let me know if you need any more information.

Comment by Daniel Black [ 2021-07-22 ]

kevg, I spoke to lionele about the above. The after-dump SHOW GLOBAL STATUS was taken before the FLUSH TABLES, which is why there are still Open_tables/Open_table_definitions. The real Open_tables figure was ~330 a day later, without activity on that dev server - significantly less than the 10k/100K allocations under ha_open that appeared lost.

Towards the bottom of reports-memory-leak.txt there is a significant number of leaks under handler::ha_open.

Changing the size of table_open_cache appeared to purge the memory used (and close all tables, which seems excessive, but that can be a different bug, if it isn't one already).

Comment by Daniel Black [ 2021-07-22 ]

minimal test case 10.2-4aeb2b1c6c542e09be2474644358038279197529

sql/mysqld --no-defaults --datadir=/tmp/${PWD##*/}-datadir --socket=/tmp/${PWD##*/}.sock --plugin-dir=${PWD}/mysql-test/var/plugins/ --verbose --skip-networking
 
$ for t in $(seq 3000); do echo "create table t$t (i int);"; done | mysql  --socket=/tmp/${PWD##*/}.sock  -u root test
 
shutdown mysqld and restart same way.
 
$  ps -eo rss,resident,size,share,vsize,pid,cmd  | grep mysqld
132128    - 688452 - 1763960 58383 sql/mysqld --no-defaults --datadir=/tmp/build-mariadb-server-10.2_second-datadir --socket=/tmp/build-mariadb-server-10.2_second.sock --plugin-dir=/home/dan/repos/build-mariadb-server-10.2_second/mysql-test/var/plugins/ --verbose --skip-networking
 
$ mysqldump --no-data --socket=/tmp/${PWD##*/}.sock -u root test  > /dev/null
 
~/repos/build-mariadb-server-10.2_second 
$  ps -eo rss,resident,size,share,vsize,pid,cmd  | grep mysqld
277864    - 759076 - 1895332 58383 sql/mysqld --no-defaults --datadir=/tmp/build-mariadb-server-10.2_second-datadir --socket=/tmp/build-mariadb-server-10.2_second.sock --plugin-dir=/home/dan/repos/build-mariadb-server-10.2_second/mysql-test/var/plugins/ --verbose --skip-networking
 
$ mysqldump --no-data --socket=/tmp/${PWD##*/}.sock -u root test  > /dev/null
 
~/repos/build-mariadb-server-10.2_second 
$  ps -eo rss,resident,size,share,vsize,pid,cmd  | grep mysqld
278248    - 759244 - 1895332 58383 sql/mysqld --no-defaults --datadir=/tmp/build-mariadb-server-10.2_second-datadir --socket=/tmp/build-mariadb-server-10.2_second.sock --plugin-dir=/home/dan/repos/build-mariadb-server-10.2_second/mysql-test/var/plugins/ --verbose --skip-networking
 
$ mysqldump --no-data --socket=/tmp/${PWD##*/}.sock -u root test  > /dev/null
 
~/repos/build-mariadb-server-10.2_second 
$  ps -eo rss,resident,size,share,vsize,pid,cmd  | grep mysqld
278248    - 759268 - 1895332 58383 sql/mysqld --no-defaults --datadir=/tmp/build-mariadb-server-10.2_second-datadir --socket=/tmp/build-mariadb-server-10.2_second.sock --plugin-dir=/home/dan/repos/build-mariadb-server-10.2_second/mysql-test/var/plugins/ --verbose --skip-networking
 
~/repos/build-mariadb-server-10.2_second 
$ mysqldump --no-data --socket=/tmp/${PWD##*/}.sock -u root test  > /dev/null
 
$  ps -eo rss,resident,size,share,vsize,pid,cmd  | grep mysqld
278248    - 759316 - 1895332 58383 sql/mysqld --no-defaults --datadir=/tmp/build-mariadb-server-10.2_second-datadir --socket=/tmp/build-mariadb-server-10.2_second.sock --plugin-dir=/home/dan/repos/build-mariadb-server-10.2_second/mysql-test/var/plugins/ --verbose --skip-networking

The size (the second ps number shown, since resident prints "-") grows with every single mysqldump run.

A shutdown frees all memory, so nothing leaks across restarts; while the server is running, however, the table cache/definition usage keeps growing.

Adding --single-transaction to the mysqldump stopped the leak.

Attempts like the following don't leak:

$ for t in $(seq 3000); do echo "LOCK TABLES t$t  READ LOCAL ;SELECT TRIGGER_NAME FROM INFORMATION_SCHEMA.TRIGGERS WHERE EVENT_OBJECT_SCHEMA = DATABASE() AND EVENT_OBJECT_TABLE = 't$t'; SELECT engine, table_type FROM INFORMATION_SCHEMA.TABLES WHERE table_schema = DATABASE() AND table_name = 't$t'; show fields from t$t; show create table  t$t ;"; done | mc -u root test > /dev/null

Comment by Hans Dampf [ 2021-10-14 ]

I had this exact problem on Debian 10 with the latest 10.3.23 from the repo.
I upgraded to Debian 11 and MariaDB 10.5.11, but the problem still persists in that version. Using the --single-transaction option does not solve it for me; memory usage keeps growing, even beyond the configured maximum size.

Comment by Stefan König [ 2021-10-20 ]

Hello,

I can confirm this problem is still happening with 10.5.11, at least with Debian 11.

I cannot believe this bug has been open for over two years now. Will it ever be fixed/solved?

Regards
Stefan König

Comment by Daniel Black [ 2021-10-21 ]

I spent a few hours yesterday, before you commented, trying to reproduce this. I'll try again sometime soon, after re-reading this whole bug.

Comment by Levieux stéphane [ 2021-10-21 ]

Hello,

I don't know why my ticket is still open. In the comments I wrote all the details and how to reproduce it: after several months, it turned out to be a memory leak in systemd on my MariaDB cluster with remote ssh (I posted the link and all the details).
So in my case the problem has been solved for a long time. The other commenters should probably create their own tickets, because their cases are not really the same as mine, even though the subject is a memory leak.
(I don't see how to close my ticket.)
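
For readers hitting the same root cause, a rough diagnostic sketch for the systemd side on a host with many remote ssh logins: compare the number of logind sessions with PID 1's resident memory. `loginctl` and `/proc/1/status` are standard on systemd hosts; this is only a read-only hint, not a fix:

```shell
# Count logind sessions (0 if loginctl is unavailable) and read the
# resident memory of PID 1 from /proc; both are cheap, read-only checks.
sessions=$(loginctl list-sessions --no-legend 2>/dev/null | wc -l)
pid1_rss_kb=$(awk '/^VmRSS:/ { print $2 }' /proc/1/status)
echo "logind sessions: $sessions, PID 1 RSS: ${pid1_rss_kb:-n/a} kB"
```

If the session count and PID 1's RSS climb together over days, the leak is on the systemd/logind side rather than in mariadbd.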

Comment by Sergei Golubchik [ 2021-10-25 ]

I've added your description to the issue title; hopefully it won't match everybody's case anymore. We should indeed close it soon.

Generated at Thu Feb 08 08:50:28 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.