[MDEV-19287] Memory leak issue in systemd on MariaDB cluster with remote ssh Created: 2019-04-19 Updated: 2021-12-06 Resolved: 2021-12-06
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | OTHER |
| Affects Version/s: | 10.3.12, 10.3.13, 10.3.14, 10.2 |
| Fix Version/s: | N/A |
| Type: | Bug | Priority: | Critical |
| Reporter: | Levieux stéphane | Assignee: | Eugene Kosov (Inactive) |
| Resolution: | Not a Bug | Votes: | 6 |
| Labels: | None | ||
| Environment: | (VMware) Debian 9, Galera Cluster 10.3.14, 18 GB memory |
| Attachments: | |
| Issue Links: | |
| Description |
Hello. The first capture shows the backup at 5:00 AM and a severe memory leak (server 1). I'm not 100% sure it is a MariaDB problem, but the mysqldump seems the most significant example, so... Here is my my.cnf.
| Comments |
| Comment by Eugene Kosov (Inactive) [ 2019-04-24 ] |
Hello. It's totally unclear what exactly happens in your case. Could you collect some info for us? I suggest using two things. The first, non-intrusive, is running perf top -p `pgrep mysqld` while backups are in progress. It's a profiler and it will show what code is executed during backups. Most probably that won't help us understand the issue, but let's try it anyway. The second and more suitable tool is the heap profiler from Google's tcmalloc. It can be linked with LD_PRELOAD: https://github.com/gperftools/gperftools I hope that --inuse_space will reveal the issue. I don't think you can use the heap profiler with a production server, because I suspect a great slowdown with that tool. It could also require much more RAM. Eventually we want to see a graph with information about allocated memory: ideally one in megabytes and one with allocation counts (--alloc_objects). Feel free to ask if you need help with these tools.
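Short of attaching a full heap profiler, a low-impact way to see whether the daemon's resident memory really grows during a backup is to sample VmRSS from /proc. A minimal sketch (Linux-only; the helper names are mine, not part of any tool mentioned in this ticket):

```python
import time


def rss_kb(pid):
    """Return the resident set size of `pid` in KB, read from /proc (Linux)."""
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])
    raise RuntimeError(f"no VmRSS line for pid {pid}")


def sample_rss(pid, samples=5, interval=1.0):
    """Collect `samples` RSS readings, `interval` seconds apart."""
    readings = []
    for _ in range(samples):
        readings.append(rss_kb(pid))
        time.sleep(interval)
    return readings
```

Run it against `pgrep mysqld` while the backup is in progress; a steadily rising series is exactly what the heap profiler would then explain.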
| Comment by Eugene Kosov (Inactive) [ 2019-04-24 ] |
BTW, I just googled this: https://github.com/iovisor/bcc/blob/master/tools/memleak_example.txt
| Comment by Dmitriy Vasilyev [ 2019-04-27 ] |
Same problem.
| Comment by Eugene Kosov (Inactive) [ 2019-04-30 ] |
I understand that using tcmalloc is difficult, but what's wrong with perf top and memleak.py? It will attach to the running daemon by PID and collect some info.
| Comment by Dmitriy Vasilyev [ 2019-04-30 ] |
The KVM hypervisor does not allow installing the right kernel to run these utilities...
| Comment by Levieux stéphane [ 2019-05-02 ] |
I also have difficulties installing memleak.py on my production server (Python dependencies, and after that a bcc module is required; I can't install a lot of packages on my production server, so perhaps I will ask to clone my VM). My mysql process is at 55% of memory, and I have another process that takes 15%; my server has no more memory and starts to swap after 4 days (swappiness at 1). Where is my memory? I have another server under Debian 9 without this problem. Complex...
| Comment by Dmitriy Vasilyev [ 2019-05-15 ] |
I downgraded to the latest 10.2; the problem persists. 8 CPU / 8 GB RAM VPS.
| Comment by Levieux stéphane [ 2019-05-15 ] |
My feeling about this problem: I think it's an OS (Debian 9 or library) problem that occurs with MariaDB (or is compounded by MariaDB).
| Comment by Levieux stéphane [ 2019-05-29 ] |
After several months of trying so many things (MariaDB, OS, etc.), I think I finally found the cause. For my MariaDB cluster, I use monitoring software on another server which does a lot of remote SSH access.
| Comment by Eugene Kosov (Inactive) [ 2019-05-30 ] |
Thanks for your help and investigations! Actually, it is correct to call your case a 'high memory usage issue'. A memory leak is basically when malloc() was called but free() was not, and that memory is never freed. So, you think those memory peaks happen when some software makes a lot of SSL connections to the server? In that case it could be relatively easy to reproduce by doing connect + disconnect in a loop. I can easily imagine that every connection allocates something on the heap. There is a big issue in progress, MDEV-19515. Part of it is done and it reduces allocation during connection, but it is fixed only in 10.5. So, it would be great if your hypothesis about SSL connections is correct. Could you try to check it with a connect + disconnect loop?
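The connect + disconnect loop suggested here can be sketched generically. The snippet below just churns TCP connections against a host and port (hypothetical helper, not a MariaDB tool); pointed at the server's port, it lets you watch the server's resident memory between runs:

```python
import socket


def churn_connections(host, port, iterations=1000):
    """Open and immediately close TCP connections in a tight loop.

    If per-connection allocations on the server side are never freed,
    the server's resident memory should ratchet up across repeated calls.
    Returns the number of connections made.
    """
    for _ in range(iterations):
        conn = socket.create_connection((host, port), timeout=5)
        conn.close()
    return iterations
```

A bare TCP open/close only exercises the connection setup path; a full reproduction would use a real client driver so that the SSL handshake and authentication also run on every iteration.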
| Comment by Levieux stéphane [ 2019-06-03 ] |
I already ran a script from a remote server (found in a thread on the original issue). Effectively, I saw a little memory consumed and never released. After several days (the time in which MariaDB previously consumed all its memory) with /etc/pam.d/systemd-user modified and ssh-session-cleanup.service enabled, the available memory is stable on all servers of my cluster.
| Comment by Dmitriy Vasilyev [ 2019-06-05 ] |
Installed another operating system, Ubuntu 16.04. Memory behaves much more stably. MariaDB latest 10.2.
| Comment by Søren Kröger [ 2019-06-06 ] |
We are seeing something similar:
Workarounds for now:
| Comment by Levieux stéphane [ 2019-07-05 ] |
I confirm the problem is solved (for me). It is not a MariaDB issue but a remote SSH/systemd memory leak. Here is the graph for the last 3 months; no need to explain the difference since 28/05.
| Comment by Eugene Kosov (Inactive) [ 2019-08-29 ] |
OK. It seems that one memory leak was (hopefully) outside of MariaDB. sbktrifork, as for your issue, right now I have no idea what it could be. Could you try to gather some information with one of the tools I mentioned earlier in this issue?
| Comment by Søren Kröger [ 2019-08-30 ] |
I can confirm that the memory usage of the mariadb process is stable; it's not growing over time. After dropping the caches, everything is back to normal. We have disabled binlog_annotate_row_events to lower the pressure on the page buffer, but it didn't fix the problem. We have been running stable for several months now, just dropping the caches each hour via cron. Normally I would say that this is a problem of the operating system, not MariaDB, but we haven't patched the operating system at all (we are using the generic tar.gz mariadb build). So it seems that mariadb 10.2.19 somehow makes it harder/impossible for the operating system to clean up the page buffer. I have attached a nice little screenshot showing the last outage. As you can see, Threads Running goes crazy at 11:04, iowait drops, memory usage hits about 100%, and system CPU goes crazy as well. The problem goes away when we flush the page buffer at 11:20, where you can also see memory usage drop from 100% to about 88%. As far as I can remember we freed up the RAM by lowering the InnoDB buffer pool from 128GB to 100GB, so at 100% RAM (256GB RAM) more than 100GB is in page buffers (crazy imo). So, I know we have to upgrade the operating system, and maybe this will solve the problem. But I don't understand:
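A "memory at 100%" reading lumps the kernel page cache together with process memory, which is exactly the confusion in this case. On Linux the split can be read from /proc/meminfo; a rough sketch (my own helpers, not from this ticket):

```python
def meminfo_kb():
    """Parse /proc/meminfo into a {field: KB} dict (Linux)."""
    info = {}
    with open("/proc/meminfo") as f:
        for line in f:
            key, rest = line.split(":", 1)
            info[key] = int(rest.split()[0])
    return info


def cache_share():
    """Fraction of total RAM currently held by the page cache and buffers.

    A high value with a stable process RSS points at cache pressure,
    not at a leak inside the database process.
    """
    m = meminfo_kb()
    return (m["Cached"] + m.get("Buffers", 0)) / m["MemTotal"]
```

If cache_share() is large while the mysqld RSS is flat, dropping the caches (as done here via cron) reclaims the memory without touching the database.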
| Comment by Anton Avramov [ 2019-09-09 ] |
Recently we have been experiencing the same problem on a Debian 9 install with MariaDB 10.2.26.
| Comment by Eugene Kosov (Inactive) [ 2019-10-02 ] |
lukav, could you try to reproduce your problem in a simplified environment? Specifically, could you take your table with triggers alone and run your ordinary queries on it? The contents of the table are not relevant: you can take just some slice of your data. It would be great if you could find at least a set of queries which causes the memleak. The simpler your test case is, the easier it will be for me to understand what happens in your case. Or maybe you can run the memleak checker for a longer time or with a bigger load? It would help if I saw a real anomaly in the log, like some stacktrace with strangely big outstanding allocations.
| Comment by azurit [ 2019-10-14 ] |
We are having a similar problem with MariaDB 10.3. After the upgrade from 10.2, memory usage goes really high (about 80%) within one day, and then a restart of MariaDB is needed as the system starts to behave unstably. We are experiencing this on multiple servers (all where MariaDB was upgraded to 10.3), both physical and virtual; all are Debian (one is Buster, the rest are Stretch). I tried to lower memory usage by lowering all caches etc., but it didn't help at all. Even more, when I used diagnostic tools like mysqltuner, which can compute the maximum memory usage for the current configuration, they all showed that maximum usage should be lower than what I was seeing in the OS (in other words, MariaDB was using MUCH more memory [GBs] than it should according to the configuration).
| Comment by Eugene Kosov (Inactive) [ 2019-10-14 ] |
azurit hi.
| Comment by azurit [ 2019-10-14 ] |
Eugene, all of the servers I was talking about are quite big webhosting servers with hundreds of qps, users and databases. There's no way to tell which query is causing it (if a memleak is really what is happening).
| Comment by azurit [ 2019-10-14 ] |
Sorry, I just remembered that the problem started after the upgrade from 10.1 to 10.2. Soon after, we upgraded to 10.3 to see if it fixes the problem I was talking about above (it didn't).
| Comment by Anton Avramov [ 2019-10-15 ] |
The same story here.
| Comment by Todd Michael [ 2019-10-15 ] |
I have the same problem, but with 10.4 [Windows Server/10, 64-bit] and NOT with 10.3, which I have had to revert to. It seems to show itself only with complex queries that have many subqueries. The only relief I have found is to reduce innodb_buffer_pool_instances to 1. 10.4 seems to act (for me) as though EACH buffer pool instance is trying to reserve as much memory as ALL instances previously would reserve collectively: e.g. if innodb_buffer_pool_size=8G and innodb_buffer_pool_instances=4, you would expect each instance to reserve only 2G, but each instance seems to blow up to 8G, taking the entire memory allocation to 4x8G=32G. But this may be entirely specious and coincidental. Also, the same query that runs fine on 10.3 but bloats 10.4 runs very, very slowly as the memory is gradually chewed up. My naive impression is that these two things together look like a memory leak as the system struggles through the long query. I have not yet been able to scale down the query to make it intelligible to humans: it is an algorithmically generated query and it's rather esoteric. If I find a simplified example, I'll post it.
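The arithmetic behind that hypothesis is easy to pin down. Expected behaviour splits the configured pool size across instances; the pathological behaviour described would multiply it instead. A sketch of the reporter's numbers (not of actual server internals):

```python
def expected_per_instance_gb(pool_size_gb, instances):
    """innodb_buffer_pool_size is meant to be shared across instances."""
    return pool_size_gb / instances


def suspected_total_gb(pool_size_gb, instances):
    """The behaviour described: each instance balloons to the full pool size."""
    return pool_size_gb * instances
```

With pool_size_gb=8 and instances=4 this gives 2 GB expected per instance but 32 GB total under the suspected behaviour, matching the figures in the comment.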
| Comment by Fernando Mattera [ 2019-10-24 ] |
Hmm... PERFORMANCE_SCHEMA=ON: did you try disabling it?
| Comment by SirWill [ 2020-04-23 ] |
We are currently experiencing the issue that memory is not being released after running mysqldump in a loop to back up all databases separately. (Ubuntu 16.04.6, mysql Ver 15.1 Distrib 10.2.31-MariaDB)
| Comment by azurit [ 2020-04-24 ] |
SirWill, we have a very similar issue on multiple servers; it started after the upgrade to 10.2 (and upgrading to 10.3 didn't help).
| Comment by Anton Avramov [ 2020-04-24 ] |
The problem still exists in 10.4 too. The problem is that it is just not reproducible enough to make a meaningful report.
| Comment by Eugene Kosov (Inactive) [ 2020-04-24 ] |
Some investigation was made as part of MDEV-21447.
| Comment by Marco Jonas [ 2020-10-09 ] |
I also had the same issue with MariaDB 10.4.14 (Ubuntu 18.04.5, 28 GB RAM, 8 cores). The result was that the oom_reaper killed the mysqld service, after 70% and 14 GB of written data. Oct 6 14:59:52 <servername> kernel: [741529.935764] Out of memory: Kill process 2114 (mysqld) score 398 or sacrifice child
| Comment by Stefano Bovina [ 2021-04-12 ] |
I think I have the same issue. MariaDB-server-10.3.28-1.el7.centos.x86_64, 16GB RAM
| Comment by Lionel Enkaoua [ 2021-07-18 ] |
I also have this memory leak issue. Part of my.cnf: query_cache_size = 0. mysqldump is run with these arguments: --host=XXXX --port=3306 --no-autocommit --single-transaction --opt -Q | gzip -f > dump.sql. Here is the memleak output (./memleak -p `pgrep mariadbd`) during the backup of one database (InnoDB only), just after restarting MySQL in our dev environment. Is it normal that mysqldump already took so much memory for only one small database just after a restart? Running it another time it was only 16MB, but this database is pretty small compared to the others (exported .sql around 100MB).
| Comment by Lionel Enkaoua [ 2021-07-21 ] |
Please find another memory leak report of 10 databases, with a flush tables command executed at the end of the process, before stopping the memory leak tracing. The backup was executed on our dev environment, with more or less the same data, and with the same OS and MariaDB server 10.5.11. But on this dev server there is no traffic. mysqldump was executed with --no-data as requested by Daniel Black, which is helping me a lot to improve performance and figure this out. The MariaDB memory increased from 1.7GB to 3.24GB. Here is the report of SHOW GLOBAL STATUS. Let me know if you need any more information.
| Comment by Daniel Black [ 2021-07-22 ] |
kevg, I spoke to lionele about the above. The after-dump SHOW GLOBAL STATUS is from before the FLUSH TABLES, which is why there are still Open_tables/Open_table_definitions. The real open_tables figure was ~330 a day later without activity on that dev server, so significantly less than the 10k/100K allocations under ha_open that appeared lost. Towards the bottom, reports-memory-leak.txt shows a significant number of leaks under handler::ha_open. Changing the size of table_open_cache appeared to purge out the memory used (and close all tables, which seems excessive, but that can be a different bug (if there isn't one already)).
| Comment by Daniel Black [ 2021-07-22 ] |
The size (the second ps number; resident) grows with every single mysqldump run. A shutdown frees all memory and won't leak; however, while running there is table_cache/definition growth. I added --single-transaction to the mysqldump and it didn't leak. Attempts like the following don't leak:
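The check described above (sample the ps size after each mysqldump run and see whether it only ever grows) can be expressed as a tiny heuristic. A hypothetical helper, not a MariaDB tool:

```python
def looks_like_leak(rss_samples, tolerance_kb=0):
    """True if resident size grows between every consecutive run.

    `rss_samples` is a list of RSS readings (KB), one taken after each
    mysqldump run; `tolerance_kb` ignores jitter below the given amount.
    A series that ever shrinks or plateaus is treated as not leaking.
    """
    if len(rss_samples) < 2:
        return False
    return all(b - a > tolerance_kb
               for a, b in zip(rss_samples, rss_samples[1:]))
```

Monotonic growth alone isn't proof of a leak (caches legitimately warm up), but growth that never stops across many identical runs is the pattern worth profiling.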
| Comment by Hans Dampf [ 2021-10-14 ] |
I had this exact problem on Debian 10 with the latest 10.3.23 from the repo.
| Comment by Stefan König [ 2021-10-20 ] |
Hello, I can confirm this problem is still happening with 10.5.11, at least on Debian 11. I cannot believe this bug has been open for over two years now. Will this ever be fixed/solved? Regards
| Comment by Daniel Black [ 2021-10-21 ] |
I did spend a few hours yesterday, before you commented, trying to reproduce this. I'll try again sometime soon after re-reading this whole bug.
| Comment by Levieux stéphane [ 2021-10-21 ] |
Hello. I don't know why my ticket is still open; in the comments I wrote all the details and how to reproduce it. After several months it turned out to be a memory leak in systemd on my MariaDB cluster with remote SSH (I put the link and all details).
| Comment by Sergei Golubchik [ 2021-10-25 ] |
I've added your description to the issue title; hopefully it won't match everybody's case anymore. We should close it soon, indeed.