[MDEV-21372] 10.2 Memory Leak Created: 2019-12-20 Updated: 2021-07-23 Resolved: 2021-07-23 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Server, Storage Engine - InnoDB |
| Affects Version/s: | 10.2.27 |
| Fix Version/s: | N/A |
| Type: | Bug | Priority: | Critical |
| Reporter: | Greg Herrell | Assignee: | Marko Mäkelä |
| Resolution: | Incomplete | Votes: | 1 |
| Labels: | None | ||
| Environment: |
CentOS 7 |
||
| Description |
|
After moving from 10.2.15 to 10.2.21 we began experiencing memory leaks. I have continued to upgrade and am currently on 10.2.27. The issue still exists. As best I can tell, 64 MB memory allocations build up incrementally over time and are never reclaimed. Left unchecked, MariaDB grows until the OS terminates it. This server is the master in a master/slave set-up using mariabackup. On 10.2.15 it was using xtrabackup. After moving to 10.2.21 it was switched to mariabackup. I am not sure what other information to provide. This database server holds 200K+ tables. Server resources were never a concern. |
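One way to check whether the build-up really consists of 64 MB regions is to count the large anonymous mappings listed in /proc/<pid>/maps at intervals. The sketch below is an illustration only: the function name count_large_anon_mappings and the 65536 KiB threshold are assumptions, not something taken from this report. It reads maps-format text on standard input so it can be tried on a saved copy of the file.

```shell
#!/usr/bin/env bash
# Count anonymous mappings of at least MIN_KIB kibibytes.
# Input: text in /proc/<pid>/maps format on stdin.
# count_large_anon_mappings is a hypothetical helper name.
count_large_anon_mappings() {
  local min_kib=$1 count=0
  local range perms offset dev inode path start end size_kib
  while read -r range perms offset dev inode path; do
    # Skip lines that do not start with a start-end address range.
    case $range in *-*) ;; *) continue ;; esac
    # Skip file-backed and named mappings such as [heap] or [stack].
    [ -n "$path" ] && continue
    start=${range%-*}
    end=${range#*-}
    size_kib=$(( (0x$end - 0x$start) / 1024 ))
    if [ "$size_kib" -ge "$min_kib" ]; then
      count=$((count + 1))
    fi
  done
  echo "$count"
}

# Hypothetical usage, assuming the server process is named mysqld:
#   count_large_anon_mappings 65536 < "/proc/$(pidof mysqld)/maps"
```

Logging this count periodically alongside the resident set size would show whether the growth tracks the number of such regions. An allocator may split one reserved 64 MB region into several adjacent mappings, in which case a lower threshold may be more informative.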
| Comments |
| Comment by Marko Mäkelä [ 2019-12-20 ] |
|
Can you please try to post more details and try to identify the culprit of the leak (or memory fragmentation)? Another idea would be to compile the server with cmake -DWITH_ASAN=ON -DWITH_SAFEMALLOC=OFF, run a small workload, and then initiate shutdown. Does LeakSanitizer report anything? If not, then what you are experiencing could be memory fragmentation and not a genuine leak. |
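The suggested build could be sketched as the shell function below. Only the two -D options come from the comment above; the function name configure_asan_build, the out-of-source directory layout, and the paths in the usage comment are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Configure an out-of-source AddressSanitizer build.
#   $1 = MariaDB source tree, $2 = build directory (both hypothetical paths)
# configure_asan_build is a hypothetical helper name.
configure_asan_build() {
  local src_dir=$1 build_dir=$2
  mkdir -p "$build_dir" || return 1
  # Use a subshell so the caller's working directory is left unchanged.
  ( cd "$build_dir" && cmake "$src_dir" -DWITH_ASAN=ON -DWITH_SAFEMALLOC=OFF )
}

# Hypothetical usage:
#   configure_asan_build "$HOME/src/mariadb-10.2" "$HOME/build/asan"
#   make -C "$HOME/build/asan"
```

After running a workload against the resulting server and shutting it down cleanly, any LeakSanitizer findings would be expected on the server's standard error or in its error log.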
| Comment by Greg Herrell [ 2019-12-20 ] |
|
Can you advise on where to obtain tcmalloc on CentOS 7, and also on what you would want me to do to provide good information back to you? We are considering moving to 10.3 and using jemalloc as well. |
| Comment by Greg Herrell [ 2019-12-27 ] |
|
I was able to install gperftools.x86_64 from the centos7-x86_64 repository. Using a systemd override file I am able to load tcmalloc with the following: The typical recurring issue is that 64 MB allocations continue to accumulate until the OS kills the process. This usually takes weeks, and there are hundreds of such allocations. Knowing this is the case: 1. What other environment variables would you recommend I set? Any guidance to provide useful information to you would be appreciated. |
| Comment by Marko Mäkelä [ 2019-12-27 ] |
|
longbeard, sorry, I am mostly familiar with Debian GNU/Linux, and I last used tcmalloc’s heap profiler in October 2017. Back then, I was able to visualize the heap profiling traces from tcmalloc. I used a Perl script that might have been part of the gperftools package. That script converted the traces to something that could be visualized by GraphViz (dot -Tsvg). Back then, I was interested in InnoDB’s heap memory usage (later filed as MDEV-18746), because our customer had opened a support ticket about excessive memory usage, and they were using InnoDB. Their problem turned out to be somewhere in the SQL layer. There ought to also be a possibility to create ‘flame graphs’ of memory allocations. I have seen that being used for highlighting CPU-intensive parts of the code. The bottom line is that you will have to visualize the memory usage somehow, to narrow down the source of the leak or fragmentation. Hopefully it is related to only a handful of statements that you are executing. You will probably have to experiment, reducing the stream of SQL statements that you are feeding to the server, until you have a minimal SQL script that demonstrates the problem. |
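The visualization step described above might look like the following sketch. It assumes that the Perl script in question is the pprof tool shipped with gperftools and that it is installed under the name pprof (some distributions use a different name); the function name heap_profile_to_svg and all file paths are hypothetical.

```shell
#!/usr/bin/env bash
# Convert one tcmalloc heap profile into an SVG call graph.
#   $1 = server binary, $2 = heap profile file, $3 = output SVG file
# heap_profile_to_svg is a hypothetical helper name.
heap_profile_to_svg() {
  local binary=$1 profile=$2 out=$3
  # pprof --dot writes a GraphViz description to stdout; dot renders it.
  pprof --dot "$binary" "$profile" | dot -Tsvg > "$out"
}

# Hypothetical usage, assuming profiles were written with a /tmp/mysqld prefix:
#   heap_profile_to_svg /usr/sbin/mysqld /tmp/mysqld.0001.heap heap.svg
```

Comparing graphs rendered from an early and a late profile of the same run is one way to see which call paths account for the growth. A binary with debug symbols gives more readable node names.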
| Comment by Greg Herrell [ 2020-01-10 ] |
|
I was able to get tcmalloc up and running. I am able to get output from the heap profiler. However, I am now struggling with getting a debug version of the binary to run so I can have symbols to produce better output. I followed the instructions at https://mariadb.com/kb/en/compiling-mariadb-for-debugging/ regarding compiling. I am able to get a compiled binary. However, when I replace the installed mysqld binary with the non-stripped version I cannot get MariaDB to start. I get an "Unregistered Authentication Agent for unix-process" error in the logs. This is CentOS 7. SELinux is disabled. Do you have any insight into how I can get this to work? |
| Comment by Greg Herrell [ 2020-02-11 ] |
|
Using tcmalloc as the memory allocator seems to have stabilized the memory consumption. For anyone finding this via Google... This is a CentOS 7 installation. I loaded tcmalloc via LD_PRELOAD by creating a conf file in the systemd folders. |
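For readers who want to try the same workaround, a systemd drop-in of the kind described above could be generated as follows. This is a sketch under assumptions, not the reporter's actual file: the unit name mariadb.service, the file name tcmalloc.conf, the library path /usr/lib64/libtcmalloc.so.4, and the function name write_tcmalloc_dropin are all hypothetical and should be checked against the local installation.

```shell
#!/usr/bin/env bash
# Write a systemd drop-in that preloads an alternative allocator library.
#   $1 = drop-in directory, $2 = full path of the library to preload
# write_tcmalloc_dropin is a hypothetical helper name.
write_tcmalloc_dropin() {
  local dropin_dir=$1 lib_path=$2
  mkdir -p "$dropin_dir" || return 1
  cat > "$dropin_dir/tcmalloc.conf" <<EOF
[Service]
Environment="LD_PRELOAD=$lib_path"
EOF
}

# Hypothetical usage (run as root; paths are assumptions):
#   write_tcmalloc_dropin /etc/systemd/system/mariadb.service.d /usr/lib64/libtcmalloc.so.4
#   systemctl daemon-reload && systemctl restart mariadb
```

After a restart, the library should appear in /proc/<pid>/maps of the server process if the preload took effect.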
| Comment by Marko Mäkelä [ 2021-07-23 ] |
|
longbeard, sorry, I had overlooked your update. Because you wrote that switching to the tcmalloc allocator library fixed the problem for you, I do not think that this can be an actual memory leak, but rather fragmentation in the memory allocator. I will close this not as "not a bug" but as "incomplete", because I believe that certain memory allocation patterns can be more likely to cause fragmentation in a memory allocator. Also, unnecessary heap memory allocation can be argued to be a bug. To fix the fragmentation, we would need a way to repeat it. And I understand that it could take significant effort to provide such a test harness (without disclosing any confidential data that would normally be stored in the database). |