Details

    Description

      After moving from 10.2.15 to 10.2.21 we began experiencing memory leaks. I have continued to upgrade and am currently on 10.2.27; the issue still exists. As best I can tell, over time there is an incremental build-up of 64 MB memory allocations that are never reclaimed. Left unchecked, the MariaDB server process grows until the OS terminates it. This server is the master in a master/slave setup that uses mariabackup. On 10.2.15 the setup used xtrabackup; after moving to 10.2.21 it was switched to mariabackup. I am not sure what other information to provide. This database server holds 200K+ tables. Server resources were never a concern.
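      One way to put a number on that build-up is to count 64 MB anonymous regions in the server's memory map and log the count periodically. The Python sketch below is an illustration only and is not part of MariaDB's tooling. It reads /proc/<pid>/maps, groups contiguous anonymous mappings (a reserved region often shows up as an rw-p part followed by a ---p remainder), and counts the groups that add up to exactly 64 MiB. The sketch assumes these regions are malloc arenas. On 64-bit glibc, per-thread arena heaps are reserved in 64 MiB units, but other code can create 64 MiB mappings too. The function names are hypothetical.

```python
import sys

ARENA_SIZE = 64 * 1024 * 1024  # 64 MiB, the allocation size reported in this issue


def anonymous_mappings(maps_text):
    """Return sorted (start, end) address pairs of anonymous mappings.

    Each /proc/<pid>/maps line is: address perms offset dev inode [pathname].
    Lines with a pathname (files, [heap], [stack], ...) are skipped.
    """
    regions = []
    for line in maps_text.splitlines():
        fields = line.split()
        if len(fields) != 5:
            continue
        start_hex, end_hex = fields[0].split("-")
        regions.append((int(start_hex, 16), int(end_hex, 16)))
    return sorted(regions)


def count_64mib_regions(maps_text):
    """Count runs of contiguous anonymous mappings totalling exactly 64 MiB.

    A run is closed as soon as it reaches 64 MiB, so two adjacent 64 MiB
    regions are counted separately instead of being merged into 128 MiB.
    """
    count = 0
    run_start = run_end = None
    for start, end in anonymous_mappings(maps_text):
        if run_end == start and run_end - run_start < ARENA_SIZE:
            run_end = end  # contiguous and still below 64 MiB: extend the run
            continue
        if run_start is not None and run_end - run_start == ARENA_SIZE:
            count += 1
        run_start, run_end = start, end
    if run_start is not None and run_end - run_start == ARENA_SIZE:
        count += 1
    return count


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 <script> <pid of mysqld>  (needs permission to read /proc/<pid>/maps)
    with open(f"/proc/{sys.argv[1]}/maps") as maps_file:
        print(count_64mib_regions(maps_file.read()))
```

      A count that rises steadily over days would match the growth described above. A count that stays flat while the process still grows would suggest looking elsewhere.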

        Activity

          marko Marko Mäkelä added a comment -

          Can you please post more details and try to identify the culprit of the leak (or memory fragmentation)?
          One idea would be to use tcmalloc and its heap profiler: https://gperftools.github.io/gperftools/heapprofile.html
          You do not need to recompile anything for that; setting LD_PRELOAD and a few other environment variables should be enough.

          Another idea would be to compile the server with cmake -DWITH_ASAN=ON -DWITH_SAFEMALLOC=OFF, run a little bit of workload, and then initiate shutdown. Does LeakSanitizer report anything? If not, then what you are experiencing could be memory fragmentation rather than a genuine leak.
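          The check described above can be sketched as follows. This assumes that, at shutdown, the ASAN build's LeakSanitizer report ends up in the server's error log (stderr) and finishes with a summary line such as "SUMMARY: AddressSanitizer: N byte(s) leaked in M allocation(s)." The exact wording can vary between compiler versions, so treat the pattern as an assumption to verify against a real report. classify_shutdown_log is a hypothetical helper.

```python
import re

# Assumed shape of the final LeakSanitizer line in an ASAN build, e.g.
# "SUMMARY: AddressSanitizer: 4096 byte(s) leaked in 2 allocation(s)."
# A standalone LeakSanitizer build prints "LeakSanitizer" instead.
LSAN_SUMMARY = re.compile(
    r"SUMMARY: (?:AddressSanitizer|LeakSanitizer): "
    r"(\d+) byte\(s\) leaked in (\d+) allocation\(s\)"
)


def classify_shutdown_log(log_text):
    """Return ("leak", bytes, allocations) if a leak summary is present.

    Otherwise return ("no leak reported", 0, 0); memory growth without a
    reported leak may instead be fragmentation in the allocator.
    """
    match = LSAN_SUMMARY.search(log_text)
    if match:
        return ("leak", int(match.group(1)), int(match.group(2)))
    return ("no leak reported", 0, 0)
```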
          longbeard Greg Herrell added a comment -

          Can you advise on where to obtain tcmalloc on CentOS 7, and also what you would want me to do to provide good information back to you?

          We are considering moving to 10.3 and using jemalloc as well.

          longbeard Greg Herrell added a comment -

          I was able to install gperftools.x86_64 from the centos7-x86_64 repository. Using a systemd override file, I am able to load tcmalloc with the following:
          [Service]
          Environment="LD_PRELOAD=/usr/lib64/libtcmalloc.so.4.4.5" HEAPPROFILE=/tmp/profile
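          The override above could be extended with some of the other variables documented on the gperftools heap profiler page. The sketch below renders such a drop-in. HEAP_PROFILE_TIME_INTERVAL and HEAP_PROFILE_ALLOCATION_INTERVAL are documented gperftools variables. The values used here are illustrative guesses for slow, weeks-long growth: an hourly dump, plus one dump per 10 GiB allocated instead of the default 1 GiB. The helper name heap_profile_dropin is hypothetical.

```python
def heap_profile_dropin(
    libtcmalloc="/usr/lib64/libtcmalloc.so.4.4.5",
    profile_prefix="/tmp/profile",
    time_interval_s=3600,
    allocation_interval_bytes=10 * 1024**3,
):
    """Render a systemd drop-in that preloads tcmalloc and enables its heap profiler.

    The interval values are illustrative guesses, not recommendations
    from the MariaDB developers.
    """
    env = {
        "LD_PRELOAD": libtcmalloc,
        # Dumps are written as <prefix>.0001.heap, <prefix>.0002.heap, ...
        "HEAPPROFILE": profile_prefix,
        # Also dump a profile every time_interval_s seconds (gperftools default: 0, off).
        "HEAP_PROFILE_TIME_INTERVAL": str(time_interval_s),
        # Dump after this many bytes have been allocated (gperftools default: 1 GiB).
        "HEAP_PROFILE_ALLOCATION_INTERVAL": str(allocation_interval_bytes),
    }
    lines = ["[Service]"]
    # One Environment= line per variable; systemd accepts repeated assignments.
    lines += [f'Environment="{name}={value}"' for name, value in env.items()]
    return "\n".join(lines) + "\n"


print(heap_profile_dropin(), end="")
```

          You could save the output as, for example, /etc/systemd/system/mariadb.service.d/heapprofile.conf (assuming the unit is named mariadb.service), then run systemctl daemon-reload and restart the service.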

          The typical recurring issue is that 64 MB allocations continue to accumulate until the OS kills the process. This usually takes weeks, and there are hundreds of such allocations. Knowing this:

          1. What other environment variables would you recommend I set?
          2. What would you want me to do with pprof when this begins to happen?

          Any guidance to provide useful information to you would be appreciated.
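          For the pprof question, one common approach is to compare an early heap dump with a late one, so that only the growth shows up. The pprof shipped with gperftools accepts --base=<profile> to subtract a baseline profile, and --svg (or --text) to choose the output format. The sketch below only builds that command line. It assumes dumps named <prefix>.NNNN.heap as written by HEAPPROFILE, and that pprof and the mysqld binary are available. pprof_diff_command is a hypothetical helper.

```python
import re


def pprof_diff_command(binary, dump_paths, output_format="--svg"):
    """Build a pprof command showing heap changes between the first and last dump.

    dump_paths are files written by the heap profiler (<prefix>.NNNN.heap);
    other paths are ignored. --base subtracts the earliest dump, so the
    output shows how in-use memory changed between the two dumps.
    """
    numbered = []
    for path in dump_paths:
        match = re.search(r"\.(\d+)\.heap$", path)
        if match:
            numbered.append((int(match.group(1)), path))
    if len(numbered) < 2:
        raise ValueError("need at least two heap dumps to compare")
    numbered.sort()
    first, last = numbered[0][1], numbered[-1][1]
    return ["pprof", output_format, f"--base={first}", binary, last]
```

          For example, passing the paths from glob.glob("/tmp/profile.*.heap") and running the result with subprocess.run(cmd, stdout=open("growth.svg", "w"), check=True) would write the graph. Function names will likely only appear if symbols for mysqld are available.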

          marko Marko Mäkelä added a comment -

          longbeard, sorry, I am mostly familiar with Debian GNU/Linux, and I last used tcmalloc’s heap profiler in October 2017. Back then, I was able to visualize the heap profiling traces from tcmalloc with a Perl script that might have been part of the gperftools package. That script converted the traces to something that could be visualized by GraphViz (dot -Tsvg).

          Back then, I was interested in InnoDB’s heap memory usage (later filed as MDEV-18746), because our customer had opened a support ticket about excessive memory usage, and they were using InnoDB. Their problem turned out to be somewhere in the SQL layer.

          There ought to also be a way to create ‘flame graphs’ of memory allocations. I have seen them used to highlight CPU-intensive parts of the code.

          The bottom line is that you will have to visualize the memory usage somehow, to narrow down the source of the leak or fragmentation. Hopefully it is related to only a handful of the statements that you are executing. You will probably have to experiment, reducing the stream of SQL statements that you feed to the server until you have a minimal SQL script that demonstrates the problem.
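          The reduction step can be partly automated. The sketch below is a simplified, greedy variant of delta debugging, not a MariaDB tool. It repeatedly drops chunks of statements while a caller-supplied reproduces() check still succeeds, and it halves the chunk size when nothing more can be removed. reproduces is a hypothetical callback. In practice, it would have to replay the statements against a test server and decide whether memory still grows, and that is the expensive part.

```python
from typing import Callable, List


def reduce_statements(
    statements: List[str], reproduces: Callable[[List[str]], bool]
) -> List[str]:
    """Greedily drop chunks of statements while the problem still reproduces.

    Starts with chunks of half the input and halves the chunk size whenever
    a full pass removes nothing; stops when single statements cannot be removed.
    """
    current = list(statements)
    chunk = max(1, len(current) // 2)
    while current:
        removed = False
        i = 0
        while i < len(current):
            candidate = current[:i] + current[i + chunk:]
            if candidate and reproduces(candidate):
                current = candidate  # the dropped chunk was not needed
                removed = True
            else:
                i += chunk  # keep this chunk and move on
        if not removed:
            if chunk == 1:
                break
            chunk = max(1, chunk // 2)
    return current
```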
          longbeard Greg Herrell added a comment -

          I was able to get tcmalloc up and running. I am able to get output from the heap profiler.

          However, I am now struggling to get a debug version of the binary to run, so that I have symbols for better output. I followed the instructions at https://mariadb.com/kb/en/compiling-mariadb-for-debugging/ for compiling, and I am able to get a compiled binary. However, when I replace the installed mysqld binary with the non-stripped version, I cannot get MariaDB to start. I get an "Unregistered Authentication Agent for unix-process" error in the logs. This is CentOS 7, and SELinux is disabled. Do you have any insight into how I can get this to work?

          longbeard Greg Herrell added a comment -

          Using tcmalloc as the memory allocator seems to have stabilized the memory consumption.

          For anyone finding this via Google: this is a CentOS 7 installation. I loaded tcmalloc via LD_PRELOAD by creating a .conf override file for the MariaDB service in the systemd configuration directory.

          marko Marko Mäkelä added a comment -

          longbeard, sorry, I had overlooked your update. Because you wrote that switching to the tcmalloc allocator library fixed the problem for you, I do not think that this can be an actual memory leak, but rather fragmentation in the memory allocator.

          I will not close this as "not a bug", but as "incomplete", because I believe that certain memory allocation patterns can be more likely to cause fragmentation in a memory allocator. Also, unnecessary heap memory allocation can be argued to be a bug.

          To fix the fragmentation, we would need a way to reproduce it. And I understand that it could take significant effort to provide such a test harness (without disclosing any confidential data that would normally be stored in the database).

          People

            marko Marko Mäkelä
            longbeard Greg Herrell
            Votes: 1
            Watchers: 6
