  1. MariaDB Server
  2. MDEV-20698

Master slowly running out of memory and gets killed by oom-killer

Details

    • Bug
    • Status: Open
    • Critical
    • Resolution: Unresolved
    • 10.3.17
    • None
    • Server
    • CentOS 7 (3.10.0-1062.1.1.el7.x86_64)

    Description

      Hi guys,

      we have been experiencing a serious problem since upgrading from MariaDB 10.1.25 to MariaDB 10.3.17 about two weeks ago.

      The memory consumption of our master server slowly increases over time for no obvious reason until the service gets killed by the oom-killer. This takes only about one day! For example, today at 9:00 AM the memory consumption of mysqld was 1,952 MB, while about two hours later (at 11:13 AM) it had reached 3,484 MB (about 60% of available RAM), even though innodb_buffer_pool_size is limited to 1 GB.
      To prevent the master from getting killed by the oom-killer, I manually restart the MariaDB service every evening.
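
The two measurements quoted above can be turned into a rough growth rate. The following is only a back-of-the-envelope sketch: it assumes linear growth between the two samples and estimates total RAM from the "about 60%" figure, neither of which is confirmed in the report. The function names are made up for illustration.

```python
from datetime import datetime


def growth_rate_mb_per_hour(t0, mb0, t1, mb1):
    """Average growth between two (time, memory) samples, in MB per hour."""
    hours = (t1 - t0).total_seconds() / 3600.0
    if hours <= 0:
        raise ValueError("second sample must be later than the first")
    return (mb1 - mb0) / hours


def hours_until(limit_mb, current_mb, rate_mb_per_hour):
    """Hours until limit_mb is reached if growth stays linear; None if not growing."""
    if rate_mb_per_hour <= 0:
        return None
    return max(0.0, (limit_mb - current_mb) / rate_mb_per_hour)


# Samples quoted in the report (the date is arbitrary, only the times matter).
t0 = datetime(2019, 1, 1, 9, 0)
t1 = datetime(2019, 1, 1, 11, 13)
rate = growth_rate_mb_per_hour(t0, 1952, t1, 3484)

# Assumption: 3,484 MB is "about 60%" of RAM, so total RAM is roughly 3484 / 0.60 MB.
total_ram_mb = 3484 / 0.60

print(f"growth: {rate:.0f} MB/h")
print(f"hours to exhaust RAM: {hours_until(total_ram_mb, 3484, rate):.1f}")
```

With these numbers the script prints a growth of about 691 MB/h and roughly 3.4 hours until the estimated RAM would be exhausted. That is faster than the one-day cycle described above, so the growth is presumably not constant over the day; sampling at more points in time would show the actual curve.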

      What's weird is that before the upgrade, the cluster consisting of one master and one slave worked like a charm. Although the cluster is controlled by MaxScale, which performs an automatic failover if the master gets killed, this is not very pleasant when it happens regularly while people are working.

      Does anybody have an idea where this could suddenly come from or how to find out what's going on? Is the database corrupt?

      If you need more information or details, I will provide you with as much info as I can.

      I really hope that you guys can help me, as I have completely run out of ideas.

      Thank you in advance!

      Regards,
      matze

          Activity

            kevg Eugene Kosov (Inactive) added a comment -

            matze Hi. If you have simple memory leaks from malloc(), or from indirect calls to malloc() by libcurl or something similar, then you can easily catch them without recompiling MariaDB or your plugins by using Google's TCMalloc. You can plug it in through LD_PRELOAD and use it as described here: http://goog-perftools.sourceforge.net/doc/heap_checker.html

            In general, I doubt that such ubiquitous libraries as glibc or libcurl have any memory leaks.

            MariaDB doesn't have any 'simple' memory leaks that can be detected by automatic tools. MariaDB uses several different memory arenas; for example, triggers, stored procedures and prepared statements are stored in such arenas. There is no simple way to detect leaks in them, and I think you may have such a leak. To find it, you need to run MariaDB for a sufficient time and gather statistics on memory allocations from different function calls. Again, TCMalloc can collect such statistics. I think --inuse_space from https://gperftools.github.io/gperftools/heapprofile.html best suits our needs.

            I have no idea how long you need to collect memory statistics to get enough data. You may want to check the TCMalloc output yourself until you find something suspicious.
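
One possible way to wire this up under systemd is sketched below: a small helper that renders a drop-in file setting LD_PRELOAD and the gperftools heap-profiler variables HEAPPROFILE and HEAP_PROFILE_INUSE_INTERVAL. The library path, the profile prefix and the helper names are assumptions for illustration and need to be adapted to the actual system; this is not the service definition attached to the issue.

```python
def heap_profile_env(tcmalloc_path, profile_prefix, inuse_interval_bytes=None):
    """Build the environment variables that enable the TCMalloc heap profiler."""
    env = {
        "LD_PRELOAD": tcmalloc_path,    # load TCMalloc in place of the default malloc
        "HEAPPROFILE": profile_prefix,  # prefix for the dumped profile files
    }
    if inuse_interval_bytes is not None:
        # Dump a profile each time in-use memory grows by this many bytes.
        env["HEAP_PROFILE_INUSE_INTERVAL"] = str(inuse_interval_bytes)
    return env


def systemd_dropin(env):
    """Render the variables as a systemd [Service] drop-in."""
    lines = ["[Service]"]
    lines += [f'Environment="{key}={value}"' for key, value in env.items()]
    return "\n".join(lines) + "\n"


# Hypothetical paths: adjust to where libtcmalloc is installed and to a
# directory that the mysqld user is allowed to write to.
env = heap_profile_env(
    tcmalloc_path="/usr/lib64/libtcmalloc.so.4",
    profile_prefix="/var/tmp/mysqld.hprof",
    inuse_interval_bytes=256 * 1024 * 1024,
)
print(systemd_dropin(env), end="")
```

The rendered text could be saved as a drop-in for the MariaDB unit (for example via `systemctl edit mariadb`), followed by a service restart so the variables take effect.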
            matze Matthias added a comment -

            Hi Eugene,

            thank you for your answer. I'm going to try out the tools you mentioned as soon as possible.
            But this weekend, I'm going to recompile our plugins, drop and recreate the functions and observe the memory consumption on Monday.
            I also very much doubt that glibc and libcurl themselves have memory leaks. But could it be possible that after the update of glibc and libcurl, our plugins (which were at that point still compiled against the former versions of these libraries) leak memory because something inside glibc (like function entry points) changed that our plugins aren't aware of?
            This is just a guess...

            Thanks and regards,
            Matthias


            kevg Eugene Kosov (Inactive) added a comment -

            As I understand binary compatibility, it means that you can use any minor version of a library with the same compiled application. And if there are no bugs in your application or the library, there should be no memory leaks. Well, maybe it's possible to use libc functions in a non-standard way and depend on undocumented behaviour, but I don't expect that to be common.

            Anyway: "Given enough use, there is no such thing as a private implementation."
            This is a quote from https://www.hyrumslaw.com/
            matze Matthias added a comment -

            Hi Eugene,

            what you said about binary compatibility sounds right. Still, I recompiled two of our three plugins, and as of today it seems like nothing has changed. I don't think I use libc functions in a non-standard way, as I tried to be very careful when writing this one plugin. But if you like, I can attach the source code of the two plugins I recompiled. The third plugin is hosted on GitHub (https://github.com/ssimicro/lib_mysqludf_amqp).

            Yesterday, I installed TCMalloc (gperftools) on our test system and ran some tests with it to get a feeling for the tool.
            Because our test system, unlike our production server, does not seem to be affected by this memory leak, I tried something else to produce high memory consumption.
            As I described in this issue (https://jira.mariadb.org/browse/MDEV-20699), memory usage also increases very fast when calling SHOW CREATE PROCEDURE, leading to the server getting killed by the oom-killer sooner or later.
            I will attach the profiling files and an SVG created with pprof. Unfortunately, it shows only addresses and no function names, so I can't tell which part of mysqld uses how much memory. What am I doing wrong? Do you know how to get the function names? Do I need the debug build of MariaDB?
            Please also see the attached systemd service definition that sets the corresponding environment and preloads libtcmalloc.

            I hope this helps a little bit.

            Regards,
            Matthias

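
For reference, a pprof call of the kind described above might be assembled as in the sketch below. pprof takes the profiled binary plus one or more heap files and resolves addresses through the symbols it finds for that binary, so a stripped mysqld without separately installed debug symbols would plausibly produce address-only output. That explanation is an assumption and not a confirmed diagnosis for this case. The paths and the helper name are hypothetical, and some distributions install the tool as google-pprof instead of pprof.

```python
import shlex


def pprof_command(binary, heap_files, output_format="svg", mode="inuse_space"):
    """Assemble a pprof invocation for one or more heap profile files."""
    if not heap_files:
        raise ValueError("at least one heap profile file is required")
    # pprof looks up function names via the binary given here, so it should
    # be the same mysqld build that produced the profile.
    return ["pprof", f"--{output_format}", f"--{mode}", binary, *heap_files]


# Hypothetical paths; the profiler names its dumps <HEAPPROFILE>.NNNN.heap.
cmd = pprof_command("/usr/sbin/mysqld", ["/var/tmp/mysqld.hprof.0001.heap"])
print(shlex.join(cmd) + " > mysqld-heap.svg")
```

Running the printed command in a shell writes the graph to mysqld-heap.svg; passing "text" as output_format gives a plain-text listing instead, which is often easier to compare between two dumps.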
            danblack Daniel Black added a comment -

            Suggest looking at the bcc-tools memleak (example usage and output in MDEV-22809).


            People

              Assignee: Unassigned
              Reporter: matze Matthias