MariaDB Server / MDEV-35700

Consumes more and more memory until OOM. Consistently

Details

    Description

      We have a cluster of 3 nodes, one of which gets the lion's share of requests and runs some aggregation and reporting queries every morning. We have noticed that recent versions (not sure when it really started to be an issue) seem to leak memory until the process is terminated by the OOM killer. We did tame the OOM killer a bit, but the server genuinely runs out of memory.

      Looking at the graph below, we can see that the memory consumed each day is not given back and accumulates until the server is killed.

      [server]
       
      [mysqld]
       
      basedir                 = /usr
      bind-address            = 0.0.0.0
      bulk_insert_buffer_size = 16M
      character-set-server    = utf8mb4
      collation-server        = utf8mb4_general_ci
      connect_timeout         = 5
      datadir                 = /var/lib/mysql
      default_storage_engine  = InnoDB
      expire_logs_days        = 3
      group_concat_max_len    = 1000000
       
      innodb_buffer_pool_instances    = 5
      innodb_buffer_pool_size         =  5G
      innodb_change_buffer_max_size   = 50
      innodb_file_format      = Barracuda
      innodb_file_per_table   = 1
      innodb_flush_method     = O_DIRECT
      innodb_io_capacity      = 400
      innodb_large_prefix     = on
      innodb_log_buffer_size  = 1G
      innodb_log_file_size    = 640M
      innodb_log_files_in_group       = 2
      innodb_open_files       = 400
      innodb_read_io_threads  = 16
      innodb_thread_concurrency       = 8
      innodb_write_io_threads = 16
       
      join_buffer_size        = 2M
      #key_buffer_size         = 16M
      lc-messages-dir         = /usr/share/mysql
      log_bin                 = /var/log/mysql/mariadb-bin
      log_bin_index           = /var/log/mysql/mariadb-bin.index
      log_error               = /var/log/mysql/error.log
      # https://docs.rackspace.com/support/how-to/mariadb-error-log-configuration/
      log_warnings            = 1
       
      max_allowed_packet      = 256M
      max_binlog_size         = 512M
      max_connections         = 400
      max_heap_table_size     = 64M
      myisam_recover_options  = BACKUP
      performance_schema      = on
      pid-file                = /var/run/mysqld/mysqld.pid
      port                    = 3306
      query_cache_limit       = 128K
      query_cache_size        = 64M
      skip-external-locking
      skip-name-resolve       = 1
      socket                  =  /var/run/mysqld/mysqld.sock
      sort_buffer_size        =  4M
      table_definition_cache  = 16384
      table_open_cache        = 16384
      thread_cache_size       = 128
      thread_cache_size       = 8
      thread_stack            = 192K
      tmp_table_size          = 256M
      tmpdir                  = /tmp
      user                    = mysql
      userstat                = 1
      wait_timeout            = 600
       
      [embedded]
       
      [mariadb]
      plugin_load_add         = ha_federatedx
      [mariadb-10.1]
      
      

      You can see that at 6am every day it takes a bite of memory and never releases it.
      During that time, many rows are updated. We use it for an inventory system, where it crunches the previous day's opening stock, then all the relevant transactions, to produce a closing stock. Ultimately, though, this is just a bunch of basic joins that insert into a table, about 1.9M rows per day, with no fancy statements, just selects and inserts. I would not have thought that selects and inserts, no matter how they were written, could cause a memory leak over time.
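
      One rough way to separate a genuine server-side leak from allocator fragmentation is to compare MariaDB's own memory accounting with the RSS the kernel reports; a minimal sketch (the process may be named mysqld rather than mariadbd depending on version and packaging):

      -- MariaDB's internal view of allocated memory, in bytes.
      -- If this stays roughly flat while the process RSS keeps climbing,
      -- the growth is more likely freed-but-unreturned memory (fragmentation)
      -- than memory the server is still holding on to.
      SHOW GLOBAL STATUS LIKE 'Memory_used';
      -- Compare against the kernel's view, e.g.:
      --   grep VmRSS /proc/$(pidof mariadbd)/status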

      Attachments

        1. image-2024-12-20-08-19-52-630.png (594 kB)
        2. screenshot-1.png (542 kB)
        3. screenshot-2.png (240 kB)
        4. screenshot-3.png (486 kB)
        5. screenshot-4.png (407 kB)
        6. screenshot-5.png (544 kB)


          Activity

            DrJaymz James Cross added a comment - edited

            I think this is correct. The memory does get freed, but it remains unusable, until eventually all of it is unusable.
            I do have LibreNMS monitoring the database nodes, and it does some memory logging for MariaDB specifically; that appears to show a relatively flat allocation, which is what we expect to see, but it is not showing me the full memory load of the mysql process, which I think is what we wanted to see.

            You can see, though, that jemalloc makes quite a big difference. I suspect that fragmentation will never be completely eliminated, but it is substantially better. We do thrash the database hard on that node for the reporting, and we have another node which contains retail till records for 350 shops and is also very busy; it suffered the same fate every month or so regardless of how much memory you give it. I have changed that one too. That one would run out of memory when doing a mysqldump, so it is not related to any modification of data at all.

            DrJaymz James Cross added a comment -

            Working much better - so the only change was switching the memory manager. Surely I can't be the only one with a busy database; everyone else must have the same issue, unless they are using the Windows version, where fragmentation doesn't happen.

            -James

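
            For reference, one common way to make that switch on a systemd-managed install is to preload jemalloc via a service drop-in. This is only a sketch; the library path and package name vary by distribution (libjemalloc.so.2 from libjemalloc2 is typical on Debian/Ubuntu):

            # /etc/systemd/system/mariadb.service.d/jemalloc.conf  (drop-in override)
            [Service]
            # path is distribution-specific
            Environment="LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.2"

            # apply with: systemctl daemon-reload && systemctl restart mariadb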

            serg Sergei Golubchik added a comment -

            No, you're not the only one. But it really depends on the workload; "busy database" is not enough, the actual SQL statements matter. There are a couple more issues in Jira about this, and we are making changes that should make memory fragmentation less likely. Nevertheless, only a small fraction of installations is affected, and we still cannot reproduce it internally.

            Shall we close this issue, or do you want to help track it down, to pinpoint what exactly causes the fragmentation?

            DrJaymz James Cross added a comment -

            If I can help - I will.

            Every day we have a perpetual inventory task, which inserts directly from a select that joins across 3 or so large databases partitioned on yearweek. It inserts around 11M rows a day, and we drop the oldest yearweek, from a year or so back, as we go. That task for sure grabs a chunk of memory that it doesn't give up. It's a clustered database, but only the node running that task is affected by memory getting used up. This doesn't involve any blobs or text; for the most part the data is integers.

            On another cluster we have a partitioned journal which takes a large set of inserts through the day, every 20 minutes or so, as our outposts report back. But the memory doesn't change much. Previously it would grab 1-2 GB a day and not give it back (left). Now (right) it grabs that, holds on to it for 4 hours or so, and then we get it back. We don't have any long-running tasks that run that long; only wsrep runs continually. The blocks of 4 hours don't coincide with any queries that are unique to those times, which makes it look like a housekeeping task that the database is doing itself.

            Let me know if I can provide anything else. Obviously I'd not want to revert the config.

            DrJaymz James Cross added a comment - edited

            I saw this was updated, so I can provide additional feedback:

            Since switching to jemalloc in week 51 of last year, the memory leak is gone and memory management is stable, with no leaking. The difference is fairly dramatic and makes you wonder what malloc is playing at.

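
            A quick way to confirm which allocator the running server actually picked up is the version_malloc_library variable; after the preload it should report a jemalloc version string rather than "system":

            -- reports the allocator mariadbd was started with,
            -- e.g. "system" for glibc malloc, or a jemalloc version string when preloaded
            SHOW GLOBAL VARIABLES LIKE 'version_malloc_library';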

            People

              Assignee: Unassigned
              Reporter: DrJaymz James Cross
              Votes: 0
              Watchers: 3

