Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 10.4.6, 10.4.7, 10.4.8
    • Fix Version/s: 10.4.14
    • Component/s: Galera, Server
    • Environment: Docker image from Docker Hub (10.4.6-bionic), 3-node Galera setup on a Debian 10 host.
      Only one instance receives live queries; the other two are passive nodes (or used read-only for backups).
      256 GB host memory.
      InnoDB tables only.

    Description

      MariaDB is configured with a 64 GB InnoDB buffer pool, which should lead to approx. 70-80 GB of memory consumption.
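
      As a rough sanity check, the expected baseline footprint can be estimated from the configured global buffers (just a sketch using standard 10.4 server variables; per-connection and per-query buffers come on top, and the real values are in the attached jacob.cnf):

      -- Approximate fixed, global allocations in GB.
      -- Per-connection sort/read/join buffers and thread stacks are not included.
      SELECT ROUND((@@GLOBAL.innodb_buffer_pool_size
                  + @@GLOBAL.innodb_log_buffer_size
                  + @@GLOBAL.key_buffer_size
                  + @@GLOBAL.aria_pagecache_buffer_size) / 1024 / 1024 / 1024, 1)
             AS expected_global_GB;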

      Over time, this increases, sometimes in larger steps, sometimes gradually. After 47h of "uptime" we are currently at:

      top - 15:25:11 up 1 day, 22:48,  1 user,  load average: 3.28, 3.71, 3.77
      Tasks: 299 total,   1 running, 296 sleeping,   0 stopped,   2 zombie
      %Cpu(s): 10.0 us, 10.0 sy,  0.0 ni, 80.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
      MiB Mem : 257612.1 total,  84072.2 free, 163291.2 used,  10248.6 buff/cache
      MiB Swap:   4768.0 total,   4768.0 free,      0.0 used.  92118.7 avail Mem 
       
        PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                                                                                                   
       2430 mysqld-docker  20   0  198.9g 163.5g   8.0g S 400.0  65.0  11415:05 mysqld 

      Within a few days this will end in an OOM condition. After the OOM kill, the Galera IST does not work, triggering MDEV-20218 and consequently breaking the whole cluster, because after a few attempts no donor is available and all nodes are DESYNC and writing replication logs to disk.

      Unfortunately, MDEV-16431 does not seem to be ready yet. How can I debug this?

      MariaDB [my_db]> show global status like '%memory%';
      +---------------------+-----------+
      | Variable_name       | Value     |
      +---------------------+-----------+
      | Memory_used         | 906490496 |
      | Memory_used_initial | 139670856 |
      | Qcache_free_memory  | 0         |
      +---------------------+-----------+
      3 rows in set (0.002 sec)
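
      One way to narrow this down further is to compare the per-connection accounting with the global counter (a sketch; it assumes the MEMORY_USED / MAX_MEMORY_USED columns of information_schema.PROCESSLIST, which should be available in 10.4):

      -- Top memory consumers among current connections:
      SELECT id, user, db, command, time, memory_used, max_memory_used
      FROM information_schema.PROCESSLIST
      ORDER BY memory_used DESC
      LIMIT 10;

      -- Memory attributed to connections vs. the global Memory_used counter:
      SELECT SUM(memory_used) AS connections_total,
             (SELECT variable_value
              FROM information_schema.GLOBAL_STATUS
              WHERE variable_name = 'MEMORY_USED') AS global_memory_used
      FROM information_schema.PROCESSLIST;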

      BUFFER POOL AND MEMORY
      ----------------------
      Total large memory allocated 71135395840
      Dictionary memory allocated 878832
      Buffer pool size   4230336
      Free buffers       1677346
      Database pages     2430350
      Old database pages 897500
      Modified db pages  29090
      Percent of dirty pages(LRU & free pages): 0.708
      Max dirty pages percent: 75.000
      Pending reads 0
      Pending writes: LRU 0, flush list 0, single page 0
      Pages made young 29835, not young 0
      0.00 youngs/s, 0.00 non-youngs/s
      Pages read 2305059, created 125291, written 47846699
      0.00 reads/s, 0.00 creates/s, 218.39 writes/s
      Buffer pool hit rate 1000 / 1000, young-making rate 0 / 1000 not 0 / 1000
      Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
      LRU len: 2430350, unzip_LRU len: 0
      I/O sum[0]:cur[0], unzip sum[0]:cur[0]

      Please find the config file attached.

      The memory consumption is probably triggered by client access: if we redirect our load balancer to the next backend, memory grows there instead. On the other hand, memory usage does not decrease when a node stops receiving queries, even after days. (We had to cut that experiment short after 3 days, because node2/node3 was threatening to break down.)

      Attachments

        1. chart2.png
          chart2.png
          47 kB
        2. galera-4months-swap.png
          galera-4months-swap.png
          314 kB
        3. jacob.cnf
          2 kB
        4. Screen Shot 2021-03-29 at 4.59.41 PM.png
          Screen Shot 2021-03-29 at 4.59.41 PM.png
          153 kB
        5. Screen Shot 2021-03-29 at 5.13.40 PM.png
          Screen Shot 2021-03-29 at 5.13.40 PM.png
          194 kB
        6. swap-leak-10-4-13-1.png
          swap-leak-10-4-13-1.png
          248 kB
        7. swap-zoom-10-4-13-1.png
          swap-zoom-10-4-13-1.png
          270 kB


          Activity

            euglorg Eugene added a comment - edited

            Hit the very same issue: MariaDB 10.4.15 with galera-26.4.5 on Gentoo, Linux kernel 5.10.31.

            max_connections                 = 2000
            table_open_cache                = 120000
            table_open_cache_instances      = 16
            innodb_adaptive_hash_index      = 0
            innodb_buffer_pool_size         = 64G
            innodb_log_file_size            = 8G

            max number of open files        10240000

            Used memory of mysqld grew from 160 GB to 240 GB within two weeks with the system malloc library.
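
            Which allocator the server actually runs with can be double-checked from SQL (a quick sketch, assuming the version_malloc_library system variable is available on 10.4):

            SHOW GLOBAL VARIABLES LIKE 'version_malloc_library';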


            Roel Roel Van de Paar added a comment -

            Can anyone seeing this issue confirm if they are NOT using replication?
            For replication-based setups, see MDEV-26712.

            mmerz Matthias Merz added a comment -

            In my original report, I had also tried 10.4.6 without any replication and also observed memory consumption going up, so at least back then this was not replication-related. I also see the memory growth only on the node(s) receiving SQL requests, not on the replicated nodes without interactive sessions. So I would assume the original bug is not related to Galera and, unfortunately, not to MDEV-26712 in our case.

            OTOH it is practically impossible to reproduce this behaviour in our environment. It feels like it has gotten better (currently we are on 10.4.21), but it never went away completely. Another "single" (non-replicated) instance on Debian buster (version 10.3.29) currently has a 196 GB resident set with innodb_buffer_pool_size=64G, and no MyISAM or other table types in any significant amount.


            Roel Roel Van de Paar added a comment -

            Thank you mmerz for the input, much appreciated.

            xneelo shield added a comment -

            Adding some information on the topic of NOT using replication.

            Memory usage initially increases to what one expects, as assigned to the service. Then, over a time frame of 2 to 4 weeks, it increases very slowly until memory is completely depleted.

            Configuration:
            Around 215 servers used exclusively as database servers (MariaDB)
            mariadb-server 1:10.4.21+maria~stretch
            Kernel: linux-image-4.19-amd64 (will be upgrading to Buster within the next few months).
            Mixture of InnoDB and MyISAM tables.
            The servers typically host around 1800 web-based databases, mostly WordPress.

            Most of these servers have 32G RAM with a swappiness of 1, with settings:
            innodb_buffer_pool_size = 11159M
            interactive_timeout = 1800
            key_buffer_size = 3G
            max_connections = 400
            open_files_limit = 8000
            query_cache_limit = 1M
            query_cache_size = 0
            query_cache_type = OFF
            read_buffer_size = 2M
            sort_buffer_size = 2M
            table_open_cache = 2000
            thread_cache_size = 800
            thread_stack = 292K
            wait_timeout = 1800

            Calculating predicted RAM usage based on this query:

            SELECT ROUND((
                  @@GLOBAL.key_buffer_size
                + @@GLOBAL.query_cache_size
                + @@GLOBAL.tmp_table_size
                + @@GLOBAL.innodb_buffer_pool_size
                + @@GLOBAL.aria_pagecache_buffer_size
                + @@GLOBAL.innodb_log_buffer_size
                + @@GLOBAL.max_connections * (
                      @@GLOBAL.sort_buffer_size
                    + @@GLOBAL.read_buffer_size
                    + @@GLOBAL.read_rnd_buffer_size
                    + @@GLOBAL.join_buffer_size
                    + @@GLOBAL.thread_stack
                    + @@GLOBAL.binlog_cache_size)
            ) / 1024 / 1024, 1) `total MB`;

            ...a server with 32G is configured to use 16422.6MB.

            If you need more information, I'll gladly provide it.


            People

              Assignee: sanja Oleksandr Byelkin
              Reporter: mmerz Matthias Merz
              Votes: 6
              Watchers: 18

