Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 10.4.6, 10.4.7, 10.4.8
    • Fix Version/s: 10.4.14
    • Component/s: Galera, Server
    • Environment: Docker image from Docker Hub, 10.4.6-bionic, 3-node Galera setup on a Debian 10 host.
      Only one instance receives live queries; the other two are passive nodes (or used read-only for backups).
      256 GB host memory, InnoDB tables only.

    Description

      MariaDB is configured with a 64 GB InnoDB buffer pool, which should lead to approx. 70-80 GB of memory consumption.

      Over time, memory usage increases, sometimes in larger steps, sometimes gradually. After 47h of "uptime" we are currently at:

      top - 15:25:11 up 1 day, 22:48,  1 user,  load average: 3.28, 3.71, 3.77
      Tasks: 299 total,   1 running, 296 sleeping,   0 stopped,   2 zombie
      %Cpu(s): 10.0 us, 10.0 sy,  0.0 ni, 80.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
      MiB Mem : 257612.1 total,  84072.2 free, 163291.2 used,  10248.6 buff/cache
      MiB Swap:   4768.0 total,   4768.0 free,      0.0 used.  92118.7 avail Mem 
       
        PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                                                                                                   
       2430 mysqld-docker  20   0  198.9g 163.5g   8.0g S 400.0  65.0  11415:05 mysqld 

      This will eventually lead to an OOM condition within a few days. After the OOM kill, Galera IST will not work, triggering MDEV-20218 and in consequence breaking the whole cluster, because after a few tries no donor is available and all nodes are DESYNC and writing replication logs to disk.

      Unfortunately, MDEV-16431 does not seem to be ready yet. How can I debug this?

      MariaDB [my_db]> show global status like '%memory%';
      +---------------------+-----------+
      | Variable_name       | Value     |
      +---------------------+-----------+
      | Memory_used         | 906490496 |
      | Memory_used_initial | 139670856 |
      | Qcache_free_memory  | 0         |
      +---------------------+-----------+
      3 rows in set (0.002 sec)

      BUFFER POOL AND MEMORY
      ----------------------
      Total large memory allocated 71135395840
      Dictionary memory allocated 878832
      Buffer pool size   4230336
      Free buffers       1677346
      Database pages     2430350
      Old database pages 897500
      Modified db pages  29090
      Percent of dirty pages(LRU & free pages): 0.708
      Max dirty pages percent: 75.000
      Pending reads 0
      Pending writes: LRU 0, flush list 0, single page 0
      Pages made young 29835, not young 0
      0.00 youngs/s, 0.00 non-youngs/s
      Pages read 2305059, created 125291, written 47846699
      0.00 reads/s, 0.00 creates/s, 218.39 writes/s
      Buffer pool hit rate 1000 / 1000, young-making rate 0 / 1000 not 0 / 1000
      Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
      LRU len: 2430350, unzip_LRU len: 0
      I/O sum[0]:cur[0], unzip sum[0]:cur[0]

      Please find the config file attached.

      The memory consumption is probably triggered by client access: if we redirect our load balancer to the next backend, memory grows there instead. On the other hand, memory usage does not decrease when a node stops receiving queries, even after days. (We had to cut that experiment short after 3 days, because node2/node3 was threatening to break down.)
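
      A minimal monitoring sketch (assuming a single local mysqld and a passwordless client; the log path and 5-minute interval are arbitrary) could record the server-reported Memory_used next to the actual process RSS, so the growth steps can later be correlated with workload changes:

      # Sketch: record server-reported memory vs. actual RSS every 5 minutes.
      # Assumes a single mysqld process and a passwordless local mysql client; adjust as needed.
      PID=$(pidof mysqld)
      while true; do
          MEM=$(mysql -N -e "SHOW GLOBAL STATUS LIKE 'Memory_used'" | awk '{print $2}')
          RSS=$(ps -o rss= -p "$PID")   # resident set size in KiB
          echo "$(date '+%F %T') Memory_used=$MEM rss_kib=$RSS" >> /var/log/mysqld-mem-trace.log
          sleep 300
      done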

      Attachments

        1. jacob.cnf (2 kB)
        2. swap-leak-10-4-13-1.png (248 kB)
        3. swap-zoom-10-4-13-1.png (270 kB)
        4. galera-4months-swap.png (314 kB)
        5. Screen Shot 2021-03-29 at 4.59.41 PM.png (153 kB)
        6. Screen Shot 2021-03-29 at 5.13.40 PM.png (194 kB)
        7. chart2.png (47 kB)

        Issue Links

          Activity

            mmerz Matthias Merz created issue -
            elenst Elena Stepanova made changes -
            Fix Version/s 10.4 [ 22408 ]
            Assignee Jan Lindström [ jplindst ]
            mmerz Matthias Merz made changes -
            Labels Memory_leak galera Memory_leak
            mmerz Matthias Merz made changes -
            Affects Version/s 10.4.8 [ 23721 ]
            Affects Version/s 10.4.7 [ 23720 ]
            mmerz Matthias Merz added a comment -

            I was also able to reproduce this memory leak without any galera plugin loaded, so the problem seems to come from MariaDB itself.
            In the meantime, I have also tested version 10.4.7 and am currently running 10.4.8 without notable changes in the memory profile.

            mmerz Matthias Merz made changes -
            Summary memory leak in 10.4.6-galera? memory leak in 10.4 series
            jplindst Jan Lindström (Inactive) made changes -
            Assignee Jan Lindström [ jplindst ] Elena Stepanova [ elenst ]
            elenst Elena Stepanova made changes -
            Assignee Elena Stepanova [ elenst ] Sergey Vojtovich [ svoj ]
            elenst Elena Stepanova made changes -
            Component/s Server [ 13907 ]
            Component/s Galera [ 10124 ]
            julien.fritsch Julien Fritsch made changes -
            Assignee Sergey Vojtovich [ svoj ] Oleksandr Byelkin [ sanja ]
            dbteam dbteam added a comment -

            Does this happen with version MariaDB 10.4.13?

            mmerz Matthias Merz added a comment -

            Yes, it still happens with 10.4.13, but unfortunately I still cannot reproduce it on demand. It only appears under some rare access patterns, so it is probably some sort of concurrency issue (we have many short-lived connections, mixed with some larger transactions).

            BUFFER POOL AND MEMORY
            ----------------------
            Total large memory allocated 71135395840
            Dictionary memory allocated 921840
            Buffer pool size 4231920
            Free buffers 24585
            Database pages 4021115
            Old database pages 1483867
            Modified db pages 40955
            Percent of dirty pages(LRU & free pages): 1.012
            Max dirty pages percent: 75.000
            Pending reads 0
            Pending writes: LRU 0, flush list 0, single page 0
            Pages made young 42401637, not young 73201887
            135.22 youngs/s, 0.50 non-youngs/s
            Pages read 4681517, created 1817539, written 812815043
            0.25 reads/s, 1.25 creates/s, 3441.39 writes/s
            Buffer pool hit rate 999 / 1000, young-making rate 0 / 1000 not 0 / 1000
            Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
            LRU len: 4021115, unzip_LRU len: 0
            I/O sum[2960712]:cur[0], unzip sum[0]:cur[0]

            This does not seem to match an RSS of 140 GB, or am I completely mistaken?

            $ ps auxww | grep -e 29283 -e USER
            USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
            root 26460 0.0 0.0 6408 892 pts/0 S+ 10:38 0:00 grep -e 29283 -e USER
            docker-+ 29283 369 55.0 149067740 145335972 ? Ssl May28 184342:54 mysqld


            sanja Oleksandr Byelkin added a comment -

            Is there some more specific information about the queries and so on? Right now I do not see anything to work with (how to repeat).

            mmerz Matthias Merz added a comment -

            That was my original question.

            I know this is very hard to debug as long as we cannot reproduce it. It seems to be triggered by some access pattern in our live workloads; there is nothing we can reproduce on a test system.

            That's why I had mainly asked how to find out in which allocation area the memory might be consumed. As you can see, the buffer pool looks OK, but there is a huge amount of RAM that does not show up in the status variables I checked. So where in the server status can that information be retrieved?
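
            One place such information may be visible (assuming a 10.4-era INFORMATION_SCHEMA.PROCESSLIST with a MEMORY_USED column) is the per-connection accounting; comparing its sum with the global Memory_used and the process RSS shows whether the missing RAM is attributed to sessions at all. A rough sketch:

            # Compare per-session accounting, global accounting and actual RSS (sketch only).
            mysql -e "SELECT COUNT(*)         AS sessions,
                             SUM(MEMORY_USED) AS total_session_bytes,
                             MAX(MEMORY_USED) AS largest_session_bytes
                      FROM information_schema.PROCESSLIST"
            mysql -e "SHOW GLOBAL STATUS LIKE 'Memory_used'"
            ps -o rss= -p "$(pidof mysqld)"   # RSS in KiB, for comparison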

            MattBianco Mattias Bergvall made changes -
            Attachment swap-leak-10-4-13-1.png [ 52989 ]
            MattBianco Mattias Bergvall made changes -
            Attachment swap-zoom-10-4-13-1.png [ 52990 ]
            MattBianco Mattias Bergvall added a comment - - edited

            Hello. I have this issue with MariaDB-server-10.4.13-1.el7.centos.x86_64.
            My setup is 5 VMs running CentOS 7.2 with a Galera cluster where all traffic hits one node.
            I had been running MariaDB-Galera-server-10.0.22-1.el6.x86_64 for many years without any problems, but decided that it would be good to upgrade, which I did in May, two months ago, performing a rolling upgrade like this:

            May 20 16:37:35 Updated: galera-25.3.29-1.rhel7.el7.centos.x86_64
            May 20 16:37:38 Erased: MariaDB-Galera-server-10.0.22-1.el6.x86_64
            May 20 16:37:43 Installed: jemalloc-3.6.0-1.el7.x86_64
            May 20 16:38:02 Installed: MariaDB-server-10.1.45-1.el7.centos.x86_64
            May 22 09:00:37 Installed: maxscale-2.4.9-1.x86_64
            May 22 09:23:51 Erased: MariaDB-server-10.1.45-1.el7.centos.x86_64
            May 22 09:24:17 Installed: MariaDB-server-10.2.32-1.el7.centos.x86_64
            May 22 10:21:15 Erased: MariaDB-server-10.2.32-1.el7.centos.x86_64
            May 22 10:21:37 Installed: MariaDB-server-10.3.23-1.el7.centos.x86_64
            May 25 13:06:27 Erased: maxscale-2.4.9-1.x86_64
            May 25 13:34:18 Erased: MariaDB-server-10.3.23-1.el7.centos.x86_64
            May 25 13:34:27 Installed: galera-4-26.4.4-1.rhel7.el7.centos.x86_64
            May 25 13:34:52 Installed: MariaDB-server-10.4.13-1.el7.centos.x86_64
            May 25 13:34:52 Erased: galera-25.3.29-1.rhel7.el7.centos.x86_64

            Same history except minor differences in the timings on all 5 nodes.

            I'm not using MaxScale (the application didn't like it, since it sometimes did reads immediately after write and thus found inconsistencies).

            What I've observed is that since May 25th/26th, all machines but one have been increasing their swap usage until there is none left. And it is the nodes that don't get any query traffic, the ones that are just replicating the data, that are "leaking".

            Attaching a graph showing the swap usage in percent before and after the upgrade where the change in behavior is obvious:

            And also supplying a 14-day view of the same metrics where it is obvious that the node that gets hit by queries (the green line) shows a different behavior:

            This is the "top" output on the node that is "the master/active":

            Tasks: 258 total,   4 running, 254 sleeping,   0 stopped,   0 zombie
            %Cpu(s): 33.3 us, 28.7 sy,  0.0 ni, 33.7 id,  0.8 wa,  0.0 hi,  3.5 si,  0.0 st
            KiB Mem :  5943044 total,   853060 free,  1777096 used,  3312888 buff/cache
            KiB Swap:  6242300 total,  5289576 free,   952724 used.  3598580 avail Mem
             
              PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
            13584 mysql     20   0 4965396 1.356g 172360 S  42.2 23.9  13734:56 mysqld
            

            And this is the same from one of the others:

            Tasks: 242 total,   4 running, 237 sleeping,   0 stopped,   1 zombie
            %Cpu(s): 11.2 us, 15.5 sy,  0.0 ni, 69.0 id,  0.9 wa,  0.0 hi,  3.4 si,  0.0 st
            KiB Mem :  5943044 total,  1235528 free,  3182500 used,  1525016 buff/cache
            KiB Swap:  6242300 total,  2882904 free,  3359396 used.  2219660 avail Mem
             
              PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
             3079 mysql     20   0 7652640 2.786g 138004 S  30.6 49.2   1129:47 mysqld
            
            

            Hope this helps!

            MattBianco Mattias Bergvall made changes -
            Attachment galera-4months-swap.png [ 52991 ]
            sanja Oleksandr Byelkin added a comment - - edited

            Sorry, but it does not help much. I have no doubt that the problem exists (most of the above tries to prove the existence of the problem); I need something to localize the problem.

            Which engine do you use, do you use replication, what kind of load is there, and so on?

            Status variables before, during and at the end of the period may help...
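
            A minimal sketch of how such snapshots could be collected (directory and interval are arbitrary), so the counters can be diffed between the start and the end of a growth period:

            # Capture timestamped SHOW GLOBAL STATUS snapshots for later diffing (sketch only).
            mkdir -p /var/tmp/status-snapshots
            while true; do
                mysql -e "SHOW GLOBAL STATUS" > "/var/tmp/status-snapshots/status-$(date +%Y%m%d-%H%M%S).txt"
                sleep 3600
            done
            # Later: diff the first and last snapshot to see which counters moved.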


            MattBianco Mattias Bergvall added a comment -

            This is a 4-month graph of swap free in my 5-node cluster. The upgrade to 10.4 was made on May 25th-26th. The spikes correspond to node reboots. The green line since most of June is the main node that has received all the traffic for the last 55+ days. All others have been rebooted regularly.

            MattBianco Mattias Bergvall added a comment - - edited

            The storage engine in use is InnoDB only.
            The load comes from an internal web application (99% SELECT) and a few other internal applications that perform perhaps 80% INSERT and 20% SELECT.

            Snapshot from the "main" node:

            MySQL uptime                            19 days, 02:47:43
            MySQL update operations per second      4.4 qps
            MySQL slow queries                      4374
            MySQL select operations per second      229.94 qps
            MySQL rollback operations per second    0.1322 qps
            MySQL queries per second                244.69 qps
            MySQL insert operations per second      1.5 qps
            MySQL delete operations per second      2.04 qps
            MySQL commit operations per second      0.68 qps
            MySQL bytes sent per second             160.98 KBps
            MySQL bytes received per second         47.27 KBps
            MySQL begin operations per second       0.8135 qps
            

            And from a "passive" node:

            MySQL uptime                            5 days, 21:22:38
            MySQL update operations per second      30.92 qps
            MySQL slow queries                      11
            MySQL select operations per second      0.7344 qps
            MySQL rollback operations per second    0 qps
            MySQL queries per second                4.73 qps
            MySQL insert operations per second      3.29 qps
            MySQL delete operations per second      3.63 qps
            MySQL commit operations per second      0.1328 qps
            MySQL bytes sent per second             468.78 Bps
            MySQL bytes received per second         496.72 Bps
            MySQL begin operations per second       0.1341 qps
            

            /etc/my.cnf.d/server (empty lines and comments removed. Names and stuff have been replaced by "-----"):

            [server]
            tmpdir=/var/tmp
            [mysqld]
            bind-address=0.0.0.0
            port=3306
            [galera]
            wsrep_provider=/usr/lib64/galera-4/libgalera_smm.so
            wsrep_cluster_address="gcomm://------.20,------.21,------.22,------.23,------.24"
            binlog_format=row
            default_storage_engine=InnoDB
            innodb_autoinc_lock_mode=2
            bind-address=0.0.0.0
            wsrep_on=ON
            wsrep_cluster_name='------'
            wsrep_node_address='---------------'
            wsrep_node_name='beaker'
            wsrep_sst_method=rsync
            wsrep_sst_auth=sst_user:-----------------------
            wsrep_provider_options="gcache.size=2G"
            wsrep_slave_threads=1
            innodb_flush_log_at_trx_commit=2
            [embedded]
            [mariadb]
            [mariadb-10.0]
            

            lubor LuborJ added a comment - - edited

            I have a similar setup with the same problem. We have only a 10G innodb_buffer_pool_size on 128 GB RAM hardware, and MariaDB uses all of the memory every few days.


            MattBianco Mattias Bergvall added a comment -

            This issue seems to have been resolved in MariaDB-server-10.4.14-1.el7.centos.x86_64.


            serg Sergei Golubchik added a comment -

            Thanks. I'll close it then. Given that we don't have enough information to reproduce the issue reliably, let's just assume it was fixed, as you no longer experience it.

            If anyone still has this, please do comment and we'll reopen the issue.

            serg Sergei Golubchik made changes -
            Fix Version/s 10.4.14 [ 24305 ]
            Fix Version/s 10.4 [ 22408 ]
            Resolution Fixed [ 1 ]
            Status Open [ 1 ] Closed [ 6 ]
            AB Andrew Bierbaum made changes -
            Attachment Screen Shot 2021-03-29 at 4.59.41 PM.png [ 56889 ]
            AB Andrew Bierbaum added a comment - - edited

            We are seeing this same issue on 10.4.15 after a recent upgrade from an older version.

            These are running MariaDB 10.4.15 on bionic, installed via apt-get using Puppet.

            We have two large hosts that are nearly identical in their setup. One runs a heavy automated load and the other is used for more ad-hoc queries; both are set to innodb_buffer_pool_size=22G with innodb_buffer_pool_instances = 22. The one with light ad-hoc queries is steady at 24 GB of RAM usage. The one with heavy automated load is currently using 49 GB of RAM and slowly growing; see the attached memory usage graph.

            InnoDB does not seem to account for much of the RAM usage.

            ----------------------
            BUFFER POOL AND MEMORY
            ----------------------
            Total large memory allocated 23655874560
            Dictionary memory allocated 5751648
            Buffer pool size   1412048
            Free buffers       26
            Database pages     1289404
            Old database pages 475523
            Modified db pages  2888
            Percent of dirty pages(LRU & free pages): 0.224
            Max dirty pages percent: 75.000
            Pending reads 0
            Pending writes: LRU 0, flush list 0, single page 0
            Pages made young 596448458, not young 252946942881
            314.17 youngs/s, 384.60 non-youngs/s
            Pages read 2518392152, created 22370165, written 632581348
            40.14 reads/s, 4.21 creates/s, 1191.78 writes/s
            Buffer pool hit rate 999 / 1000, young-making rate 0 / 1000 not 1 / 1000
            Pages read ahead 0.00/s, evicted without access 0.02/s, Random read ahead 0.00/s
            LRU len: 1289404, unzip_LRU len: 112198
            I/O sum[1357598]:cur[15136], unzip sum[35310]:cur[616]
            

            Digging through the variables and status output has also been unfruitful in diagnosing the memory usage.

            mysql -e 'SHOW status like "%memory_used%"'
            +---------------------+-----------+
            | Variable_name       | Value     |
            +---------------------+-----------+
            | Memory_used         | 80032     |
            | Memory_used_initial | 158008888 |
            +---------------------+-----------+
            

            Also, running "FLUSH TABLES" doesn't appear to effect memory usage.

            We set innodb_buffer_pool_size to a fairly low 22 GB of the total 80 GB of memory on the host to try to slow down and stabilize the issue. Memory usage may have started to level out, but it is far beyond what was estimated from innodb_buffer_pool_size=22G. I can provide more details if that would be helpful, but I'm not sure what would help diagnose the issue.

            We are seeing very similar behavior in our smaller and more easily changed test environments as well. On that host, clear memory steps occur when the timed, automated query load hits every 6 hours, with OOM reaping happening about every 2 days as it is currently configured. That host uses innodb_buffer_pool_size = 1456M with innodb_buffer_pool_instances = 2 on 8 GB of host memory.

            All these hosts are carrying out fairly heavy multi-channel replication.

            Thanks

            AB Andrew Bierbaum made changes -
            Attachment Screen Shot 2021-03-29 at 4.59.41 PM.png [ 56890 ]
            AB Andrew Bierbaum made changes -
            Attachment Screen Shot 2021-03-29 at 4.59.41 PM.png [ 56889 ]
            AB Andrew Bierbaum made changes -
            Attachment Screen Shot 2021-03-29 at 4.59.41 PM.png [ 56890 ]
            mgasco Maurice Gasco added a comment -

            I'm having the same issue on 10.4.18, CentOS 8.
            I have 3 nodes with Galera replication. The one that receives client connections has its memory growing constantly,
            with these parameters:
            max-connections = 3000
            innodb_log_file_size=1024M
            innodb_adaptive_hash_index=0
            innodb_buffer_pool_size=40G
            table_open_cache=10000

            LimitNOFILE=200000
            LimitMEMLOCK=200000

            With 1200 active connections, memory grows to 140 GB in less than 24 hours.
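
            With that many active connections, a rough way to see whether a few sessions dominate the reported memory (assuming the MEMORY_USED column is present in this 10.4 build) would be:

            # List the ten sessions currently reporting the most memory (sketch only).
            mysql -e "SELECT ID, USER, DB, COMMAND, TIME, MEMORY_USED
                      FROM information_schema.PROCESSLIST
                      ORDER BY MEMORY_USED DESC
                      LIMIT 10"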

            lubor LuborJ made changes -
            Attachment chart2.png [ 57518 ]
            lubor LuborJ made changes -
            Comment [ Same setup on three node cluster , MariaDB 10.5.9 and bug is still here.

            !chart2.png|thumbnail! ]
            AB Andrew Bierbaum added a comment - - edited

            I'm currently testing in lower environments, but it appears that switching memory allocators may solve the issue for us. I found a few mentions of tcmalloc in other tickets on jira.mariadb.org as well as on a few other sites (see https://jira.mariadb.org/browse/MDEV-21372?focusedCommentId=144092&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-144092 and https://dba.stackexchange.com/a/273834 ).

            sudo apt-get install google-perftools
            

            Place the following in /etc/systemd/system/mariadb.service.d/override.conf:

            [Service]
            Environment="LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libtcmalloc_minimal.so.4"
            

            then

            sudo systemctl daemon-reload
            sudo systemctl restart mysql
            

            produces

            MariaDB [(none)]> SHOW VARIABLES LIKE 'version_malloc_library';
            +------------------------+-------------------------+
            | Variable_name          | Value                   |
            +------------------------+-------------------------+
            | version_malloc_library | tcmalloc gperftools 2.5 |
            +------------------------+-------------------------+
            1 row in set (0.002 sec)
            

            The memory use appears stable on Ubuntu 18.04 with either mariadb 10.4.15 and 10.5.9.

            euglorg Eugene added a comment - - edited

            Hit the very same issue: MariaDB 10.4.15 with galera-26.4.5 on Gentoo, Linux kernel 5.10.31.

            max_connections				= 2000
            table_open_cache			= 120000
            table_open_cache_instances              = 16
            innodb_adaptive_hash_index		= 0
            innodb_buffer_pool_size			= 64G
            innodb_log_file_size			= 8G
             
            max number of open files            10240000
            

            Used memory of mysqld grew from 160 to 240 GB (within two weeks) with the system malloc library.


            Roel Roel Van de Paar added a comment -

            Can anyone seeing this issue confirm whether they are NOT using replication?
            For replication-based setups, see MDEV-26712.

            mmerz Matthias Merz added a comment -

            In my original report I had also tried 10.4.6 without any replication and observed the memory consumption going up as well, so at least then this was not replication related. I also see the memory growth only on the node(s) receiving SQL requests, not on the replicated nodes without interactive sessions. So I'd assume the original bug is not related to Galera and, unfortunately, not to MDEV-26712 in our case.

            On the other hand, it is practically impossible to reproduce this behaviour in our environment. It feels like it has gotten better (currently we are on 10.4.21), but it never went away completely. Another "single" instance on Debian buster (version 10.3.29) currently has a 196 GB resident set with innodb_buffer_pool_size=64G (and no MyISAM or other table types in any mentionable amount).

            Roel Roel Van de Paar made changes -
            Component/s Galera [ 10124 ]
            Labels Memory_leak Memory_leak wsrep

            Roel Roel Van de Paar added a comment -

            Thank you mmerz for the input, much appreciated.

            xneelo shield added a comment -

            Adding some information on the topic of NOT using replication.

            Memory usage initially increases to what one expects, as assigned to the service. Then, over a time frame of 2 to 4 weeks, memory usage increases very slowly until it is completely depleted.

            Configuration:
            Around 215 servers used explicitly as database servers - MariaDB
            mariadb-server 1:10.4.21+maria~stretch
            Kernel: linux-image-4.19-amd64 (will be upgrading to Buster within the next few months).
            Mixture of InnoDB and MyISAM tables.
            The servers mostly host around 1800 web-based databases, mostly WordPress.

            Most of these servers have 32G RAM with a swappiness of 1, with settings:
            innodb_buffer_pool_size = 11159M
            interactive_timeout = 1800
            key_buffer_size = 3G
            max_connections = 400
            open_files_limit = 8000
            query_cache_limit = 1M
            query_cache_size = 0
            query_cache_type = OFF
            read_buffer_size = 2M
            sort_buffer_size = 2M
            table_open_cache = 2000
            thread_cache_size = 800
            thread_stack = 292K
            wait_timeout = 1800

            Calculating predicted RAM usage based on this query:
            SELECT ROUND( ( @@GLOBAL.key_buffer_size + @@GLOBAL.query_cache_size + @@GLOBAL.tmp_table_size + @@GLOBAL.innodb_buffer_pool_size + @@GLOBAL.aria_pagecache_buffer_size + @@GLOBAL.innodb_log_buffer_size + @@GLOBAL.max_connections * ( @@GLOBAL.sort_buffer_size + @@GLOBAL.read_buffer_size + @@GLOBAL.read_rnd_buffer_size + @@GLOBAL.join_buffer_size + @@GLOBAL.thread_stack + @@GLOBAL.binlog_cache_size) ) / 1024 / 1024, 1) `total MB`;

            ...a server with 32G is configured to use 16422.6MB.

            If you need more information, I'll gladly provide.

            serg Sergei Golubchik made changes -
            Workflow MariaDB v3 [ 99336 ] MariaDB v4 [ 156686 ]

            People

              Assignee: sanja Oleksandr Byelkin
              Reporter: mmerz Matthias Merz
              Votes: 6
              Watchers: 18

