Status: Closed
Affects Version/s: 10.4.6, 10.4.7, 10.4.8
Fix Version/s: 10.4.14
Environment: Docker image from Docker Hub (10.4.6-bionic), 3-node Galera setup on a Debian 10 host.
Only one instance receives live queries; the other two are passive nodes (or used read-only for backups)
256GB host memory
InnoDB tables only
MariaDB is configured with a 64GB InnoDB buffer pool (innodb_buffer_pool_size), which should lead to approx. 70-80GB of memory consumption.
Over time, this increases, sometimes in larger steps, sometimes gradually. After 47h of "uptime" we are currently at:
This will eventually lead to an OOM condition within a few days. After the OOM kill, the Galera IST will not work, which triggers MDEV-20218 and consequently breaks the whole cluster: after some tries no donor is available, and all nodes are DESYNC and writing replication logs to disk.
MDEV-16431 does not seem to be ready yet. How can I debug this?
Please find the config file attached.
The memory consumption is probably triggered by client access, because if we redirect our load balancer to the next backend, memory grows there. OTOH, memory usage does not decrease when a node is not receiving queries, even after days. (I had to cut the experiment short after 3 days, because node2/3 were threatening to break down.)
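To put numbers on the growth described above, the resident memory of the server process could be sampled periodically and turned into a growth rate. The following is only a minimal sketch, assuming a Linux host with /proc available and that the PID of the server process is known; the function names, the sampling interval and the sample count are hypothetical and not part of any MariaDB tooling.

```python
import re
import time
from pathlib import Path


def parse_vmrss_kb(status_text: str) -> int:
    """Extract the VmRSS value (in kB) from the text of /proc/<pid>/status."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    if match is None:
        raise ValueError("VmRSS line not found")
    return int(match.group(1))


def read_rss_kb(pid: int) -> int:
    """Read the current resident set size of a process (Linux only)."""
    return parse_vmrss_kb(Path(f"/proc/{pid}/status").read_text())


def growth_kb_per_hour(samples: list[tuple[float, int]]) -> float:
    """Average growth between the first and the last (unix_time, rss_kb) sample."""
    (t0, rss0), (t1, rss1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("need at least two samples with increasing timestamps")
    return (rss1 - rss0) * 3600 / (t1 - t0)


def monitor(pid: int, interval_s: float = 600, count: int = 6) -> float:
    """Take `count` samples, `interval_s` seconds apart, and return kB/hour."""
    samples = []
    for i in range(count):
        samples.append((time.time(), read_rss_kb(pid)))
        if i < count - 1:
            time.sleep(interval_s)
    return growth_kb_per_hour(samples)
```

Calling `monitor(pid)` on the node that currently receives queries, and again on a passive node, would give comparable figures for both cases. It only shows that the process grows, not which allocator or component holds the memory.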