Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Incomplete
Affects Version/s: 10.5.13, 10.6.7
None
Environment: Debian Bullseye, MariaDB installed from the MariaDB repositories
Description
I have a backup server which is used for making snapshots of a MariaDB database. It slaves off of a MariaDB master. All servers in the system were running 10.3 until recently, when I upgraded the backup to 10.5.
Since then, the backup server has regularly run out of memory. The kernel's OOM killer terminates MariaDB, and I have to restart it quite often.
mysqltuner reports that MariaDB's maximum memory usage should be about 8 GB, and the machine has 16 GB of RAM. Even so, MariaDB's memory usage climbs past 95% according to top, and eventually it gets killed.
MariaDB on this server does nothing other than replication, so it's hard to say for sure whether this is a replication problem or a "doing anything" problem. But the memory growth certainly scales with the amount of replication being performed: when the server is catching up on a replication backlog, it goes through memory very quickly.
It may be related to replication parallelism (slave_domain_parallel_threads). On 10.3 I was running 8 threads in "aggressive" mode without trouble, and that's what I started with on 10.5 and 10.6. Yesterday I experimented with the following settings:
slave_parallel_threads | slave_parallel_mode | Slave_retried_transactions / minute | RAM leaked (MB / minute) |
---|---|---|---|
2 | aggressive | 106 | 16 |
8 | none | 0 | 0 |
8 | minimal | 0 | 22 |
8 | aggressive | 221 | 11 |
20 | minimal | 0 | 19 |
20 | aggressive | 420 | 30 |
Each configuration ran for about 10 minutes while replication was catching up. The sample is therefore small, and the test may not have been "fair" in terms of exactly which statements were being executed during each run.
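The per-minute figures in the table are averages over each run. As a minimal sketch of how such rates could be derived from a start and an end sample, the code below assumes memory is read from the VmRSS line of /proc/<pid>/status on Linux and the retry count from the Slave_retried_transactions counter in SHOW GLOBAL STATUS. The helper names are hypothetical, and this is not necessarily how the numbers above were measured.

```python
import re


def parse_vmrss_kb(status_text: str) -> int:
    """Hypothetical helper: extract the resident set size in kB from the
    text of /proc/<pid>/status, where the line looks like 'VmRSS:  123 kB'."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    if match is None:
        raise ValueError("no VmRSS line found")
    return int(match.group(1))


def per_minute(start_value: float, end_value: float, elapsed_seconds: float) -> float:
    """Average change per minute between two samples of a growing quantity."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return (end_value - start_value) * 60.0 / elapsed_seconds


def leak_mb_per_minute(rss_start_kb: int, rss_end_kb: int, elapsed_seconds: float) -> float:
    """Memory growth in MB per minute (1 MB = 1024 kB, as /proc reports kB)."""
    return per_minute(rss_start_kb / 1024, rss_end_kb / 1024, elapsed_seconds)
```

In practice one would read /proc/$(pidof mariadbd)/status and the status counter at the start and end of each run, then pass the two samples and the elapsed time to these helpers. Averaging over the whole run, rather than sampling more often, keeps the sketch simple but hides short spikes.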
The most notable result is the 0 MB/min leak in "none" mode. Indeed, when slave parallelization is disabled completely (via slave_parallel_mode=none or slave_domain_parallel_threads=0), it /seems/ that memory no longer leaks, or leaks much more slowly.
(This may or may not be the same issue as MDEV-27481, although that one affects 10.3, where I did not seem to have this problem.)