[MDEV-28436] Memory leak on slave server Created: 2022-04-28 Updated: 2022-10-30 Resolved: 2022-10-30 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Replication |
| Affects Version/s: | 10.5.13, 10.6.7 |
| Fix Version/s: | N/A |
| Type: | Bug | Priority: | Major |
| Reporter: | Xan Charbonnet | Assignee: | Angelique Sklavounos (Inactive) |
| Resolution: | Incomplete | Votes: | 0 |
| Labels: | None | ||
| Environment: |
Debian Bullseye, MariaDB through the Maria repositories |
||
| Description |
|
I have a backup server which is used for making snapshots of a MariaDB database. It slaves off of a MariaDB master. All servers in the system were running 10.3 until recently, when I upgraded the backup to 10.5. At that time the backup server started to regularly run out of memory. MariaDB gets OOM-killed by the kernel and I have to restart it quite often.

mysqltuner reports that the max memory usage by MariaDB should be ~8GB. I have 16GB RAM on the machine. MariaDB will hit 95%+ memory usage according to top, and eventually gets killed.

MariaDB on this server doesn't do anything other than replication, so it's hard to say for sure whether this is a replication problem or a "doing anything" problem. But it certainly scales with the amount of replication being performed: if it's catching up on a replication backlog, then it'll fly through memory.

It may be related to replication parallelism (slave_domain_parallel_threads). On 10.3 I was running 8 threads in "aggressive" mode without trouble, and that's what I started with on 10.5 and 10.6. Yesterday I experimented with the settings:
These were run over the course of ~10 minutes each while replication was catching up. So it isn't a huge sample, and the test may not have been "fair" in terms of exactly what commands were being executed for each test. The biggest thing to note is the 0 MB/min leaked in "none" mode. Indeed, when I have slave parallelization disabled completely (via slave_parallel_mode=none or slave_domain_parallel_threads=0) it /seems/ that the memory is no longer leaking, or is doing so much more slowly. (This may or may not be the same as |
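
For anyone trying to reproduce a "MB/min leaked" figure like the ones mentioned above, here is a minimal sketch of one way to estimate it on Linux. It samples the server's resident set size from /proc and fits a straight line through the samples. This is not necessarily how the reporter measured it; the function names (`parse_vmrss_kb`, `growth_mb_per_min`, `sample_rss`) are hypothetical, and the script assumes the PID of the MariaDB server process is passed on the command line.

```python
import re
import sys
import time


def parse_vmrss_kb(status_text):
    """Return the VmRSS value in kB from the text of /proc/<pid>/status, or None."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def growth_mb_per_min(samples):
    """Least-squares slope of (seconds, rss_kb) samples, converted to MB/min."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_r = sum(r for _, r in samples) / n
    num = sum((t - mean_t) * (r - mean_r) for t, r in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    if den == 0:
        raise ValueError("samples must span more than one point in time")
    slope_kb_per_s = num / den
    return slope_kb_per_s * 60 / 1024


def sample_rss(pid, interval_s=10, count=60):
    """Collect (elapsed_seconds, rss_kb) pairs for a process (Linux only)."""
    samples = []
    start = time.monotonic()
    for i in range(count):
        with open(f"/proc/{pid}/status") as f:
            rss_kb = parse_vmrss_kb(f.read())
        if rss_kb is not None:
            samples.append((time.monotonic() - start, rss_kb))
        if i < count - 1:
            time.sleep(interval_s)
    return samples


if __name__ == "__main__":
    # Assumption: the first argument is the PID of the MariaDB server process.
    pid = int(sys.argv[1])
    print(f"{growth_mb_per_min(sample_rss(pid)):.1f} MB/min")
```

With the defaults this observes the process for about 10 minutes, comparable to the test windows described above. RSS growth alone does not prove a leak (caches and buffer pools also grow while the server warms up), so comparing the slope across parallel-replication settings is more informative than any single number.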
| Comments |
| Comment by Xan Charbonnet [ 2022-06-14 ] |
|
After much further testing, I believe this is a problem with the system's memory allocator. Switching to jemalloc for Maria seems to have solved it. |
| Comment by Xan Charbonnet [ 2022-07-12 ] |
|
Another update: I believe the different memory allocator helped, and in general MariaDB isn't running the system out of memory. However, when doing a lot of catchup all at once from replication, it is still running out. |
| Comment by Andrei Elkin [ 2022-07-27 ] |
|
xan@biblionix.com, it would be best if you could share with us your slave's initial data and the binlog to replay, to prove the effects you observed. Can it be done? |
| Comment by Xan Charbonnet [ 2022-07-28 ] |
|
I'm afraid I can't share the actual data, but I'll try to see if I can come up with a constructed example. |