[MDEV-29097] 10.8.3 seems to be using a lot more swap memory, always increasing (every time mariabackup runs daily) Created: 2022-07-13 Updated: 2023-01-20 Resolved: 2022-11-01 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Server |
| Affects Version/s: | 10.8.3 |
| Fix Version/s: | N/A |
| Type: | Bug | Priority: | Major |
| Reporter: | Nuno | Assignee: | Daniel Black |
| Resolution: | Not a Bug | Votes: | 0 |
| Labels: | None | ||
| Attachments: |
|
| Description |
|
I don't remember having this issue with 10.5: although it was using a good amount of swap memory, I never had to do an emergency database restart because of it. Since a few days ago, I have been receiving alerts about too much swap being used on the server, and it's been getting worse. I had to restart MariaDB yesterday, because the swap was getting full. Before the restart, this was the usage: Resident RAM: Swap: And by the way, my server's sysctl has this config: vm.swappiness=1 My innodb_buffer_pool_size is 80G. Are you aware of any reason why so much swap would be used by MariaDB? There was plenty of free resident RAM that could be used instead. Since the restart yesterday, it's already using 2GB of swap, and increasing. Thank you. |
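For anyone triaging a report like this: on Linux, per-process swap usage can be read straight from /proc, which helps distinguish "the server is swapping" from "something else on the box is swapping". A minimal sketch (`swap_of` is just an illustrative helper name; for the server you would pass the mariadbd PID, e.g. from `pidof mariadbd`):

```shell
# Print the swap usage (in kB) of a single process, from the VmSwap
# field of /proc/<pid>/status. Linux-only.
swap_of() {
  awk '/^VmSwap:/ {print $2}' "/proc/$1/status"
}

# Example: swap used by the current shell process (usually 0).
swap_of $$
```

Summing VmSwap over all PIDs of interest gives a per-service view that `free` alone cannot.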
| Comments |
| Comment by Marko Mäkelä [ 2022-07-14 ] | ||
|
Would the memory usage tracking of performance_schema produce anything useful? For a subset of your workload, can you try to get some more information via a heap profiler, such as the one of tcmalloc? | ||
| Comment by Nuno [ 2022-07-14 ] | ||
|
Hi marko. I have performance_schema disabled, and it looks like it's not a dynamic variable, so it requires a restart. I can look into enabling it on the next restart, if the performance impact of having it enabled is negligible. As for using tcmalloc, that looks quite advanced, and it's not something I'm comfortable doing without some guidance or a tutorial specific to MariaDB. I'll see what I get with performance_schema first, but I need to schedule a restart. | ||
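For reference, enabling it together with the memory instruments is a my.cnf change along these lines (option names as in MariaDB 10.5+; a restart is indeed required, and there is some overhead while instruments are on):

```ini
[mysqld]
performance_schema = ON
# Enable the memory/* instruments from startup so allocations
# are tracked from the beginning, not only after SETUP changes.
performance-schema-instrument = 'memory/%=ON'
```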
| Comment by Nuno [ 2022-07-27 ] | ||
|
Hi marko, I now have performance_schema enabled, but where do you recommend I look in there? Thank you very much. | ||
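A reasonable first stop for questions like this is the global memory summary table, which ranks instruments by bytes currently allocated. A sketch (table and column names as in MariaDB 10.5+ with memory instruments enabled; the block only runs the query if a mysql client and a reachable server exist):

```shell
# Top 10 memory instruments by bytes currently allocated.
QUERY='SELECT event_name,
              current_number_of_bytes_used/1024/1024 AS mb_used
       FROM performance_schema.memory_summary_global_by_event_name
       ORDER BY current_number_of_bytes_used DESC
       LIMIT 10'

if command -v mysql >/dev/null 2>&1; then
  mysql -e "$QUERY" || echo "could not reach a server"
else
  printf 'mysql client not found; query to run:\n%s\n' "$QUERY"
fi
```

If nothing there accounts for the growth, the difference between tracked allocations and the process RSS points toward allocator fragmentation rather than a leak.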
| Comment by Nuno [ 2022-07-27 ] | ||
|
marko | ||
| Comment by Marko Mäkelä [ 2022-08-03 ] | ||
|
nunop, I only know InnoDB, and it should not allocate too much outside its buffer pool. I can’t think of anything where InnoDB heap memory usage should have increased between 10.5 and 10.8. I hope that serg, sanja or danblack can provide some advice on how to find the largest users of memory. The increased resident set size (and thus swap usage) is not necessarily due to a memory leak; it could also be memory fragmentation. Using an alternative memory allocation library, such as tcmalloc or jemalloc, might reduce the fragmentation. | ||
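One low-friction way to try an alternative allocator on a systemd-managed server is an LD_PRELOAD drop-in for the service. A sketch; the library path is an assumption and varies by distribution and package (e.g. jemalloc vs. tcmalloc):

```ini
# /etc/systemd/system/mariadb.service.d/jemalloc.conf
# Apply with: systemctl daemon-reload && systemctl restart mariadb
[Service]
Environment="LD_PRELOAD=/usr/lib64/libjemalloc.so.2"
```

Checking `/proc/$(pidof mariadbd)/maps` for the library afterwards confirms the preload took effect.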
| Comment by Nuno [ 2022-08-03 ] | ||
|
Thanks marko. For info, I only use InnoDB in my databases. I'm not 100% sure whether I started having this issue straight after the upgrade, or whether it was after I changed one of those other my.cnf configs we discussed in the other issues. I can see what happens on 10.8.4, once I revert some of those configs. I'll wait anyway for the feedback/suggestions from the others. Thanks everyone! | ||
| Comment by Marko Mäkelä [ 2022-08-03 ] | ||
|
nunop, in 10.6 there were some extensive changes to class Item, which implements all subexpression types in the SQL parser. The memory allocations or copying related to strings were supposed to be optimized. Without knowing specific details, that would be my main suspect. | ||
| Comment by Nuno [ 2022-08-10 ] | ||
|
Just noting here: I still see the swap usage increasing slowly, but somehow also slowly going back down, which keeps things stable for longer (still slowly increasing over time anyway) and doesn't require me to restart MariaDB every 3-4 days. I think what helped was increasing the log file from 5GB to 24GB (I read in the documentation that it's safe to have huge log files now); the last time I restarted was 10 days ago. (But still, this wasn't an issue in 10.5.) Thanks marko for pointing out some suspicions of what the cause could be. | ||
| Comment by Nuno [ 2022-08-13 ] | ||
|
I've been monitoring this. Swap has actually been pretty stable around 17GB, slowly increasing as I said, but not the end of the world. When MariaBackup ran today (it takes 1-2 minutes, as every day), swap increased a bit as usual, but it's strange how over the next 2 hours it rose by another 3GB, so now it's over 20GB of swap (overall)... MariaDB itself is using 15.56 GB of swap. Now, I don't know if this matters much, but when I started getting alerts about swap > 20GB (at 8:25, an hour and a half after MariaBackup ran), MariaDB's InnoDB was at 79.68% of 95GB. It's still a mystery to me what caused MariaDB to take several more GB of swap during those 2 hours. | ||
| Comment by Nuno [ 2022-09-19 ] | ||
|
2 days ago I had another huge swap jump (again, at the time MariaBackup runs) Seemed to be decreasing until I restarted MariaDB later in the day. Yesterday and today also increased a bit, but not much at all. | ||
| Comment by Marko Mäkelä [ 2022-09-22 ] | ||
|
nunop, if the problem is memory fragmentation in the allocator, using a different memory allocator might help. For GNU libc, you can find some environment variables documented in man mallopt. Alternative allocators such as tcmalloc or jemalloc might also provide better diagnostics. https://smalldatum.blogspot.com/2022/09/understanding-some-jemalloc-stats-for.html may be worth a read. I am afraid that without having more details, it will be difficult to fix anything. We do run tests with AddressSanitizer. There are a few open bug reports that mention LeakSanitizer. Could you check if you might be hitting one of them? | ||
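For GNU libc specifically, several of the `mallopt` parameters can be set through environment variables, which makes a quick experiment possible without recompiling anything. A sketch with illustrative values (to affect the server, these must be in mariadbd's environment, e.g. via a systemd override, not just an interactive shell):

```shell
# Cap the number of malloc arenas; heavily threaded servers can
# otherwise accumulate many arenas, a common source of RSS growth
# through fragmentation.
export MALLOC_ARENA_MAX=2
# Return free heap above 128 kB to the kernel more eagerly.
# (Note the trailing underscore in the variable name; see man mallopt.)
export MALLOC_TRIM_THRESHOLD_=131072

env | grep '^MALLOC_'
```

Whether these help is workload-dependent; they are cheap to try and easy to revert.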
| Comment by Nuno [ 2022-09-23 ] | ||
|
Hey! Thanks very much for your tips there. I'll try to read them very soon. I just want to add a note: I believe I've just realized that the cause of the swapping seems to be the "rsync" to the other server(s). This also matches the times I was manually rsyncing in the past days while testing the backups with 10.8.4. I just don't understand why "rsync" would cause MariaDB to swap, though... (the files I'm rsyncing are the backup ones, not the live database files). Today was the first day that rsync ran twice, once to the HDD and once to the SSD. It took 1 hour, which matches the period over which the swap increased: Yesterday I did an rsync at this time too: Very strange... I'll continue to investigate based on this. | ||
| Comment by Nuno [ 2022-10-30 ] | ||
|
I think I've just figured out why this happens... That's just how RHEL 8 / AlmaLinux 8 works. Relevant links with the actual explanation: And I confirm that even though my "vm.swappiness=1", most processes are still using the default value of "60", because they inherit that default before "sysctl" tunes the system.
RHEL 8.7 (still in BETA) brings a new sysctl parameter "vm.force_cgroup_v2_swappiness=1" which will resolve this issue. Until then, the workaround is to have a script that runs after boot and updates all the "memory.swappiness" files. In conclusion, I believe this issue can be closed, as it's no longer relevant for MariaDB itself. | ||
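The post-boot workaround described above can be quite small. A sketch, assuming cgroup v1 with the memory controller mounted at /sys/fs/cgroup/memory (the RHEL 8 default); it needs root to actually take effect:

```shell
# Propagate the host's vm.swappiness into every cgroup's
# memory.swappiness file, since cgroups created before sysctl ran
# keep the kernel default of 60.
WANT=$(sysctl -n vm.swappiness 2>/dev/null || echo 1)
find /sys/fs/cgroup/memory -name memory.swappiness 2>/dev/null |
while read -r f; do
  echo "$WANT" > "$f" 2>/dev/null || true
done
echo "applied vm.swappiness=$WANT where possible"
```

Running it from a oneshot systemd unit ordered after `sysctl.service` (or a cron `@reboot` job) covers cgroups created early in boot; cgroups created later still start from the default.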
| Comment by Marko Mäkelä [ 2022-10-31 ] | ||
|
nunop, it is great that you were able to figure it out. I leave it to danblack to decide if anything could be improved in our default configuration files or documentation. | ||
| Comment by Daniel Black [ 2022-11-01 ] | ||
|
I'm thinking that if Red Hat is well on the way to providing a solution, anything we document would quickly become obsolete. Was the swap usage having a negative impact on QPS? Were swap-in/swap-out rates rather high? Maybe manual limits can be applied in the interim - https://www.freedesktop.org/software/systemd/man/systemd.resource-control.html#MemorySwapMax=bytes | ||
| Comment by Nuno [ 2022-11-01 ] | ||
|
Hey danblack, thank you for your reply. I'm not sure I understand what you mean with your first sentence, sorry! Regarding QPS (queries per second?), I don't think I noticed any performance impact, but that might be because the swap is on an NVMe disk. However, it's quite stressful/frightening to see the swap getting close to 100%, or sometimes even reaching 100%, when there is still 35-40% of RAM available... (on a server with 128GB of RAM). The strange thing to me is that swap increases a lot while "rsync" is running to transfer the backup to another server, while there's no evidence that a lot of RAM is being used while rsync runs. Thank you for the tip about MemorySwapMax - I'll see if it can be useful in the meantime! Have a very great day. | ||
| Comment by Daniel Black [ 2022-11-01 ] | ||
|
First sentence: it seems Red Hat will deploy a solution someday soon. If I documented a workaround, it might not be compatible with Red Hat's and might leave users in a worse situation. Thanks for clarifying QPS; that was the right question. I'd certainly hope that OOM wasn't the next outcome on reaching 100% swap with free RAM available, but I appreciate the stress of it. I'd assume rsync is just filling the page cache with all the files it read/wrote, and somehow the MariaDB memory had a lower priority; I don't understand the logic in swappiness that led to this. With MemorySwapMax, you can write a value to mariadb's memory.swap.max cgroup file at runtime, without a restart. Have a good day too. | ||
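On a systemd/cgroup v2 system, the runtime change can also be done through systemctl rather than by writing the cgroup file directly. A sketch; the unit name and the 4G limit are example values, not recommendations:

```shell
# Apply a runtime-only swap cap to the MariaDB unit; it is dropped
# on the next reboot unless made persistent (omit --runtime for that).
UNIT=mariadb.service
LIMIT=4G
if command -v systemctl >/dev/null 2>&1; then
  systemctl set-property --runtime "$UNIT" MemorySwapMax="$LIMIT" \
    || echo "set-property failed (not root, or unit not active?)"
else
  echo "systemctl not available on this system"
fi
```

`systemctl show mariadb.service -p MemorySwapMax` verifies the value that is in effect.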
| Comment by Daniel Black [ 2022-11-01 ] | ||
|
Closing as "Not a Bug", meaning: not our bug. | ||
| Comment by Nuno [ 2022-11-01 ] | ||
|
Cheers! Yeah, based on this link (from one of my previous replies) - https://access.redhat.com/solutions/6785021 - they say that the "right/best" thing to do is to start using "CgroupV2". But yeah, I agree that this is a bug/issue with Red Hat, and not MariaDB, so I'm happy with you not having to document anything, as it is an OS issue, and quite specific. Thanks! | ||
| Comment by Marko Mäkelä [ 2022-11-02 ] | ||
Adding some calls to posix_fadvise() could help the Linux kernel to avoid polluting the file system cache with large files that are not going to be accessed any time soon. I encountered https://bugzilla.redhat.com/show_bug.cgi?id=841076 but did not check the current rsync source code. There might also be an option (some LD_PRELOAD library "shim" similar to libeatmydata.so) that would inject some posix_fadvise() calls at suitable places. Yet another option might be to patch rsync to use O_DIRECT, but that would require all file accesses and memory buffers to be aligned with the underlying physical block size (typically 512 or 4096 bytes). | ||
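An LD_PRELOAD shim of roughly that kind already exists: the `nocache` utility (packaged separately on most distributions) wraps file I/O with posix_fadvise(POSIX_FADV_DONTNEED) so the copied files do not linger in the page cache. A hedged sketch; the paths and host below are purely illustrative:

```shell
# Run rsync under nocache so backup files are dropped from the page
# cache as they are read/written, instead of evicting hotter memory.
if command -v nocache >/dev/null 2>&1; then
  nocache rsync -a /backups/mariadb/ backuphost:/backups/mariadb/ || true
else
  echo "nocache not installed; try your distribution's package repository"
fi
```

This avoids the alignment constraints that an O_DIRECT-patched rsync would impose, at the cost of depending on an extra package.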
| Comment by Richard Stracke [ 2023-01-20 ] | ||
|
Another idea: AnonHugePages (transparent hugepages) should work for applications without any configuration, and transparent hugepages are enabled by default.
However, this can sometimes fail to work.
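The mode the kernel is actually running with is visible in sysfs (Linux; the path exists on kernels built with THP support):

```shell
# Show the current transparent hugepage mode; the bracketed entry is
# the active one, e.g. "always [madvise] never".
cat /sys/kernel/mm/transparent_hugepage/enabled 2>/dev/null \
  || echo "no THP interface on this kernel"
```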
| ||
| Comment by Nuno [ 2023-01-20 ] | ||
|
Guys, since I'm using vm.force_cgroup_v2_swappiness=1 (added in the latest version of RHEL 8 / AlmaLinux 8), this is no longer an issue for me. Memory usage does eventually grow a lot, but at least it takes months to get there, rather than every 1-2 weeks! Anyway, with the sysctl option above I'm no longer having this issue, so I'm happy!! Thank you very much! | ||
| Comment by Marko Mäkelä [ 2023-01-20 ] | ||
|
nunop, this ticket has already been closed as "not a (MariaDB) bug". Thank you for your update. |