[MDEV-17247] Memory leak on parallel replicated slave Created: 2018-09-19 Updated: 2023-04-27 |
|
| Status: | Open |
| Project: | MariaDB Server |
| Component/s: | Replication |
| Affects Version/s: | 10.3.8 |
| Fix Version/s: | 10.4 |
| Type: | Bug | Priority: | Major |
| Reporter: | Luke Jolly | Assignee: | Andrei Elkin |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Environment: |
Host - CoreOS |
||
| Attachments: |
|
| Issue Links: |
|
| Description |
|
[ISSUE]: [BELOW IS NO LONGER RELEVANT, LEFT FOR GOOGLE SEARCH HITS]: valgrind --tool=massif --massif-out-file=/var/log/mysql/massif.out.%p /usr/local/mysql/bin/mysqld However, when I shut down MariaDB with a signal or with the SHUTDOWN command, valgrind errors out and does not write a massif file. I have attached that error log as well. It seems that MariaDB may be built wrong? I'm not sure what I'm doing wrong and am reaching the end of my debugging abilities. I attached a log called compile_install that is the output from the build. I'm not sure what to look for in it, though. Another thing that might factor in is that I built the docker image on a host system running "Linux 4.18.7-arch1-1-ARCH" and am attempting to run it on a host running "Linux 4.14.67-coreos" |
| Comments |
| Comment by Luke Jolly [ 2018-09-19 ] |
|
I just tried running the image on the same system I built it on and got the same valgrind error on exit. |
| Comment by Luke Jolly [ 2018-09-19 ] |
|
I was finally able to get it to write out the snapshots by running it with this command:

valgrind --tool=massif --vgdb=yes --massif-out-file=/var/log/mysql/massif.out.%p --time-unit=ms --max-snapshots=200 /usr/local/mysql/bin/mysqld

and then triggering a dump using:

vgdb all_snapshots [<filename>]

I will try to get a snapshot of it happening and post it here. |
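The workflow above can be collected into one small wrapper script (a sketch only; the mysqld path, output directory, and valgrind options are the ones used in this report and may differ on your system):

```shell
#!/bin/sh
# Run mysqld under massif with an embedded gdbserver (--vgdb=yes), so heap
# snapshots can be requested on demand instead of waiting for a clean
# shutdown (which crashed valgrind in this report).
MYSQLD=/usr/local/mysql/bin/mysqld   # path from this report
OUTDIR=/var/log/mysql                # path from this report

valgrind --tool=massif \
         --vgdb=yes \
         --massif-out-file="$OUTDIR/massif.out.%p" \
         --time-unit=ms \
         --max-snapshots=200 \
         "$MYSQLD" &

# Later, from another shell, force massif to write out all snapshots now:
#   vgdb all_snapshots
```

Because the dump is triggered externally via `vgdb`, the massif data survives even when the server (or valgrind itself) later dies uncleanly.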
| Comment by Luke Jolly [ 2018-09-24 ] |
|
I uploaded a valgrind snapshot (snapshot_1537811421) hopefully showing the memory issue I'm experiencing in production. I restored a backup of our slave database less than an hour before a workload hit the master that caused memory to spike on the production slave. I started the restored slave and let it work through the backlog of master events over the weekend. The version compiled to run under valgrind is so much less performant that it took all weekend to work through about 40 minutes of binlogs. That is enough, however, to be well into the event. I will also upload a snapshot taken with the slave stopped, showing that the memory usage stays about the same, once I have it. snapshot_1537811421 I'm not sure what else to do to debug this. Any suggestions or input would be much appreciated. |
| Comment by Andrei Elkin [ 2018-09-25 ] |
|
Luke, hi. Thanks for trying to investigate it! I wonder if merely restarting the slave service would also release the memory? That would help place the blame. Could you please check it out? Andrei |
| Comment by Luke Jolly [ 2018-09-25 ] |
|
Stopping the slave frees up some memory, but that doesn't fix what I perceive the leak to be. So it will spike roughly 20-30 GB in memory, and stopping the slave will free up around 8 GB (those aren't exact numbers, but I saw this behavior many times). I ran STOP SLAVE yesterday on the valgrind instance I'm running and it still hasn't stopped... Once it finally does, I will attach the post-stop snapshots. |
| Comment by Luke Jolly [ 2018-09-26 ] |
|
Well, it seems you may be correct; in this test instance it's looking like a lot more memory was cleaned up than I thought. I attached a post-STOP SLAVE memory snapshot. snapshot_1537963221 |
| Comment by Luke Jolly [ 2018-09-26 ] |
|
This shows what I said yesterday: running 'stop slave; start slave;' does not fix the problem. The point where the memory drops a few GB on the graph is when the slave threads exited. Upon starting the slave again, almost 20 GB of memory was consumed. |
| Comment by Luke Jolly [ 2018-10-01 ] |
|
Here's another snapshot, having now, I believe, worked through the entire period of binlogs that initially caused a memory spike on our production slave. snapshot_1538417420 Another thing of note is that htop says MariaDB is using 54.0 GB of memory, while, if I'm reading the valgrind dump correctly, it has only accounted for 32.7 GB of memory usage, which seems quite a discrepancy. |
| Comment by Andrei Elkin [ 2018-10-02 ] |
|
In the latest snapshot_1538417420, a graph of mem_heap_B as a function of snapshot number shows two steep curves. The first reaches about 350 GB at about snapshot 50 and drops |
| Comment by Luke Jolly [ 2018-10-02 ] |
|
The one drop in memory is indeed from a STOP SLAVE. I can run STOP SLAVE again and see what happens. The thing that worries me is that this valgrind'ed instance does not seem to behave exactly like the production slaves. I believe the most memory I have ever been able to "recover" in production with STOP SLAVE has been 8 GB. This test instance recovered twice that. I have run STOP SLAVE on it; it will probably take at least until tomorrow for it to actually stop. |
| Comment by Luke Jolly [ 2018-10-02 ] |
|
What did you mean by text_mem_heap_B? |
| Comment by Andrei Elkin [ 2018-10-03 ] |
|
Sorry, that was a confusing leftover from my editing. I meant mem_heap_B (which alone must be sufficient). |
| Comment by Luke Jolly [ 2018-10-03 ] |
|
Well, I don't think I'm going to be able to get any more useful information out of my test valgrind instance. It crashed before the slave thread was able to exit. Here's the crash log: crash_log |
| Comment by Andrei Elkin [ 2018-10-16 ] |
|
Thanks for the crash report! I will try to take a closer look at what is actually a deadlock in mutex acquisition. |
| Comment by Kristian Nielsen [ 2018-11-25 ] |
|
One way this could occur is if the slave gets blocked for some reason (e.g. a long-running ALTER TABLE) and @@slave_parallel_max_queued is set too high: |
| Comment by Luke Jolly [ 2018-11-28 ] |
|
slave_parallel_threads = 16 If I understand things correctly, this should only be able to lead to 1600 MB plus some overhead of memory used. This is nowhere near the memory usage spikes we were seeing (20-30 GB). |
| Comment by Kristian Nielsen [ 2018-11-28 ] |
|
The limit of 1600 MB is in terms of event size as stored in the binlog/relaylog. The actual memory usage is somewhat different (probably higher) when each event is converted to an in-memory C++ object. I agree that a factor-of-10+ difference sounds like a lot, and it's confusing that slave_parallel_max_queued does not directly reflect the actual memory usage. But still, it sounds like the most likely explanation is that the worker threads got behind and there was enough load to fill up the queues completely with 1.6 GB of relaylog data (and even larger RAM usage). Queueing 100 MB of relay log events per thread sounds like huge overkill unless there is a very special reason. 100 kB would be a more usual number, perhaps. |
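As a back-of-envelope check of the figures discussed in this thread (assuming slave_parallel_max_queued = 100 MB per worker, which is what 16 threads and a 1600 MB cap imply; the actual configured value was not posted):

```shell
# Worst-case queue budget for queued relay-log data, per the thread above.
# threads and max_queued_mb are assumptions reconstructed from this report.
threads=16
max_queued_mb=100                          # assumed slave_parallel_max_queued
budget_mb=$((threads * max_queued_mb))
echo "worst-case queued relay-log data: ${budget_mb} MB"

# The observed spike was 20-30 GB; compare the low end against the cap:
observed_gb=20
ratio=$(( (observed_gb * 1024) / budget_mb ))
echo "observed spike is at least ${ratio}x the queue budget"
```

The roughly 12x gap between the configured cap and the observed spike is consistent with Kristian's point that in-memory event objects can be considerably larger than their on-disk representation, but it also shows why the cap alone cannot explain a 20-30 GB spike.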