[MDEV-20699] mysqldump of routines causes MariaDB to get killed by oom-killer Created: 2019-09-30 Updated: 2021-10-04 Resolved: 2021-09-28 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Server |
| Affects Version/s: | 10.1.25, 10.3.17, 10.4.8 |
| Fix Version/s: | 10.3.32, 10.4.22, 10.5.13, 10.6.4 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Matthias | Assignee: | Vladislav Vaintroub |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | crash, mysqldump, stored_procedures | ||
| Environment: |
|
||
| Attachments: |
|
||||||||
| Issue Links: |
|
||||||||
| Description |
|
Hi guys, we have a problem creating a full dump of our database using mysqldump. Since we upgraded from MariaDB 10.1.25 to MariaDB 10.3.17 two weeks ago, creating a full dump including all schemas (total of 3), triggers, events and routines causes the server get killed by oom-killer. TL;DR: Details: Before the upgrade, everything worked fine on the slave and the (older) test system. Is it possible that some internal structure or table of MariaDB got somehow corrupted? And if so, how can I fix this? I also wanted to know what's going on inside MariaDB on our test system and so I started it using valgrind. I viewed the output file using massif-visualizer and it looks to me like MariaDB allocates huge amounts of memory for every stored procedure and never frees it again. But I'm completely new to valgrind, so I have no idea how to interpret the results correctly. If it helps, I can attach the massif's output file here (447 KB). Here's the mysqldump command I'm using: And even with this command it crashes: Any help or idea is welcome! Regards, |
| Comments |
| Comment by Elena Stepanova [ 2019-10-06 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
First of all, could you please specify how much memory your machine has, and how much (approximately) is "vast amounts of memory"? Further, I'm not sure I quite understand, does it or does it not work on 10.1? First you said So which is it, does it work fine or does it crash? And yes, sure, please attach your massif output. Please also paste or attach your config file(s). If you can provide the actual dump on which the problem is reproducible, it would be even better. You can upload it to ftp.askmonty.org/private if you don't want to make it public. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Matthias [ 2019-10-10 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Hi @Elena, thank you for your answer! All of our servers had 2 GB of memory in total. I know, this sounds few, but it worked for more than 9 months. Now, our test server still has 2 GB, while the other (productive) servers have 4 to 10 GB. I'm not sure if I interpret the results from valgrind and massif correctly, but it looks like MariaDB allocates about 80 MB per stored procedure (and never releases the memory again). As we have 422 stored procedures, that would make about 33 GB in total. I'm sorry, I must have expressed myself somewhat inaccurately. I'm afraid I can't provide the actual dump because it has 3 GB in size, but maybe I can provide one single stored procedure, so you can test it and see if it's the stored procedure that causes this issue. But actually it looks like that all stored procedures are affected, not only one single or maybe a couple... I will provide the massif output and the config file. By the way: I'm on vacation next week, so please don't be mad at me if I don't answer. But I'll be back on October 21st. Thank you and regards, | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Oleksandr Byelkin [ 2020-03-25 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
What about test suite? | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Vladislav Vaintroub [ 2021-09-25 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
I can't reproduce with current 10.3 (I did not test any earlier versions, since I understand, it is not used be the reporter).
However, in terms of virtual memory, pmap shows about 500MB more usage, which does not grow if dump is repeated
That thing is the supposedly libc heap growth, and whether it is possible to ever get this memory back, and whether it is possible to force it to allocate less, or maybe there is a more compact allocator, I do not really know. So, matze, is there still a problem we need to investigate? maybe you can try some recent version of mariadb, and be happy with results. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Vladislav Vaintroub [ 2021-09-27 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Alright, of course the routines to be dumped are parsed for SHOW CREATE PROCEDURE, and stored in a per-connection cache, which is a problem for mysqldump, if there is a lot of routines, and they are large. For the 80 stored procedures, found in Memory measurement.zip The raw data is in umdh.zip Note, that some of those procedures do not work in 10.4 (server would complain about undeclared in_User variable). I think that a workaround, for matze would be to SET GLOBAL stored_program_cache=1, at least for duration of mysqldump. And a fix for this case, would be to bypass caching stored procedures in SHOW CREATE - the very reason for cache is to speedup SP execution, so caching during the execution only should be fine. And a general remark, since this per-connection cache can grow almost uncontrollably, and SP can easily take several MB, maybe the decision on whether to cache, should not be based on number of stored procedures, but on size, in bytes, of parsed SPs. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Vladislav Vaintroub [ 2021-09-27 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
sanja, please review bb-10.3-wlad- | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Oleksandr Byelkin [ 2021-09-28 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
OK to push |