[MDEV-13403] Mariadb (with TokuDB) excessive memory usage/leak Created: 2017-07-29 Updated: 2018-09-03 Resolved: 2018-06-21 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Packaging, Storage Engine - TokuDB |
| Affects Version/s: | 10.2.7 |
| Fix Version/s: | 10.2.16, 10.3.8 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Peter de Kraker | Assignee: | Sergei Golubchik |
| Resolution: | Fixed | Votes: | 17 |
| Labels: | tokudb | ||
| Environment: |
Ubuntu 16.04 x64 |
||
| Attachments: |
|
| Issue Links: |
|
| Description |
|
We have been running MariaDB 10.1 with TokuDB on an Ubuntu 14.04 VPS with 4GB of RAM. This always worked fine. We recently updated to 10.2 and suddenly MariaDB started eating all the memory there is, and also uses a lot of swap.

We did upgrade our VPS to an Ubuntu 16.04 8GB instance (not because of the problems, but because that would improve performance). The issues continued there. Settings did not change between the VPS instances; we only allocated 4GB of RAM to TokuDB instead of 2GB. Under the same workload, 10.2 eats up all RAM (using 7/8GB of RAM + 2/8GB of swap) after 2 days, while under 10.1 the RAM usage stayed in line with what you would expect. Unfortunately we can't go back to 10.1, since importing our dataset takes a week.

Our database consists mainly of TokuDB tables, with one table having 9 billion rows. The other tables are in the lower millions of rows. Total size including indexes is 900GB (uncompressed) and 300GB without indexes.

We do have a staging server that we can use to run valgrind massif on, and if necessary also on production, since the project is not very critical. However, we are still trying to reproduce the issue on the staging server. Also, the valgrind massif output shows a lot of '??' entries, even though we installed mariadb-server-core-dbgsym, mariadb-server-dbgsym and mariadb-plugins-tokudb-dbgsym. I will try to replicate the issue on the staging environment or otherwise use valgrind on production. However, I am not sure whether the massif tool itself uses much extra RAM, which would make it hard to actually reach the ballooned-RAM state.

I attached the most relevant output from mysql and some graphs from Grafana. |
| Comments |
| Comment by Peter de Kraker [ 2017-07-29 ] |
|
After another day the memory usage has grown to 7.3GB of RAM and 3.8GB of swap. |
| Comment by Peter de Kraker [ 2017-08-20 ] |
|
Upgraded to 10.2.8. We still seem to have major memory leaks: a few queries on TokuDB and our server stopped responding after consuming our entire 8GB of swap. I don't understand why there has not been any reaction to this issue... Anyway, we have decided to move to Percona Server, since almost all of our data is in TokuDB. Hopefully this leads to a more stable setup. |
| Comment by Peter de Kraker [ 2017-08-25 ] |
|
It took 3 days to reimport our data into Percona, and it's running stable now. No schema changes. |
| Comment by Gabriel Paradzik [ 2017-08-28 ] |
|
I am having the same issue with MariaDB 10.2.8 on Debian Stretch. |
| Comment by Gabriel Paradzik [ 2017-08-28 ] |
|
I made the following discovery while the mysqld process is using 60.9G VIRT / 48.4G RES. |
| Comment by Peter de Kraker [ 2017-08-28 ] |
|
I ran your query on my Percona instance and I think it does not correspond to RAM usage. Percona/TokuDB is using 5GB of memory here, not 191GB. The query seems to report the total size of the raw data files or something similar. |
| Comment by Gabriel Paradzik [ 2017-08-30 ] |
|
Here is my month-old question on StackOverflow, which contains more information (e.g. SHOW ENGINE ... STATUS output). I'm open to suggestions to help debug this issue. |
| Comment by Michael OBrien [ 2017-08-31 ] |
|
Exact same issue: super-high mysqld memory usage with InnoDB + TokuDB on MariaDB 10.2.8, Ubuntu 16.04.3. The host has 32GB of memory + 24GB of swap; mysqld is using about 30GB of that memory plus 10+GB of swap after running for a few hours (it starts very low, of course). |
| Comment by Andrew Newt [ 2017-09-21 ] |
|
Same issue on CentOS 7. MariaDB 10.2.8, 3 TokuDB tables (about 12GB of compressed data), 8GB of RAM total on the server, 1GB allocated for InnoDB and 1GB for TokuDB. Total memory allocation for MariaDB was higher initially, but has been lowered because we keep running out of memory. |
| Comment by Reinis Rozitis [ 2017-10-05 ] |
|
Have exactly the same problem on openSUSE Leap 42.3 with MariaDB 10.2.9. Upgraded from 10.2.4 and suddenly all of the RAM is exhausted in minutes. Tried several versions (from 10.2.5 to 10.2.9); the last one not leaking memory is 10.2.4. P.S. It only affects the master, not the slaves. |
| Comment by Jeff [ 2017-10-20 ] |
|
Just adding another vote for this. We didn't have the leak on an early version of 10.2, but have seen it in 10.2.{7,8,9}. CentOS 7.4.1708. I used to run a nightly table analysis job, basically $(mysqlcheck -a), but that really chewed up RAM which wasn't released afterwards. Please get in touch with me if there's any more information I can provide to debug this. This has been going on for months, and if it's not resolved soon I'm sure we'll be forced to move off of MariaDB. |
| Comment by Reinis Rozitis [ 2017-11-01 ] |
|
10.2.10 still seems to be affected. Looking at the changelog and seeing memory-leak-related fixes (like "fts_create_doc_id() unnecessarily allocates 8 bytes for every inserted row"), even though we don't use fulltext indexes, I was hoping that some similar fix could have made it into the current release, but no: after updating from 10.2.4 to 10.2.10, MariaDB populates all the buffers and then goes on to happily leak until OOM. |
| Comment by Marc [ 2017-11-02 ] |
|
Is it confirmed that the bug only occurs with TokuDB? We have similar leak problems with 10.2.8 on RHEL 6, but we are not using TokuDB, only InnoDB. |
| Comment by Reinis Rozitis [ 2017-11-02 ] |
|
> Is it confirmed that the bug only occurs with TokuDB? We have similar leak problems with 10.2.8 on RHEL 6, but we are not using TokuDB, only InnoDB.

It feels more like an InnoDB leak or some specific case. I have an instance running 10.2.9 with only TokuDB, and also an instance with mixed TokuDB and InnoDB tables, and they work just fine, while on another instance (with the exact same hardware/OS/MySQL config) but a different dataset, the leak manifests immediately (again, only on versions 10.2.5+). I haven't had the chance to test InnoDB-only instances after 10.2.4, since it is risky to upgrade production while there is this possibility of an OOM-kill crash. |
| Comment by Stefano [ 2017-11-12 ] |
|
Confirming the issue in my case as well: MariaDB 10.2.6 on Gentoo Linux. Downgraded to 10.1.24 => the problem does not occur anymore. |
| Comment by Marko Mäkelä [ 2017-11-17 ] |
|
mariodb, can you please provide a test case, so that we can repeat the issue? These things can be highly dependent on the schema and the data. marc.langevin@usherbrooke.ca, Roze, there was a similar issue in InnoDB when using FULLTEXT INDEX. I am unaware of any actual memory leaks in InnoDB, except that there may be some unnecessary memory allocation (not freeing memory as soon as possible) and fragmentation in the memory allocator. I recently analyzed what happens in an UPDATE, and identified several places where InnoDB unnecessarily allocates (and frees) heap memory. Fixing that would involve extensive code changes, which are probably too excessive for a GA release. If you find something similarly excessive as |
| Comment by Stefano [ 2017-11-17 ] |
|
@Marko Mäkelä working on it.

Edit 24 Nov 2017: sorry, I haven't been able to replicate the issue so far on my test PC using simulated test cases and fake data (and by the way, Gentoo no longer offers v10.2.6 in its package tree, so I am running the tests using MariaDB 10.2.10; therefore I am not sure whether my tests are not good enough or whether there was some fix in 10.2.10) => continuing to try to replicate the issue.

Edit 25 Nov 2017 00:40: I might have found a candidate test case (RAM allocation kept growing up to 3187MB and then stabilized, which is still more than what I would have expected with the settings of the test database, but it did not grow further within the next 20 minutes) => currently recreating the base table with a bit more entropy => will then rerun the test case and see if the data itself and/or the duration of the test has any impact on RAM allocation.

Edit 26 Nov 2017: I copied 1:1 the settings that I use with MariaDB 10.1, but at least in Gentoo the config files have changed (starting from MariaDB 10.2 they are split between "/etc/mysql/mariadb.d/50-distro-client.cnf" and "/etc/mysql/mariadb.d/50-distro-server.cnf"), so it could still be that I'm making some kind of mistake there?

General information
PC used for testing:
Summary
|
| Comment by Steven McDowall [ 2017-11-23 ] |
|
We have the same issue and filed an issue with MariaDB directly, working with various people there, mostly @kurtpastore and Paul Moen. We downgraded to 10.1 for now. |
| Comment by Stefano [ 2017-11-26 ] |
|
Test running on:
Built MariaDB with the following USE flags:

RAM usage is watched via the "htop" utility (the "RES" column of the "mysqld" process and the overall "Mem" line of the host), and the "RSS" of "mysqld" is logged from "ps" using the following:

echo -n "$(date): " > ps-res-mysql.txt && ps aux | head -n1 >> ps-res-mysql.txt && ps aux | grep '[m]ysqld' >> ps-res-mysql.txt

The test was simplified to perform only "select" statements (no update/insert/delete nor DDL) against a big prepopulated table.
TEST PREPARATION
- create table mytbl3
- insert into mytbl3
- commit
- shutdown & restart DB

TEST SCRIPT
Database settings
50-distro-client.cnf:
I stopped the test when the RES/RSS size of MariaDB reached ~7753MB (with 4GB for "tokudb_cache_size", my expectation is a RES/RSS between 4 and 5GB; e.g. currently my other PC, which runs MariaDB 10.1.24-r1, has ~4.8GB RES/RSS for MariaDB after a week of looping through a similar SQL). |
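The ad-hoc ps logging used in this test can be wrapped into a small reusable sketch; the function names, the `ps-res-mysql.txt` file name, and the use of `$$` (this shell) as the monitored PID are illustrative, with `$(pidof mysqld)` being what you would pass on the test box:

```shell
# Append a timestamped RSS (resident set size, in KB) sample for one PID.
rss_kb() {
    # "rss=" suppresses the ps header; tr strips padding whitespace
    ps -o rss= -p "$1" | tr -d ' '
}

log_rss() {
    echo "$(date): rss=$(rss_kb "$1") KB" >> ps-res-mysql.txt
}

log_rss "$$"   # placeholder PID; use $(pidof mysqld) on the test host
```

Run from cron or a loop, this gives a plain-text growth curve comparable to the htop RES column.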
| Comment by Chris Savery [ 2017-12-06 ] |
|
I'm having the same or similar issue here. I was using TokuDB with MariaDB 10.2.11 and had runaway memory consumption and OOM shutdowns. This is on an Ubuntu 16.04 VPS (4 vCPU, 8GB) with one database with a high level of insertions (initial loading), but only around 1-2GB of total data. I had to give up on TokuDB for this and other poor-performance reasons.

So I disabled TokuDB, enabled the RocksDB plugin, and started testing my application with that engine. However, I am getting similar issues with RocksDB, though admittedly not as severe. I can do ongoing insertions for about 6-8 hours before memory climbs to >6GB (total 8GB, cnf set at 4GB). In both plugin cases I set only one my.cnf setting, tokudb_cache_size or rocksdb_cache_size, first to 4G and then in later trials to 3200M.

Now with RocksDB it will grow steadily to about 4-5GB, then slow down and crawl up to more than 6GB over several hours. At some point past this level it will always start slowing down and overloading the system so that nothing can work. The insertion rate drops to near zero. Even typing commands in the terminal or another ssh login takes several minutes. htop shows massive CPU wait cycles (gray bars). At this point, if I'm lucky, I can stop the mysql daemon, but often only a hard reboot from the hosting company's control panel will resurrect the system. Once rebooted, or once MariaDB is restarted, it operates fine, and so far has not lost data (that I'm aware of); it starts again at 120MB, moves upwards, and repeats the cycle.

It really is unusable as-is without some way to limit MariaDB's memory use. In the case of RocksDB I don't think there is an option to downgrade, as I think it's only supported in 10.2. For the reasons above, I expect this is either another unrelated memory leak, or a leak in MariaDB itself that is not TokuDB-plugin-specific. I cannot imagine that it is expected behavior for MariaDB to consume memory until it crashes. |
| Comment by leo cardia [ 2017-12-18 ] |
|
I also have the same issue here. Server: 250GB of memory. The result is: [ERROR] mysqld: Out of memory (Needed 262143960 bytes). With tokudb_cache_size = 40G, it drains the 250GB of memory in 6 hours. The TokuDB cache drains lots of memory. I tried several configurations; I also have a slave MariaDB server with a 128GB configuration. |
| Comment by Marc [ 2018-01-17 ] |
|
I finally discovered that my memory problem was caused by InnoDB not using large pages, and there was no memory leak in my case; we are not using TokuDB either. The problem was happening because I had included in the huge pages configuration the pages that were supposed to be used by InnoDB. Since those pages were reserved but never actually used, when InnoDB needed memory it allocated from the remaining memory of the system, which made it look like a leak. I reconfigured huge pages minus the pages needed by InnoDB, and memory usage went down immediately. |
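The mismatch Marc describes is visible in /proc/meminfo; a sketch (Linux-only) of the check:

```shell
# Compare the huge-page reservation against what is actually in use.
# If HugePages_Free stays close to HugePages_Total while mysqld grows,
# the reserved pages are sitting idle and ordinary allocations are
# carving into the remaining (non-reserved) memory instead.
grep -E '^HugePages_(Total|Free)' /proc/meminfo
```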
| Comment by Chris Savery [ 2018-02-13 ] |
|
I've just been bitten by this again. I was testing a table using the RocksDB engine on a server with 32GB of RAM. I hadn't noticed that memory was climbing, and the kernel went OOM and locked up everything. The only recourse was a hard reboot, and the RocksDB table was corrupted and non-recoverable afterwards. I know this issue is filed under TokuDB, but given the comments above and my own experience of hitting this initially on TokuDB and then, after switching to RocksDB, having the same problems, I'm pretty sure this issue is not specific to TokuDB but more general to MariaDB, perhaps engine-plugin related. I ran for 2 weeks on this server with only MyISAM tables (InnoDB enabled but no use/activity). Within 5 hours of altering a table to use the RocksDB engine, it went OOM and crashed. The only SQL being run was an INSERT ... SELECT from a MyISAM table as a test of inserting 100 million records; nothing more complex or demanding. This was on MariaDB 10.2.12 on an Ubuntu 16.04 i6700 server with 32GB of RAM and a 480GB SSD (RAID1). |
| Comment by Steven McDowall [ 2018-02-13 ] |
|
We have hit this too, many times and consistently, when inserting large amounts (> 10TB) of data into TokuDB via LOAD DATA INFILE. This only happened in 10.2.x and never in 10.0 or 10.1. It sounds like large data insertion may be the underlying culprit and not TokuDB per se? I'm surprised at how long this rather large problem has been around without much (any?) word from MariaDB, especially if it's a cross-engine issue and not TokuDB itself (i.e. it's not Percona's). |
| Comment by Reinis Rozitis [ 2018-02-13 ] |
|
> This only happened in 10.2.x and never in 10.0 or 10.1.

It doesn't happen on 10.2 either, up to 10.2.4. After that there were some changes in how the storage engines plug in (at least, for example, from a packaging point of view TokuDB became an external package). |
| Comment by Richard Stracke [ 2018-04-18 ] |
|
It also happened with a massive insert into table(a,b,c) values (1,2,3),(2,4,5),.....(1,2,3) into a TokuDB table. Confirmed: happens with 10.2 but not 10.1. |
| Comment by Daniel Yudelevich [ 2018-06-08 ] |
|
Can also confirm the same issue in 10.3.7. |
| Comment by Daniel Yudelevich [ 2018-06-08 ] |
|
It seems like setting malloc-lib explicitly to jemalloc resolves this issue (at least for my case in 10.3.7). |
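A minimal sketch of that setting in the server configuration. The section name is what mysqld_safe reads malloc-lib from; the library path is an assumption (it varies by distro and jemalloc version), and mysqld_safe also accepts a bare library name searched in default directories:

```ini
[mysqld_safe]
# assumption: adjust the path to wherever your libjemalloc is installed
malloc-lib = /usr/lib/x86_64-linux-gnu/libjemalloc.so.1
```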
| Comment by Stefano [ 2018-07-14 ] |
|
Can you guys confirm that the bug has been fixed? Side story, just FYI: I am currently exporting all data to then [reimport it under 10.2 + run workload] and then, if 10.2 is still bad, [reimport it into 10.1 + run |
| Comment by Sergei Golubchik [ 2018-07-16 ] |
|
I cannot truly confirm it, as I wasn't able to repeat the issue. But all previous comments seem to imply that it didn't happen in 10.1 (where MariaDB is linked with jemalloc), only happened in 10.2+ (where only TokuDB is linked with jemalloc), and that the issue disappears when jemalloc is ld-preloaded. So this behavior happens when a shared library linked with jemalloc is dynamically loaded (dlopen) into a non-jemalloc binary; jemalloc apparently wants to be the only memory allocator for the whole executable and doesn't like to share.

So we fixed this issue by not linking TokuDB with jemalloc and instead ld-preloading jemalloc, which resolves the issue according to earlier comments. This automatic ld-preloading only works in our deb and rpm packages (where the tokudb package installs a .cnf file with the malloc-lib=jemalloc line). As you build yourself on Gentoo, you need to make sure that either the mysqld executable is linked with jemalloc, or nothing is linked with jemalloc and you ld-preload it.

As far as the performance goes, could you please report it as a separate bug, so that we can investigate it? Thanks. |
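A quick way to see which situation a given server is in is to look at its memory map; a sketch (Linux-only, using this shell's own PID as a stand-in for `$(pidof mysqld)`):

```shell
# Check whether jemalloc is mapped into a process's address space.
# Pre-fix builds pulled it in indirectly via the dlopen'ed ha_tokudb.so;
# the fixed packages LD_PRELOAD it into mysqld via malloc-lib=jemalloc.
PID=$$   # stand-in; use $(pidof mysqld) for a running server
if grep -q jemalloc "/proc/$PID/maps"; then
    echo "jemalloc is loaded"
else
    echo "jemalloc is NOT loaded"
fi
```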
| Comment by Stefano [ 2018-07-17 ] |
|
Thanks Sergei. I'm currently downgrading back to 10.1(.33), mainly because I noticed that I had non-optimal insert performance with the old 10.2.15 as well (when I wrote the above post I did not realize that I did not have to export + import everything again to do a minor downgrade), and likewise with 10.3.8 (the major upgrade was my final hope...) => I therefore suppose that some change done in 10.2/10.3 is causing this. Yes, I will try to simplify and reproduce the "good vs bad" performance (in VMs) and then post a bug report. I will keep quiet about the memory leak for the time being, as long as nobody else complains and I don't have to upgrade to 10.2/10.3. Cheers and thanks a lot for your help. |
| Comment by Reinis Rozitis [ 2018-07-18 ] |
|
Can confirm that with 10.2.16 the leak/excessive memory usage is gone. Thank you for the fix. |
| Comment by Marios Hadjieleftheriou [ 2018-07-28 ] |
|
We are running MariaDB 10.3.7 with TokuDB 5.6.39-83.1 on Debian 9 and we are experiencing exactly the same problem. It seems that all queries drive memory up and it is never released, but especially some big selects on very large tables (400 million rows) like this: |
| Comment by Stefano [ 2018-09-03 ] |
|
Hi all. It's definitely not as obviously visible as with versions equal to or lower than 10.2.15, but in my opinion the problem is still there. Attaching the file "stefano-mariadb-10_02_16-ram-20180825.png".

My current setting is "tokudb_cache_size = 12G", so I wouldn't expect it to go a lot beyond that; 15GB is definitely beyond my expectations. Sometimes it went up to 26GB, but I've always been able to shut it down before it started doing some serious swapping. I admit that the graph isn't very exciting, but I've seen the numbers grow a lot faster under certain conditions (e.g. concurrent queries like "insert into [my_isam_table] select a_single_column from [my_tokudb_tbl]" + "insert into [my_VEC_table] select some, other, cols from [my_tokudb_tbl]" made my 32GB server start swapping within minutes; sorry, I did not repeat the experiment).

MariaDB 10.1 (I used mostly 10.1.34) did not have this problem, but it totally hung after ~1 day of continuous inserts with no hint of "why" anywhere => I usually had to "kill -9" the mariadb process, start it, and wait for a recovery. So from this point of view the 10.2 series is better (no hard hangs even after having processed the same amount of data as the 10.1 series), but for me it's still ultimately unstable. (By the way, when using TokuDB the insert rate with the 10.1 series is still 10-20% faster than with the 10.2/10.3 series; it looks like 10.2/10.3 uses at most 25/50/75% of the available vCPUs for the checkpoints, but this is another issue...)

Question: |
| Comment by Reinis Rozitis [ 2018-09-03 ] |
|
> My current setting is "tokudb_cache_size = 12G", so I wouldn't expect it to go a lot beyond that - 15GB is definitely beyond my expectations.

With only the TokuDB cache at 12G, 15GB of memory usage for the whole mysqld process seems fairly normal: there are a bunch of other things mysqld uses (or wastes) memory on (like connection and sort buffers, other engines, etc.), and the 10.2.x/10.3.x series use a bit more RAM anyway (might be just the changes in upstream). For testing purposes I would suggest setting the cache size to something smaller, like 1G, and then checking whether the server starts to swap, or at what point the used memory stabilises (before this patch, given the way MariaDB was compiled/packaged, the cache size didn't even matter and you hit OOM sooner or later anyway).

P.S. It's also worth checking which jemalloc version you are running. For example, I found out that 5.0 (which comes with openSUSE Leap 15) makes mysqld crash on shutdown, which then makes systemd restart the process in a loop. With 5.1 ( https://github.com/jemalloc/jemalloc/releases/tag/5.1.0 ) I haven't noticed any problems so far. |
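The suggested test setup could look like this in the server config; the 1G value is illustrative, and since tokudb_cache_size is a startup-only variable, a server restart is needed for it to take effect:

```ini
[mysqld]
# deliberately small cache for leak-hunting: if RSS grows far beyond this
# plus the other configured buffers, the growth is coming from somewhere else
tokudb_cache_size = 1G
```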