We have been running MariaDB 10.1 with TokuDB on an Ubuntu 14.04 VPS with 4 GB RAM. This always worked fine. We recently upgraded to 10.2, and suddenly MariaDB started eating all available memory and also using a lot of swap. We did upgrade our VPS to an Ubuntu 16.04 instance with 8 GB RAM (not because of the problems, but simply to improve performance). The issues continued there.
Settings did not change between the VPS instances, except that we allocated 4 GB RAM to TokuDB instead of 2 GB.
Under the same workload, 10.2 eats up all RAM (using 7 of 8 GB RAM plus 2 of 8 GB swap) after 2 days, while under 10.1 the RAM usage stayed in line with what you would expect.
Unfortunately we can't go back to 10.1, since importing our dataset takes a week.
Our database consists mainly of TokuDB tables, with one table having 9 billion rows. The other tables have row counts in the low millions. Total size including indexes is 900 GB (uncompressed), and 300 GB without indexes.
Workload is lots of reads and inserts, but no deletes.
Strangely, the memory balloons most when running the daily stats gathering, which is almost a pure read query, apart from some stats entries that get inserted.
We do have a staging server that we can run valgrind massif on, and if necessary we can also run it on production, since the project is not very critical. However, we have not yet managed to reproduce the issue on the staging server. Also, the valgrind massif output shows a lot of '??' entries, even though we installed mariadb-server-core-dbgsym, mariadb-server-dbgsym and mariadb-plugins-tokudb-dbgsym.
I will try to replicate the issue on the staging environment, or otherwise use valgrind on production. However, I am not sure whether massif itself uses so much extra RAM that it becomes hard to actually reach the ballooned state.
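In case massif turns out to be too heavy, a lighter way to track the growth is to sample the server's resident and swapped memory from /proc at intervals. The following is only a minimal sketch, assuming a Linux host; the function names are made up for illustration, and it shows how much memory is in use, not which allocation is responsible.

```python
import re


def parse_proc_status(text):
    """Extract VmRSS and VmSwap (in kB) from the text of /proc/<pid>/status."""
    usage = {}
    for line in text.splitlines():
        match = re.match(r"(VmRSS|VmSwap):\s+(\d+)\s+kB", line)
        if match:
            usage[match.group(1)] = int(match.group(2))
    return usage


def read_usage(pid):
    """Read current resident and swapped memory for one process (Linux only)."""
    with open(f"/proc/{pid}/status") as status_file:
        return parse_proc_status(status_file.read())
```

Calling `read_usage(pid)` with the mysqld process ID once a minute and logging the result with a timestamp would show whether the jump coincides with the daily stats run.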
I attached the most relevant output from mysql and some graphs from grafana.
Let me know if you need more.
Thanks Sergei
Sorry for the complaining post. I was very tired and frustrated after 48 hours of debugging, upgrading, downgrading and testing multiple config settings. Realizing that I had no way out, and deciding to export and later re-import all data (to downgrade), was the final heavy hit.
I'm currently downgrading back to 10.1(.33), mainly because I noticed non-optimal insert performance both with the old 10.2.15 and with 10.3.8 (the major upgrade was my final hope...), so I suppose that some change made in 10.2/10.3 is causing this. (When I wrote the above post, I had not realized that I did not have to export and re-import everything to do a minor downgrade.)
I'll then upgrade to 10.1.35 whenever it becomes available (I need a bugfix which is included there) and then I think that I'll try to stay on 10.1.35 hoping that I don't need anything new.
Yes, I will try to simplify and reproduce the "good vs bad" performance (in VMs) and then post a bug report.
I will shut up about the memory leak for the time being, as long as nobody else complains and I don't have to upgrade to 10.2/10.3.
Cheers and thanks a lot for your help.