We have a cluster of 3 nodes, one of which gets the lions share of requests and runs some aggregation and reporting queries every morning. We have noticed that recent version (not sure when it really started to be an issue) seem to leak memory which is then terminated by OOM killer. We did tame OOM a bit but it genuinely runs out of memory.
Looking at the graph below, we can see that the memory consumed each day is not given back and accumulates until it stops.
[server]
|
|
[mysqld]
|
|
basedir = /usr
|
bind-address = 0.0.0.0
|
bulk_insert_buffer_size = 16M
|
character-set-server = utf8mb4
|
collation-server = utf8mb4_general_ci
|
connect_timeout = 5
|
datadir = /var/lib/mysql
|
default_storage_engine = InnoDB
|
expire_logs_days = 3
|
group_concat_max_len = 1000000
|
|
innodb_buffer_pool_instances = 5
|
innodb_buffer_pool_size = 5G
|
innodb_change_buffer_max_size = 50
|
innodb_file_format = Barracuda
|
innodb_file_per_table = 1
|
innodb_flush_method = O_DIRECT
|
innodb_io_capacity = 400
|
innodb_large_prefix = on
|
innodb_log_buffer_size = 1G
|
innodb_log_file_size = 640M
|
innodb_log_files_in_group = 2
|
innodb_open_files = 400
|
innodb_read_io_threads = 16
|
innodb_thread_concurrency = 8
|
innodb_write_io_threads = 16
|
|
join_buffer_size = 2M
|
#key_buffer_size = 16M
|
lc-messages-dir = /usr/share/mysql
|
log_bin = /var/log/mysql/mariadb-bin
|
log_bin_index = /var/log/mysql/mariadb-bin.index
|
log_error = /var/log/mysql/error.log
|
# https://docs.rackspace.com/support/how-to/mariadb-error-log-configuration/
|
log_warnings = 1
|
|
max_allowed_packet = 256M
|
max_binlog_size = 512M
|
max_connections = 400
|
max_heap_table_size = 64M
|
myisam_recover_options = BACKUP
|
performance_schema = on
|
pid-file = /var/run/mysqld/mysqld.pid
|
port = 3306
|
query_cache_limit = 128K
|
query_cache_size = 64M
|
skip-external-locking
|
skip-name-resolve = 1
|
socket = /var/run/mysqld/mysqld.sock
|
sort_buffer_size = 4M
|
table_definition_cache = 16384
|
table_open_cache = 16384
|
thread_cache_size = 128
|
thread_cache_size = 8
|
thread_stack = 192K
|
tmp_table_size = 256M
|
tmpdir = /tmp
|
user = mysql
|
userstat = 1
|
wait_timeout = 600
|
|
[embedded]
|
|
[mariadb]
|
plugin_load_add = ha_federatedx
|
[mariadb-10.1]
|
|
You can see that at 6am every day it takes a bite of memory and never releases it.
During that time, there are many rows updated. We use it for an inventory system where it crunches the previous days opening stock, then all the relevant transactions to produce a closing stock - ultimately though this is just a bunch of basic joins that inserts into a table. 1.9M rows per day and no fancy statements all just select / inserts. I would have thought that at no point should selects and inserts no matter how they were written would cause a memory leak over time.
{"report":{"fcp":980.7000000476837,"ttfb":231.39999985694885,"pageVisibility":"visible","entityId":132147,"key":"jira.project.issue.view-issue","isInitial":true,"threshold":1000,"elementTimings":{},"userDeviceMemory":8,"userDeviceProcessors":64,"apdex":0.5,"journeyId":"3dd92038-4a58-4b7c-873c-33ee6acfd1e5","navigationType":0,"readyForUser":1089,"redirectCount":0,"resourceLoadedEnd":679.8999998569489,"resourceLoadedStart":239.39999985694885,"resourceTiming":[{"duration":42.10000014305115,"initiatorType":"link","name":"https://jira.mariadb.org/s/2c21342762a6a02add1c328bed317ffd-CDN/lu2cib/820016/12ta74/0a8bac35585be7fc6c9cc5a0464cd4cf/_/download/contextbatch/css/_super/batch.css","startTime":239.39999985694885,"connectEnd":0,"connectStart":0,"domainLookupEnd":0,"domainLookupStart":0,"fetchStart":239.39999985694885,"redirectEnd":0,"redirectStart":0,"requestStart":0,"responseEnd":281.5,"responseStart":0,"secureConnectionStart":0},{"duration":42.19999980926514,"initiatorType":"link","name":"https://jira.mariadb.org/s/7ebd35e77e471bc30ff0eba799ebc151-CDN/lu2cib/820016/12ta74/494e4c556ecbb29f90a3d3b4f09cb99c/_/download/contextbatch/css/jira.browse.project,project.issue.navigator,jira.view.issue,jira.general,jira.global,atl.general,-_super/batch.css?agile_global_admin_condition=true&jag=true&jira.create.linked.issue=true&slack-enabled=true&whisper-enabled=true","startTime":239.70000004768372,"connectEnd":0,"connectStart":0,"domainLookupEnd":0,"domainLookupStart":0,"fetchStart":239.70000004768372,"redirectEnd":0,"redirectStart":0,"requestStart":0,"responseEnd":281.89999985694885,"responseStart":0,"secureConnectionStart":0},{"duration":238.20000004768372,"initiatorType":"script","name":"https://jira.mariadb.org/s/0917945aaa57108d00c5076fea35e069-CDN/lu2cib/820016/12ta74/0a8bac35585be7fc6c9cc5a0464cd4cf/_/download/contextbatch/js/_super/batch.js?locale=en","startTime":239.89999985694885,"connectEnd":239.89999985694885,"connectStart":239.89999985694885,"domainLookupEnd":239.89999985694885,"domainLookupStart":239.89999985694885,"fetchStart":239.89999985694885,"redirectEnd":0,"redirectStart":0,"requestStart":285,"responseEnd":478.09999990463257,"responseStart":311.59999990463257,"secureConnectionStart":239.89999985694885},{"duration":398.7000000476837,"initiatorType":"script","name":"https://jira.mariadb.org/s/2d8175ec2fa4c816e8023260bd8c1786-CDN/lu2cib/820016/12ta74/494e4c556ecbb29f90a3d3b4f09cb99c/_/download/contextbatch/js/jira.browse.project,project.issue.navigator,jira.view.issue,jira.general,jira.global,atl.general,-_super/batch.js?agile_global_admin_condition=true&jag=true&jira.create.linked.issue=true&locale=en&slack-enabled=true&whisper-enabled=true","startTime":240.09999990463257,"connectEnd":240.09999990463257,"connectStart":240.09999990463257,"domainLookupEnd":240.09999990463257,"domainLookupStart":240.09999990463257,"fetchStart":240.09999990463257,"redirectEnd":0,"redirectStart":0,"requestStart":285.2999999523163,"responseEnd":638.7999999523163,"responseStart":317.59999990463257,"secureConnectionStart":240.09999990463257},{"duration":84.90000009536743,"initiatorType":"script","name":"https://jira.mariadb.org/s/a9324d6758d385eb45c462685ad88f1d-CDN/lu2cib/820016/12ta74/c92c0caa9a024ae85b0ebdbed7fb4bd7/_/download/contextbatch/js/atl.global,-_super/batch.js?locale=en","startTime":240.29999995231628,"connectEnd":240.29999995231628,"connectStart":240.29999995231628,"domainLookupEnd":240.29999995231628,"domainLookupStart":240.29999995231628,"fetchStart":240.29999995231628,"redirectEnd":0,"redirectStart":0,"requestStart":285.7000000476837,"responseEnd":325.2000000476837,"responseStart":324.2000000476837,"secureConnectionStart":240.29999995231628},{"duration":78.90000009536743,"initiatorType":"script","name":"https://jira.mariadb.org/s/d41d8cd98f00b204e9800998ecf8427e-CDN/lu2cib/820016/12ta74/1.0/_/download/batch/jira.webresources:calendar-en/jira.webresources:calendar-en.js","startTime":240.59999990463257,"connectEnd":240.59999990463257,"connectStart":240.59999990463257,"domainLookupEnd":240.59999990463257,"domainLookupStart":240.59999990463257,"fetchStart":240.59999990463257,"redirectEnd":0,"redirectStart":0,"requestStart":285.7999999523163,"responseEnd":319.5,"responseStart":315.5,"secureConnectionStart":240.59999990463257},{"duration":75.69999980926514,"initiatorType":"script","name":"https://jira.mariadb.org/s/d41d8cd98f00b204e9800998ecf8427e-CDN/lu2cib/820016/12ta74/1.0/_/download/batch/jira.webresources:calendar-localisation-moment/jira.webresources:calendar-localisation-moment.js","startTime":240.70000004768372,"connectEnd":240.70000004768372,"connectStart":240.70000004768372,"domainLookupEnd":240.70000004768372,"domainLookupStart":240.70000004768372,"fetchStart":240.70000004768372,"redirectEnd":0,"redirectStart":0,"requestStart":286.09999990463257,"responseEnd":316.39999985694885,"responseStart":312.89999985694885,"secureConnectionStart":240.70000004768372},{"duration":42.80000019073486,"initiatorType":"link","name":"https://jira.mariadb.org/s/b04b06a02d1959df322d9cded3aeecc1-CDN/lu2cib/820016/12ta74/a2ff6aa845ffc9a1d22fe23d9ee791fc/_/download/contextbatch/css/jira.global.look-and-feel,-_super/batch.css","startTime":240.89999985694885,"connectEnd":0,"connectStart":0,"domainLookupEnd":0,"domainLookupStart":0,"fetchStart":240.89999985694885,"redirectEnd":0,"redirectStart":0,"requestStart":0,"responseEnd":283.7000000476837,"responseStart":0,"secureConnectionStart":0},{"duration":75.39999985694885,"initiatorType":"script","name":"https://jira.mariadb.org/rest/api/1.0/shortcuts/820016/47140b6e0a9bc2e4913da06536125810/shortcuts.js?context=issuenavigation&context=issueaction","startTime":241.20000004768372,"connectEnd":241.20000004768372,"connectStart":241.20000004768372,"domainLookupEnd":241.20000004768372,"domainLookupStart":241.20000004768372,"fetchStart":241.20000004768372,"redirectEnd":0,"redirectStart":0,"requestStart":286.09999990463257,"responseEnd":316.59999990463257,"responseStart":313.39999985694885,"secureConnectionStart":241.20000004768372},{"duration":42.5,"initiatorType":"link","name":"https://jira.mariadb.org/s/3ac36323ba5e4eb0af2aa7ac7211b4bb-CDN/lu2cib/820016/12ta74/d176f0986478cc64f24226b3d20c140d/_/download/contextbatch/css/com.atlassian.jira.projects.sidebar.init,-_super,-project.issue.navigator,-jira.view.issue/batch.css?jira.create.linked.issue=true","startTime":241.39999985694885,"connectEnd":0,"connectStart":0,"domainLookupEnd":0,"domainLookupStart":0,"fetchStart":241.39999985694885,"redirectEnd":0,"redirectStart":0,"requestStart":0,"responseEnd":283.89999985694885,"responseStart":0,"secureConnectionStart":0},{"duration":85.60000014305115,"initiatorType":"script","name":"https://jira.mariadb.org/s/5d5e8fe91fbc506585e83ea3b62ccc4b-CDN/lu2cib/820016/12ta74/d176f0986478cc64f24226b3d20c140d/_/download/contextbatch/js/com.atlassian.jira.projects.sidebar.init,-_super,-project.issue.navigator,-jira.view.issue/batch.js?jira.create.linked.issue=true&locale=en","startTime":241.59999990463257,"connectEnd":241.59999990463257,"connectStart":241.59999990463257,"domainLookupEnd":241.59999990463257,"domainLookupStart":241.59999990463257,"fetchStart":241.59999990463257,"redirectEnd":0,"redirectStart":0,"requestStart":286.2999999523163,"responseEnd":327.2000000476837,"responseStart":313.89999985694885,"secureConnectionStart":241.59999990463257},{"duration":428.89999985694885,"initiatorType":"script","name":"https://jira.mariadb.org/s/d41d8cd98f00b204e9800998ecf8427e-CDN/lu2cib/820016/12ta74/1.0/_/download/batch/jira.webresources:bigpipe-js/jira.webresources:bigpipe-js.js","startTime":251,"connectEnd":251,"connectStart":251,"domainLookupEnd":251,"domainLookupStart":251,"fetchStart":251,"redirectEnd":0,"redirectStart":0,"requestStart":416.5,"responseEnd":679.8999998569489,"responseStart":678,"secureConnectionStart":251},{"duration":412.7000000476837,"initiatorType":"script","name":"https://jira.mariadb.org/s/d41d8cd98f00b204e9800998ecf8427e-CDN/lu2cib/820016/12ta74/1.0/_/download/batch/jira.webresources:bigpipe-init/jira.webresources:bigpipe-init.js","startTime":251.5,"connectEnd":251.5,"connectStart":251.5,"domainLookupEnd":251.5,"domainLookupStart":251.5,"fetchStart":251.5,"redirectEnd":0,"redirectStart":0,"requestStart":464.09999990463257,"responseEnd":664.2000000476837,"responseStart":662,"secureConnectionStart":251.5},{"duration":78.89999985694885,"initiatorType":"xmlhttprequest","name":"https://jira.mariadb.org/rest/webResources/1.0/resources","startTime":623,"connectEnd":623,"connectStart":623,"domainLookupEnd":623,"domainLookupStart":623,"fetchStart":623,"redirectEnd":0,"redirectStart":0,"requestStart":654.8999998569489,"responseEnd":701.8999998569489,"responseStart":701.2999999523163,"secureConnectionStart":623},{"duration":315.69999980926514,"initiatorType":"xmlhttprequest","name":"https://jira.mariadb.org/rest/webResources/1.0/resources","startTime":889.7000000476837,"connectEnd":889.7000000476837,"connectStart":889.7000000476837,"domainLookupEnd":889.7000000476837,"domainLookupStart":889.7000000476837,"fetchStart":889.7000000476837,"redirectEnd":0,"redirectStart":0,"requestStart":1171.3999998569489,"responseEnd":1205.3999998569489,"responseStart":1204.7999999523163,"secureConnectionStart":889.7000000476837}],"fetchStart":0,"domainLookupStart":0,"domainLookupEnd":0,"connectStart":0,"connectEnd":0,"requestStart":29,"responseStart":232,"responseEnd":251,"domLoading":235,"domInteractive":1171,"domContentLoadedEventStart":1171,"domContentLoadedEventEnd":1243,"domComplete":1540,"loadEventStart":1540,"loadEventEnd":1540,"userAgent":"Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; ClaudeBot/1.0; +claudebot@anthropic.com)","marks":[{"name":"bigPipe.sidebar-id.start","time":1141.7999999523163},{"name":"bigPipe.sidebar-id.end","time":1142.7000000476837},{"name":"bigPipe.activity-panel-pipe-id.start","time":1142.7999999523163},{"name":"bigPipe.activity-panel-pipe-id.end","time":1144.8999998569489},{"name":"activityTabFullyLoaded","time":1265.7000000476837}],"measures":[],"correlationId":"612b37b2521a29","effectiveType":"4g","downlink":10,"rtt":0,"serverDuration":120,"dbReadsTimeInMs":16,"dbConnsTimeInMs":26,"applicationHash":"9d11dbea5f4be3d4cc21f03a88dd11d8c8687422","experiments":[]}}
I think this is correct. The memory even gets freed but unusable until its all unusable.
I do have libre nms which was monitoring the database nodes and it does some memory logging for mariadb specifically and that appears to show a relatively flat allocation which is what we expect to see, but its not showing me the full memory load of the mysql process which I think is what we wanted to see.
You can see though that jemalloc makes quite a big difference, I suspect that fragmentation will never be completely eliminated but its substantially better. We do thrash the database hard on that node for the reporting and we have another which contains retail till records for 350 shops which is also very very busy which suffered the same fate every month or so regardless of how much memory you give it. I have changed that one too. That one would be out of memory when doing a mysqldump so no related to any modification of data at all.