An async slave node clearly leaks memory on some DDL statements coming from async master (also MariaDB 10).
Steps to reproduce:
– node4 is async master to node1, node1 and node2 are Galera cluster members. Run this simple query multiple times:
root@node4:~# mysqlslap --delimiter=";" --number-of-queries=100000 --create-schema=test --query="CREATE TABLE IF NOT EXISTS xxxx (id int)"
|
(...)
– the more times the above query is executed, the bigger memory usage difference between the nodes and the more clear the memory leak is:
node1 {root} ((none)) > select @@version,@@version_comment;
|
+------------------------------------+-------------------------------------------------------+
|
| @@version | @@version_comment |
|
+------------------------------------+-------------------------------------------------------+
|
| 10.0.22-MariaDB-1~wheezy-wsrep-log | mariadb.org binary distribution, wsrep_25.11.r21a2415 |
|
+------------------------------------+-------------------------------------------------------+
|
1 row in set (0.00 sec)
|
node1 {root} ((none)) > show status like 'ws%version';
|
+------------------------+------------+
|
| Variable_name | Value |
|
+------------------------+------------+
|
| wsrep_protocol_version | 7 |
|
| wsrep_provider_version | 3.9(rXXXX) |
|
+------------------------+------------+
|
2 rows in set (0.00 sec)
|
node1 {root} ((none)) > \! ps aux|egrep 'RSS|mysqld '|grep -v grep
|
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
|
mysql 10140 2.0 66.2 2946844 1365228 ? Sl Nov21 10:18 /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib/mysql/plugin --user=mysql --wsrep_provider=/usr/lib/galera/libgalera_smm.so --log-error=/var/
|
node1 {root} ((none)) > show global status like 'mem%';
|
+---------------+-----------+
|
| Variable_name | Value |
|
+---------------+-----------+
|
| Memory_used | 274914112 |
|
+---------------+-----------+
|
1 row in set (0.01 sec)
|
– normal cluster node:
node2 {root} ((none)) > \! ps aux|egrep 'RSS|mysqld '|grep -v grep
|
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
|
mysql 6781 0.4 26.9 1066696 270304 pts/1 Sl Nov19 16:29 /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib/mysql/plugin --user=mysql --wsrep_provider=/usr/lib/galera/libgalera_smm.so --log-error=/var/
|
node2 {root} ((none)) > show global status like 'mem%';
|
+---------------+-----------+
|
| Variable_name | Value |
|
+---------------+-----------+
|
| Memory_used | 274792608 |
|
+---------------+-----------+
|
1 row in set (0.00 sec)
|
– node1 shutdown sequence diagnostic output:
151122 0:50:06 [Note] Slave I/O thread killed while reading event
|
151122 0:50:06 [Note] Slave I/O thread exiting, read up to log 'mysql-bin.000005', position 34245035
|
151122 0:50:06 [Note] WSREP: dtor state: CLOSED
|
151122 0:50:06 [Note] WSREP: mon: entered 240000 oooe fraction 0 oool fraction 0
|
151122 0:50:07 [Note] WSREP: mon: entered 240000 oooe fraction 0 oool fraction 0
|
151122 0:50:07 [Note] WSREP: mon: entered 241908 oooe fraction 0 oool fraction 0
|
151122 0:50:07 [Note] WSREP: cert index usage at exit 0
|
151122 0:50:07 [Note] WSREP: cert trx map usage at exit 96
|
151122 0:50:07 [Note] WSREP: deps set usage at exit 0
|
151122 0:50:07 [Note] WSREP: avg deps dist 1
|
151122 0:50:07 [Note] WSREP: avg cert interval 0
|
151122 0:50:07 [Note] WSREP: cert index size 2
|
151122 0:50:07 [Note] WSREP: Service thread queue flushed.
|
151122 0:50:07 [Note] WSREP: wsdb trx map usage 240001 conn query map usage 1
|
151122 0:50:07 [Note] WSREP: MemPool(LocalTrxHandle): hit ratio: 0.499799, misses: 240097, in use: 240001, in pool: 96
|
151122 0:50:08 [Note] WSREP: MemPool(SlaveTrxHandle): hit ratio: 0, misses: 0, in use: 0, in pool: 0
|
151122 0:50:08 [Note] WSREP: Shifting CLOSED -> DESTROYED (TO: 1008270)
|
151122 0:50:08 [Note] WSREP: Flushing memory map to disk...
|
151122 0:50:09 [Note] InnoDB: FTS optimize thread exiting.
|
151122 0:50:09 [Note] InnoDB: Starting shutdown...
|
151122 0:50:10 [Note] InnoDB: Shutdown completed; log sequence number 555064446
|
151122 0:50:10 [Note] /usr/sbin/mysqld: Shutdown complete
|
– note the "in use: 240001" structures above...
– node2 shutdown diagnostics:
151122 0:52:48 [Note] WSREP: dtor state: CLOSED
|
151122 0:52:48 [Note] WSREP: mon: entered 816552 oooe fraction 0 oool fraction 0
|
151122 0:52:48 [Note] WSREP: mon: entered 816552 oooe fraction 0 oool fraction 0
|
151122 0:52:48 [Note] WSREP: mon: entered 805542 oooe fraction 0 oool fraction 1.2414e-06
|
151122 0:52:48 [Note] WSREP: cert index usage at exit 0
|
151122 0:52:48 [Note] WSREP: cert trx map usage at exit 0
|
151122 0:52:48 [Note] WSREP: deps set usage at exit 0
|
151122 0:52:48 [Note] WSREP: avg deps dist 1
|
151122 0:52:48 [Note] WSREP: avg cert interval 0
|
151122 0:52:48 [Note] WSREP: cert index size 2
|
151122 0:52:48 [Note] WSREP: Service thread queue flushed.
|
151122 0:52:48 [Note] WSREP: wsdb trx map usage 0 conn query map usage 0
|
151122 0:52:48 [Note] WSREP: MemPool(LocalTrxHandle): hit ratio: 0, misses: 0, in use: 0, in pool: 0
|
151122 0:52:48 [Note] WSREP: MemPool(SlaveTrxHandle): hit ratio: 0.999226, misses: 632, in use: 0, in pool: 632
|
151122 0:52:48 [Note] WSREP: Shifting CLOSED -> DESTROYED (TO: 1008270)
|
I cannot reproduce the same leak on PXC 5.6.
- causes
-
MDEV-8965
cluster node (async replication slave) high memory usage
-
-
Closed
{"report":{"fcp":1238.7000000476837,"ttfb":431.7999999523163,"pageVisibility":"visible","entityId":54252,"key":"jira.project.issue.view-issue","isInitial":true,"threshold":1000,"elementTimings":{},"userDeviceMemory":8,"userDeviceProcessors":32,"apdex":0.5,"journeyId":"1f2ce6d0-cd58-4e20-9b56-c672d9bc3a97","navigationType":0,"readyForUser":1361.2999999523163,"redirectCount":0,"resourceLoadedEnd":962.1000000238419,"resourceLoadedStart":437.5,"resourceTiming":[{"duration":134.10000002384186,"initiatorType":"link","name":"https://jira.mariadb.org/s/2c21342762a6a02add1c328bed317ffd-CDN/lu2bu7/820016/12ta74/0a8bac35585be7fc6c9cc5a0464cd4cf/_/download/contextbatch/css/_super/batch.css","startTime":437.5,"connectEnd":0,"connectStart":0,"domainLookupEnd":0,"domainLookupStart":0,"fetchStart":437.5,"redirectEnd":0,"redirectStart":0,"requestStart":0,"responseEnd":571.6000000238419,"responseStart":0,"secureConnectionStart":0},{"duration":134.10000002384186,"initiatorType":"link","name":"https://jira.mariadb.org/s/7ebd35e77e471bc30ff0eba799ebc151-CDN/lu2bu7/820016/12ta74/8679b4946efa1a0bb029a3a22206fb5d/_/download/contextbatch/css/jira.browse.project,project.issue.navigator,jira.view.issue,jira.general,jira.global,atl.general,-_super/batch.css?agile_global_admin_condition=true&jag=true&jira.create.linked.issue=true&slack-enabled=true","startTime":437.7999999523163,"connectEnd":0,"connectStart":0,"domainLookupEnd":0,"domainLookupStart":0,"fetchStart":437.7999999523163,"redirectEnd":0,"redirectStart":0,"requestStart":0,"responseEnd":571.8999999761581,"responseStart":0,"secureConnectionStart":0},{"duration":401.60000002384186,"initiatorType":"script","name":"https://jira.mariadb.org/s/fbf975c0cce4b1abf04784eeae9ba1f4-CDN/lu2bu7/820016/12ta74/0a8bac35585be7fc6c9cc5a0464cd4cf/_/download/contextbatch/js/_super/batch.js?locale=en","startTime":437.89999997615814,"connectEnd":437.89999997615814,"connectStart":437.89999997615814,"domainLookupEnd":437.89999997615814,"domainLookupStart":437.89999997615814,"fetchStart":437.89999997615814,"redirectEnd":0,"redirectStart":0,"requestStart":574.1000000238419,"responseEnd":839.5,"responseStart":599.6000000238419,"secureConnectionStart":437.89999997615814},{"duration":523.8999999761581,"initiatorType":"script","name":"https://jira.mariadb.org/s/099b33461394b8015fc36c0a4b96e19f-CDN/lu2bu7/820016/12ta74/8679b4946efa1a0bb029a3a22206fb5d/_/download/contextbatch/js/jira.browse.project,project.issue.navigator,jira.view.issue,jira.general,jira.global,atl.general,-_super/batch.js?agile_global_admin_condition=true&jag=true&jira.create.linked.issue=true&locale=en&slack-enabled=true","startTime":438.2000000476837,"connectEnd":438.2000000476837,"connectStart":438.2000000476837,"domainLookupEnd":438.2000000476837,"domainLookupStart":438.2000000476837,"fetchStart":438.2000000476837,"redirectEnd":0,"redirectStart":0,"requestStart":574.8999999761581,"responseEnd":962.1000000238419,"responseStart":669.8999999761581,"secureConnectionStart":438.2000000476837},{"duration":166.90000009536743,"initiatorType":"script","name":"https://jira.mariadb.org/s/94c15bff32baef80f4096a08aceae8bc-CDN/lu2bu7/820016/12ta74/c92c0caa9a024ae85b0ebdbed7fb4bd7/_/download/contextbatch/js/atl.global,-_super/batch.js?locale=en","startTime":438.2999999523163,"connectEnd":438.2999999523163,"connectStart":438.2999999523163,"domainLookupEnd":438.2999999523163,"domainLookupStart":438.2999999523163,"fetchStart":438.2999999523163,"redirectEnd":0,"redirectStart":0,"requestStart":575.5,"responseEnd":605.2000000476837,"responseStart":604.2999999523163,"secureConnectionStart":438.2999999523163},{"duration":168.29999995231628,"initiatorType":"script","name":"https://jira.mariadb.org/s/d41d8cd98f00b204e9800998ecf8427e-CDN/lu2bu7/820016/12ta74/1.0/_/download/batch/jira.webresources:calendar-en/jira.webresources:calendar-en.js","startTime":438.5,"connectEnd":438.5,"connectStart":438.5,"domainLookupEnd":438.5,"domainLookupStart":438.5,"fetchStart":438.5,"redirectEnd":0,"redirectStart":0,"requestStart":577.7000000476837,"responseEnd":606.7999999523163,"responseStart":605.3999999761581,"secureConnectionStart":438.5},{"duration":168.39999997615814,"initiatorType":"script","name":"https://jira.mariadb.org/s/d41d8cd98f00b204e9800998ecf8427e-CDN/lu2bu7/820016/12ta74/1.0/_/download/batch/jira.webresources:calendar-localisation-moment/jira.webresources:calendar-localisation-moment.js","startTime":438.60000002384186,"connectEnd":438.60000002384186,"connectStart":438.60000002384186,"domainLookupEnd":438.60000002384186,"domainLookupStart":438.60000002384186,"fetchStart":438.60000002384186,"redirectEnd":0,"redirectStart":0,"requestStart":577.8999999761581,"responseEnd":607,"responseStart":605.8999999761581,"secureConnectionStart":438.60000002384186},{"duration":138.5,"initiatorType":"link","name":"https://jira.mariadb.org/s/b04b06a02d1959df322d9cded3aeecc1-CDN/lu2bu7/820016/12ta74/a2ff6aa845ffc9a1d22fe23d9ee791fc/_/download/contextbatch/css/jira.global.look-and-feel,-_super/batch.css","startTime":438.89999997615814,"connectEnd":0,"connectStart":0,"domainLookupEnd":0,"domainLookupStart":0,"fetchStart":438.89999997615814,"redirectEnd":0,"redirectStart":0,"requestStart":0,"responseEnd":577.3999999761581,"responseStart":0,"secureConnectionStart":0},{"duration":179.10000002384186,"initiatorType":"script","name":"https://jira.mariadb.org/rest/api/1.0/shortcuts/820016/47140b6e0a9bc2e4913da06536125810/shortcuts.js?context=issuenavigation&context=issueaction","startTime":439,"connectEnd":439,"connectStart":439,"domainLookupEnd":439,"domainLookupStart":439,"fetchStart":439,"redirectEnd":0,"redirectStart":0,"requestStart":579.3999999761581,"responseEnd":618.1000000238419,"responseStart":617.2000000476837,"secureConnectionStart":439},{"duration":138.89999997615814,"initiatorType":"link","name":"https://jira.mariadb.org/s/3ac36323ba5e4eb0af2aa7ac7211b4bb-CDN/lu2bu7/820016/12ta74/d176f0986478cc64f24226b3d20c140d/_/download/contextbatch/css/com.atlassian.jira.projects.sidebar.init,-_super,-project.issue.navigator,-jira.view.issue/batch.css?jira.create.linked.issue=true","startTime":439.10000002384186,"connectEnd":0,"connectStart":0,"domainLookupEnd":0,"domainLookupStart":0,"fetchStart":439.10000002384186,"redirectEnd":0,"redirectStart":0,"requestStart":0,"responseEnd":578,"responseStart":0,"secureConnectionStart":0},{"duration":230.30000007152557,"initiatorType":"script","name":"https://jira.mariadb.org/s/3339d87fa2538a859872f2df449bf8d0-CDN/lu2bu7/820016/12ta74/d176f0986478cc64f24226b3d20c140d/_/download/contextbatch/js/com.atlassian.jira.projects.sidebar.init,-_super,-project.issue.navigator,-jira.view.issue/batch.js?jira.create.linked.issue=true&locale=en","startTime":439.2999999523163,"connectEnd":439.2999999523163,"connectStart":439.2999999523163,"domainLookupEnd":439.2999999523163,"domainLookupStart":439.2999999523163,"fetchStart":439.2999999523163,"redirectEnd":0,"redirectStart":0,"requestStart":582.5,"responseEnd":669.6000000238419,"responseStart":633.7999999523163,"secureConnectionStart":439.2999999523163},{"duration":495.7999999523163,"initiatorType":"script","name":"https://jira.mariadb.org/s/d41d8cd98f00b204e9800998ecf8427e-CDN/lu2bu7/820016/12ta74/1.0/_/download/batch/jira.webresources:bigpipe-js/jira.webresources:bigpipe-js.js","startTime":445.5,"connectEnd":445.5,"connectStart":445.5,"domainLookupEnd":445.5,"domainLookupStart":445.5,"fetchStart":445.5,"redirectEnd":0,"redirectStart":0,"requestStart":823.1000000238419,"responseEnd":941.2999999523163,"responseStart":936.6000000238419,"secureConnectionStart":445.5},{"duration":496.39999997615814,"initiatorType":"script","name":"https://jira.mariadb.org/s/d41d8cd98f00b204e9800998ecf8427e-CDN/lu2bu7/820016/12ta74/1.0/_/download/batch/jira.webresources:bigpipe-init/jira.webresources:bigpipe-init.js","startTime":445.60000002384186,"connectEnd":445.60000002384186,"connectStart":445.60000002384186,"domainLookupEnd":445.60000002384186,"domainLookupStart":445.60000002384186,"fetchStart":445.60000002384186,"redirectEnd":0,"redirectStart":0,"requestStart":831.6000000238419,"responseEnd":942,"responseStart":937.3999999761581,"secureConnectionStart":445.60000002384186},{"duration":126.30000007152557,"initiatorType":"xmlhttprequest","name":"https://jira.mariadb.org/rest/webResources/1.0/resources","startTime":972.2999999523163,"connectEnd":972.2999999523163,"connectStart":972.2999999523163,"domainLookupEnd":972.2999999523163,"domainLookupStart":972.2999999523163,"fetchStart":972.2999999523163,"redirectEnd":0,"redirectStart":0,"requestStart":1060.8999999761581,"responseEnd":1098.6000000238419,"responseStart":1098,"secureConnectionStart":972.2999999523163},{"duration":227,"initiatorType":"xmlhttprequest","name":"https://jira.mariadb.org/rest/webResources/1.0/resources","startTime":1194.2000000476837,"connectEnd":1194.2000000476837,"connectStart":1194.2000000476837,"domainLookupEnd":1194.2000000476837,"domainLookupStart":1194.2000000476837,"fetchStart":1194.2000000476837,"redirectEnd":0,"redirectStart":0,"requestStart":1385.7000000476837,"responseEnd":1421.2000000476837,"responseStart":1418.7999999523163,"secureConnectionStart":1194.2000000476837},{"duration":207.20000004768372,"initiatorType":"script","name":"https://www.google-analytics.com/analytics.js","startTime":1232.5,"connectEnd":0,"connectStart":0,"domainLookupEnd":0,"domainLookupStart":0,"fetchStart":1232.5,"redirectEnd":0,"redirectStart":0,"requestStart":0,"responseEnd":1439.7000000476837,"responseStart":0,"secureConnectionStart":0}],"fetchStart":0,"domainLookupStart":0,"domainLookupEnd":0,"connectStart":0,"connectEnd":0,"requestStart":222,"responseStart":432,"responseEnd":442,"domLoading":436,"domInteractive":1441,"domContentLoadedEventStart":1441,"domContentLoadedEventEnd":1500,"domComplete":2006,"loadEventStart":2006,"loadEventEnd":2006,"userAgent":"Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; ClaudeBot/1.0; +claudebot@anthropic.com)","marks":[{"name":"bigPipe.sidebar-id.start","time":1406.2999999523163},{"name":"bigPipe.sidebar-id.end","time":1407.1000000238419},{"name":"bigPipe.activity-panel-pipe-id.start","time":1407.2000000476837},{"name":"bigPipe.activity-panel-pipe-id.end","time":1410.2999999523163},{"name":"activityTabFullyLoaded","time":1517.8999999761581}],"measures":[],"correlationId":"abec1efd1b6b51","effectiveType":"4g","downlink":10,"rtt":0,"serverDuration":113,"dbReadsTimeInMs":12,"dbConnsTimeInMs":20,"applicationHash":"9d11dbea5f4be3d4cc21f03a88dd11d8c8687422","experiments":[]}}
Reproducible on 10.0-galera commit 3eb8bc01b6876dc9dbacb82179127a58f4b86e79
I don't know whether it's a real permanent leak, or temporary growth and it will get to normal later, or it is filling up some wsrep-related buffer and it will stop when the buffer is full, but the difference between a galera node (even in a one-node cluster) and the same binary running in s standalone mode, without wsrep, is striking:
The node soon after replication start
27971 921120 106740 sql/mysqld --datadir=data1 --wsrep_provider=/usr/lib/galera/libgalera_smm.so --wsrep_sst_method=rsync --core --default-storage-engine=InnoDB --innodb_autoinc_lock_mode=2 --innodb_locks_unsafe_for_binlog=1 --innodb_flush_log_at_trx_commit=0 --log-error=log.err --basedir=/home/elenst/git/10.0-galera --port=8306 --loose-lc-messages-dir=sql/share --socket=/tmp/elenst-galera-1.sock --tmpdir=data1/tmp --general-log=1 --wsrep_cluster_address=gcomm:// --server-id=1 --core --log-bin=master-bin --binlog-format=row --log-bin=master-bin --log-slave-updates
...
The same node and another slave which was started a bit later
27971 1297952 345904 sql/mysqld --datadir=data1 --wsrep_provider=/usr/lib/galera/libgalera_smm.so --wsrep_sst_method=rsync --core --default-storage-engine=InnoDB --innodb_autoinc_lock_mode=2 --innodb_locks_unsafe_for_binlog=1 --innodb_flush_log_at_trx_commit=0 --log-error=log.err --basedir=/home/elenst/git/10.0-galera --port=8306 --loose-lc-messages-dir=sql/share --socket=/tmp/elenst-galera-1.sock --tmpdir=data1/tmp --general-log=1 --wsrep_cluster_address=gcomm:// --server-id=1 --core --log-bin=master-bin --binlog-format=row --log-bin=master-bin --log-slave-updates
28300 715204 83748 sql/mysqld --no-defaults --basedir=/home/elenst/git/10.0-galera --datadir=data2 --log-error=data2/log.err --loose-lc-messages-dir=sql/share --loose-language=sql/share/english --port=3307 --socket=data2/tmp/mysql.sock --tmpdir=data2/tmp --loose-core --log-bin --binlog-format=row --log-slave-updates --server-id=200 --core --default-storage-engine=InnoDB --innodb_autoinc_lock_mode=2 --innodb_locks_unsafe_for_binlog=1 --innodb_flush_log_at_trx_commit=0
...
The node keeps growing, the other slave has only grown a little
27971 1322528 361600 sql/mysqld --datadir=data1 --wsrep_provider=/usr/lib/galera/libgalera_smm.so --wsrep_sst_method=rsync --core --default-storage-engine=InnoDB --innodb_autoinc_lock_mode=2 --innodb_locks_unsafe_for_binlog=1 --innodb_flush_log_at_trx_commit=0 --log-error=log.err --basedir=/home/elenst/git/10.0-galera --port=8306 --loose-lc-messages-dir=sql/share --socket=/tmp/elenst-galera-1.sock --tmpdir=data1/tmp --general-log=1 --wsrep_cluster_address=gcomm:// --server-id=1 --core --log-bin=master-bin --binlog-format=row --log-bin=master-bin --log-slave-updates
28300 719300 116052 sql/mysqld --no-defaults --basedir=/home/elenst/git/10.0-galera --datadir=data2 --log-error=data2/log.err --loose-lc-messages-dir=sql/share --loose-language=sql/share/english --port=3307 --socket=data2/tmp/mysql.sock --tmpdir=data2/tmp --loose-core --log-bin --binlog-format=row --log-slave-updates --server-id=200 --core --default-storage-engine=InnoDB --innodb_autoinc_lock_mode=2 --innodb_locks_unsafe_for_binlog=1 --innodb_flush_log_at_trx_commit=0
...
The node keeps growing, the other slave stopped growing
27971 2190880 893276 sql/mysqld --datadir=data1 --wsrep_provider=/usr/lib/galera/libgalera_smm.so --wsrep_sst_method=rsync --core --default-storage-engine=InnoDB --innodb_autoinc_lock_mode=2 --innodb_locks_unsafe_for_binlog=1 --innodb_flush_log_at_trx_commit=0 --log-error=log.err --basedir=/home/elenst/git/10.0-galera --port=8306 --loose-lc-messages-dir=sql/share --socket=/tmp/elenst-galera-1.sock --tmpdir=data1/tmp --general-log=1 --wsrep_cluster_address=gcomm:// --server-id=1 --core --log-bin=master-bin --binlog-format=row --log-bin=master-bin --log-slave-updates
28300 719300 119176 sql/mysqld --no-defaults --basedir=/home/elenst/git/10.0-galera --datadir=data2 --log-error=data2/log.err --loose-lc-messages-dir=sql/share --loose-language=sql/share/english --port=3307 --socket=data2/tmp/mysql.sock --tmpdir=data2/tmp --loose-core --log-bin --binlog-format=row --log-slave-updates --server-id=200 --core --default-storage-engine=InnoDB --innodb_autoinc_lock_mode=2 --innodb_locks_unsafe_for_binlog=1 --innodb_flush_log_at_trx_commit=0
... and counting.
Also, the replication on the node is incredibly slow:
Exec_Master_Log_Pos: 21200089
Relay_Log_Space: 1692002518
Seconds_Behind_Master: 14566
The other slave caught up with the master hours ago.