Details
- Type: Bug
- Status: Closed
- Priority: Critical
- Resolution: Fixed
- Affects Version/s: 10.0.22-galera
- Component/s: None
- Environment:
  cat /etc/debian_version
  7.9
  root@node1:~# uname -a
  Linux node1 3.2.0-4-amd64 #1 SMP Debian 3.2.68-1+deb7u3 x86_64 GNU/Linux
- Fix Version/s: 10.0.23
Description
An async slave node clearly leaks memory on some DDL statements coming from the async master (also MariaDB 10).
Steps to reproduce:
– node4 is an async master to node1; node1 and node2 are Galera cluster members. Run this simple query multiple times:
root@node4:~# mysqlslap --delimiter=";" --number-of-queries=100000 --create-schema=test --query="CREATE TABLE IF NOT EXISTS xxxx (id int)"
(...)
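For reference, the async replication link itself is set up in the standard way; a minimal sketch of what is run on node1, with credentials and binlog coordinates as placeholders rather than the exact values used in this test:

-- on node1 (Galera node acting as async slave of node4); placeholders, not the real values
CHANGE MASTER TO
  MASTER_HOST='node4',
  MASTER_USER='repl',
  MASTER_PASSWORD='<password>',
  MASTER_LOG_FILE='mysql-bin.000001',
  MASTER_LOG_POS=4;
START SLAVE;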
– the more times the above query is executed, the bigger the memory usage difference between the nodes becomes and the clearer the memory leak is:
node1 {root} ((none)) > select @@version,@@version_comment;
+------------------------------------+-------------------------------------------------------+
| @@version                          | @@version_comment                                     |
+------------------------------------+-------------------------------------------------------+
| 10.0.22-MariaDB-1~wheezy-wsrep-log | mariadb.org binary distribution, wsrep_25.11.r21a2415 |
+------------------------------------+-------------------------------------------------------+
1 row in set (0.00 sec)

node1 {root} ((none)) > show status like 'ws%version';
+------------------------+------------+
| Variable_name          | Value      |
+------------------------+------------+
| wsrep_protocol_version | 7          |
| wsrep_provider_version | 3.9(rXXXX) |
+------------------------+------------+
2 rows in set (0.00 sec)

node1 {root} ((none)) > \! ps aux|egrep 'RSS|mysqld '|grep -v grep
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
mysql 10140 2.0 66.2 2946844 1365228 ? Sl Nov21 10:18 /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib/mysql/plugin --user=mysql --wsrep_provider=/usr/lib/galera/libgalera_smm.so --log-error=/var/

node1 {root} ((none)) > show global status like 'mem%';
+---------------+-----------+
| Variable_name | Value     |
+---------------+-----------+
| Memory_used   | 274914112 |
+---------------+-----------+
1 row in set (0.01 sec)
– normal cluster node:
node2 {root} ((none)) > \! ps aux|egrep 'RSS|mysqld '|grep -v grep
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
mysql 6781 0.4 26.9 1066696 270304 pts/1 Sl Nov19 16:29 /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib/mysql/plugin --user=mysql --wsrep_provider=/usr/lib/galera/libgalera_smm.so --log-error=/var/

node2 {root} ((none)) > show global status like 'mem%';
+---------------+-----------+
| Variable_name | Value     |
+---------------+-----------+
| Memory_used   | 274792608 |
+---------------+-----------+
1 row in set (0.00 sec)
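To watch the divergence while mysqlslap is running, RSS and Memory_used can be sampled from both nodes in a loop; a rough sketch, assuming passwordless SSH and root MySQL access on each node (this is not part of the original test setup):

# sample RSS (from ps) and Memory_used (from SHOW GLOBAL STATUS) once a minute
while true; do
  for h in node1 node2; do
    rss=$(ssh "$h" "ps -o rss= -C mysqld")
    mem=$(mysql -h "$h" -u root -N -e "show global status like 'Memory_used'" | awk '{print $2}')
    echo "$(date +%T) $h rss_kb=${rss//[[:space:]]/} Memory_used=$mem"
  done
  sleep 60
done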
– node1 shutdown sequence diagnostic output:
151122 0:50:06 [Note] Slave I/O thread killed while reading event
151122 0:50:06 [Note] Slave I/O thread exiting, read up to log 'mysql-bin.000005', position 34245035
151122 0:50:06 [Note] WSREP: dtor state: CLOSED
151122 0:50:06 [Note] WSREP: mon: entered 240000 oooe fraction 0 oool fraction 0
151122 0:50:07 [Note] WSREP: mon: entered 240000 oooe fraction 0 oool fraction 0
151122 0:50:07 [Note] WSREP: mon: entered 241908 oooe fraction 0 oool fraction 0
151122 0:50:07 [Note] WSREP: cert index usage at exit 0
151122 0:50:07 [Note] WSREP: cert trx map usage at exit 96
151122 0:50:07 [Note] WSREP: deps set usage at exit 0
151122 0:50:07 [Note] WSREP: avg deps dist 1
151122 0:50:07 [Note] WSREP: avg cert interval 0
151122 0:50:07 [Note] WSREP: cert index size 2
151122 0:50:07 [Note] WSREP: Service thread queue flushed.
151122 0:50:07 [Note] WSREP: wsdb trx map usage 240001 conn query map usage 1
151122 0:50:07 [Note] WSREP: MemPool(LocalTrxHandle): hit ratio: 0.499799, misses: 240097, in use: 240001, in pool: 96
151122 0:50:08 [Note] WSREP: MemPool(SlaveTrxHandle): hit ratio: 0, misses: 0, in use: 0, in pool: 0
151122 0:50:08 [Note] WSREP: Shifting CLOSED -> DESTROYED (TO: 1008270)
151122 0:50:08 [Note] WSREP: Flushing memory map to disk...
151122 0:50:09 [Note] InnoDB: FTS optimize thread exiting.
151122 0:50:09 [Note] InnoDB: Starting shutdown...
151122 0:50:10 [Note] InnoDB: Shutdown completed; log sequence number 555064446
151122 0:50:10 [Note] /usr/sbin/mysqld: Shutdown complete
– note the "in use: 240001" structures above...
– node2 shutdown diagnostics:
151122 0:52:48 [Note] WSREP: dtor state: CLOSED
151122 0:52:48 [Note] WSREP: mon: entered 816552 oooe fraction 0 oool fraction 0
151122 0:52:48 [Note] WSREP: mon: entered 816552 oooe fraction 0 oool fraction 0
151122 0:52:48 [Note] WSREP: mon: entered 805542 oooe fraction 0 oool fraction 1.2414e-06
151122 0:52:48 [Note] WSREP: cert index usage at exit 0
151122 0:52:48 [Note] WSREP: cert trx map usage at exit 0
151122 0:52:48 [Note] WSREP: deps set usage at exit 0
151122 0:52:48 [Note] WSREP: avg deps dist 1
151122 0:52:48 [Note] WSREP: avg cert interval 0
151122 0:52:48 [Note] WSREP: cert index size 2
151122 0:52:48 [Note] WSREP: Service thread queue flushed.
151122 0:52:48 [Note] WSREP: wsdb trx map usage 0 conn query map usage 0
151122 0:52:48 [Note] WSREP: MemPool(LocalTrxHandle): hit ratio: 0, misses: 0, in use: 0, in pool: 0
151122 0:52:48 [Note] WSREP: MemPool(SlaveTrxHandle): hit ratio: 0.999226, misses: 632, in use: 0, in pool: 632
151122 0:52:48 [Note] WSREP: Shifting CLOSED -> DESTROYED (TO: 1008270)
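The relevant counters from the two shutdown sequences can also be compared directly from the error logs; a trivial helper, assuming the logs have been copied locally (the real --log-error paths are truncated in the ps output above):

grep -E 'MemPool|wsdb trx map' node1-error.log node2-error.log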
I cannot reproduce the same leak on PXC 5.6.
Issue Links
- causes: MDEV-8965 cluster node (async replication slave) high memory usage (Closed)
Reproducible on 10.0-galera commit 3eb8bc01b6876dc9dbacb82179127a58f4b86e79
I don't know whether it is a real permanent leak, a temporary growth that will return to normal later, or some wsrep-related buffer filling up that will stop growing once the buffer is full, but the difference between a Galera node (even in a one-node cluster) and the same binary running in standalone mode, without wsrep, is striking:
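The samples below look like pid / VSZ / RSS (in KiB) followed by the command line, i.e. roughly the output of something like the following (an interpretation of the columns, not stated explicitly here):

ps -o pid=,vsz=,rss=,args= -p 27971,28300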
The node soon after replication start
27971 921120 106740 sql/mysqld --datadir=data1 --wsrep_provider=/usr/lib/galera/libgalera_smm.so --wsrep_sst_method=rsync --core --default-storage-engine=InnoDB --innodb_autoinc_lock_mode=2 --innodb_locks_unsafe_for_binlog=1 --innodb_flush_log_at_trx_commit=0 --log-error=log.err --basedir=/home/elenst/git/10.0-galera --port=8306 --loose-lc-messages-dir=sql/share --socket=/tmp/elenst-galera-1.sock --tmpdir=data1/tmp --general-log=1 --wsrep_cluster_address=gcomm:// --server-id=1 --core --log-bin=master-bin --binlog-format=row --log-bin=master-bin --log-slave-updates
...
The same node and another slave which was started a bit later
27971 1297952 345904 sql/mysqld --datadir=data1 --wsrep_provider=/usr/lib/galera/libgalera_smm.so --wsrep_sst_method=rsync --core --default-storage-engine=InnoDB --innodb_autoinc_lock_mode=2 --innodb_locks_unsafe_for_binlog=1 --innodb_flush_log_at_trx_commit=0 --log-error=log.err --basedir=/home/elenst/git/10.0-galera --port=8306 --loose-lc-messages-dir=sql/share --socket=/tmp/elenst-galera-1.sock --tmpdir=data1/tmp --general-log=1 --wsrep_cluster_address=gcomm:// --server-id=1 --core --log-bin=master-bin --binlog-format=row --log-bin=master-bin --log-slave-updates
28300 715204 83748 sql/mysqld --no-defaults --basedir=/home/elenst/git/10.0-galera --datadir=data2 --log-error=data2/log.err --loose-lc-messages-dir=sql/share --loose-language=sql/share/english --port=3307 --socket=data2/tmp/mysql.sock --tmpdir=data2/tmp --loose-core --log-bin --binlog-format=row --log-slave-updates --server-id=200 --core --default-storage-engine=InnoDB --innodb_autoinc_lock_mode=2 --innodb_locks_unsafe_for_binlog=1 --innodb_flush_log_at_trx_commit=0
...
The node keeps growing, the other slave has only grown a little
27971 1322528 361600 sql/mysqld --datadir=data1 --wsrep_provider=/usr/lib/galera/libgalera_smm.so --wsrep_sst_method=rsync --core --default-storage-engine=InnoDB --innodb_autoinc_lock_mode=2 --innodb_locks_unsafe_for_binlog=1 --innodb_flush_log_at_trx_commit=0 --log-error=log.err --basedir=/home/elenst/git/10.0-galera --port=8306 --loose-lc-messages-dir=sql/share --socket=/tmp/elenst-galera-1.sock --tmpdir=data1/tmp --general-log=1 --wsrep_cluster_address=gcomm:// --server-id=1 --core --log-bin=master-bin --binlog-format=row --log-bin=master-bin --log-slave-updates
28300 719300 116052 sql/mysqld --no-defaults --basedir=/home/elenst/git/10.0-galera --datadir=data2 --log-error=data2/log.err --loose-lc-messages-dir=sql/share --loose-language=sql/share/english --port=3307 --socket=data2/tmp/mysql.sock --tmpdir=data2/tmp --loose-core --log-bin --binlog-format=row --log-slave-updates --server-id=200 --core --default-storage-engine=InnoDB --innodb_autoinc_lock_mode=2 --innodb_locks_unsafe_for_binlog=1 --innodb_flush_log_at_trx_commit=0
...
The node keeps growing, the other slave stopped growing
27971 2190880 893276 sql/mysqld --datadir=data1 --wsrep_provider=/usr/lib/galera/libgalera_smm.so --wsrep_sst_method=rsync --core --default-storage-engine=InnoDB --innodb_autoinc_lock_mode=2 --innodb_locks_unsafe_for_binlog=1 --innodb_flush_log_at_trx_commit=0 --log-error=log.err --basedir=/home/elenst/git/10.0-galera --port=8306 --loose-lc-messages-dir=sql/share --socket=/tmp/elenst-galera-1.sock --tmpdir=data1/tmp --general-log=1 --wsrep_cluster_address=gcomm:// --server-id=1 --core --log-bin=master-bin --binlog-format=row --log-bin=master-bin --log-slave-updates
28300 719300 119176 sql/mysqld --no-defaults --basedir=/home/elenst/git/10.0-galera --datadir=data2 --log-error=data2/log.err --loose-lc-messages-dir=sql/share --loose-language=sql/share/english --port=3307 --socket=data2/tmp/mysql.sock --tmpdir=data2/tmp --loose-core --log-bin --binlog-format=row --log-slave-updates --server-id=200 --core --default-storage-engine=InnoDB --innodb_autoinc_lock_mode=2 --innodb_locks_unsafe_for_binlog=1 --innodb_flush_log_at_trx_commit=0
... and counting.
Also, replication on the node is incredibly slow:
Exec_Master_Log_Pos: 21200089
Relay_Log_Space: 1692002518
Seconds_Behind_Master: 14566
The other slave caught up with the master hours ago.
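The lag is easy to track while the test runs; a minimal polling loop, assuming local root access on the node (credentials and socket are placeholders):

while true; do
  mysql -u root -N -e "SHOW SLAVE STATUS\G" | grep Seconds_Behind_Master
  sleep 30
done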