Reproducible on 10.0-galera commit 3eb8bc01b6876dc9dbacb82179127a58f4b86e79
I don't know whether it's a real permanent leak, temporary growth that will subside later, or some wsrep-related buffer filling up (in which case it will stop once the buffer is full), but the difference between a Galera node (even in a one-node cluster) and the same binary running in standalone mode, without wsrep, is striking:
The node soon after replication started:
27971 921120 106740 sql/mysqld --datadir=data1 --wsrep_provider=/usr/lib/galera/libgalera_smm.so --wsrep_sst_method=rsync --core --default-storage-engine=InnoDB --innodb_autoinc_lock_mode=2 --innodb_locks_unsafe_for_binlog=1 --innodb_flush_log_at_trx_commit=0 --log-error=log.err --basedir=/home/elenst/git/10.0-galera --port=8306 --loose-lc-messages-dir=sql/share --socket=/tmp/elenst-galera-1.sock --tmpdir=data1/tmp --general-log=1 --wsrep_cluster_address=gcomm:// --server-id=1 --core --log-bin=master-bin --binlog-format=row --log-bin=master-bin --log-slave-updates
...
The same node, and another slave which was started a bit later:
27971 1297952 345904 sql/mysqld --datadir=data1 --wsrep_provider=/usr/lib/galera/libgalera_smm.so --wsrep_sst_method=rsync --core --default-storage-engine=InnoDB --innodb_autoinc_lock_mode=2 --innodb_locks_unsafe_for_binlog=1 --innodb_flush_log_at_trx_commit=0 --log-error=log.err --basedir=/home/elenst/git/10.0-galera --port=8306 --loose-lc-messages-dir=sql/share --socket=/tmp/elenst-galera-1.sock --tmpdir=data1/tmp --general-log=1 --wsrep_cluster_address=gcomm:// --server-id=1 --core --log-bin=master-bin --binlog-format=row --log-bin=master-bin --log-slave-updates
28300 715204 83748 sql/mysqld --no-defaults --basedir=/home/elenst/git/10.0-galera --datadir=data2 --log-error=data2/log.err --loose-lc-messages-dir=sql/share --loose-language=sql/share/english --port=3307 --socket=data2/tmp/mysql.sock --tmpdir=data2/tmp --loose-core --log-bin --binlog-format=row --log-slave-updates --server-id=200 --core --default-storage-engine=InnoDB --innodb_autoinc_lock_mode=2 --innodb_locks_unsafe_for_binlog=1 --innodb_flush_log_at_trx_commit=0
...
The node keeps growing; the other slave has only grown a little:
27971 1322528 361600 sql/mysqld --datadir=data1 --wsrep_provider=/usr/lib/galera/libgalera_smm.so --wsrep_sst_method=rsync --core --default-storage-engine=InnoDB --innodb_autoinc_lock_mode=2 --innodb_locks_unsafe_for_binlog=1 --innodb_flush_log_at_trx_commit=0 --log-error=log.err --basedir=/home/elenst/git/10.0-galera --port=8306 --loose-lc-messages-dir=sql/share --socket=/tmp/elenst-galera-1.sock --tmpdir=data1/tmp --general-log=1 --wsrep_cluster_address=gcomm:// --server-id=1 --core --log-bin=master-bin --binlog-format=row --log-bin=master-bin --log-slave-updates
28300 719300 116052 sql/mysqld --no-defaults --basedir=/home/elenst/git/10.0-galera --datadir=data2 --log-error=data2/log.err --loose-lc-messages-dir=sql/share --loose-language=sql/share/english --port=3307 --socket=data2/tmp/mysql.sock --tmpdir=data2/tmp --loose-core --log-bin --binlog-format=row --log-slave-updates --server-id=200 --core --default-storage-engine=InnoDB --innodb_autoinc_lock_mode=2 --innodb_locks_unsafe_for_binlog=1 --innodb_flush_log_at_trx_commit=0
...
The node keeps growing; the other slave has stopped growing:
27971 2190880 893276 sql/mysqld --datadir=data1 --wsrep_provider=/usr/lib/galera/libgalera_smm.so --wsrep_sst_method=rsync --core --default-storage-engine=InnoDB --innodb_autoinc_lock_mode=2 --innodb_locks_unsafe_for_binlog=1 --innodb_flush_log_at_trx_commit=0 --log-error=log.err --basedir=/home/elenst/git/10.0-galera --port=8306 --loose-lc-messages-dir=sql/share --socket=/tmp/elenst-galera-1.sock --tmpdir=data1/tmp --general-log=1 --wsrep_cluster_address=gcomm:// --server-id=1 --core --log-bin=master-bin --binlog-format=row --log-bin=master-bin --log-slave-updates
28300 719300 119176 sql/mysqld --no-defaults --basedir=/home/elenst/git/10.0-galera --datadir=data2 --log-error=data2/log.err --loose-lc-messages-dir=sql/share --loose-language=sql/share/english --port=3307 --socket=data2/tmp/mysql.sock --tmpdir=data2/tmp --loose-core --log-bin --binlog-format=row --log-slave-updates --server-id=200 --core --default-storage-engine=InnoDB --innodb_autoinc_lock_mode=2 --innodb_locks_unsafe_for_binlog=1 --innodb_flush_log_at_trx_commit=0
... and counting.
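To put the growth in perspective, here is some rough arithmetic over the VSZ samples above (values in KiB, as printed by ps; the variable names are mine, for illustration only):

```python
# VSZ samples (KiB) copied from the ps snippets above.
node_vsz = [921120, 1297952, 1322528, 2190880]  # pid 27971, the wsrep node
slave_vsz = [715204, 719300, 719300]            # pid 28300, the plain slave

def growth_factor(samples):
    """Ratio of the last sample to the first."""
    return samples[-1] / samples[0]

print(f"wsrep node VSZ growth:  {growth_factor(node_vsz):.2f}x")
print(f"plain slave VSZ growth: {growth_factor(slave_vsz):.3f}x")
```

So over the same workload the wsrep node's address space more than doubled, while the plain slave's grew by about half a percent.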
Also, the replication on the node is incredibly slow:
Exec_Master_Log_Pos: 21200089
Relay_Log_Space: 1692002518
Seconds_Behind_Master: 14566
The other slave caught up with the master hours ago.
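For scale, converting the SHOW SLAVE STATUS figures above into human units (my arithmetic, not server output):

```python
# Figures copied from the slave status output above.
relay_log_space = 1_692_002_518   # bytes of relay logs on disk
exec_master_log_pos = 21_200_089  # executed position in the current master binlog
seconds_behind = 14_566

print(f"relay log space: {relay_log_space / 2**30:.2f} GiB")
print(f"executed position: {exec_master_log_pos / 2**20:.1f} MiB")
print(f"lag: {seconds_behind / 3600:.1f} hours")
```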
For the record, I left it running; after one day it still hasn't caught up with the master, and the process is already at 11 GB:
27971 11071008 5711952 sql/mysqld ...
Read_Master_Log_Pos: 618258865
...
Exec_Master_Log_Pos: 169162951
...
Seconds_Behind_Master: 82883
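In human units again (rough arithmetic; the position difference is only a meaningful byte count if both positions refer to the same binlog file):

```python
read_pos = 618_258_865   # Read_Master_Log_Pos above
exec_pos = 169_162_951   # Exec_Master_Log_Pos above
backlog = read_pos - exec_pos  # bytes read but not yet applied (same-file caveat)

print(f"unapplied: ~{backlog / 2**20:.0f} MiB, lag: {82_883 / 3600:.0f} hours")
```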
Three local servers, all running from the 10.0-galera tree, with the startup options below (even for those that don't specify --no-defaults explicitly, there are no cnf files to pick up).
Topology is M=>S1, M=>S2 where "=>" stands for the regular async replication.
S1 (pid 27971) is a node in a single-node cluster (basically a standalone server, but started with the wsrep* options below; it reports a cluster size of 1).
S2 (pid 28300) is a standalone server started without wsrep* options, otherwise seemingly the same unless I missed something.
M (pid 28032) is a standalone server started without wsrep* options.
Executed on both slaves:
change master to master_host='127.0.0.1', master_port=3306, master_user='root';
start slave;
The master has run the slap flow as suggested in the description (several million queries in total). It all went reasonably fast, and it has been idle ever since.
S2 caught up with the master quite fast (was already idle by the time I posted my first comment).
S1 is still replicating.
Here is the current ps:
elenst 27971 2.8 69.9 11230752 5732500 pts/0 Sl Nov22 40:54 10.0-galera/sql/mysqld --datadir=10.0-galera/data1 --wsrep_provider=/usr/lib/galera/libgalera_smm.so --wsrep_sst_method=rsync --core --default-storage-engine=InnoDB --innodb_autoinc_lock_mode=2 --innodb_locks_unsafe_for_binlog=1 --innodb_flush_log_at_trx_commit=0 --log-error=log.err --basedir=10.0-galera --port=8306 --loose-lc-messages-dir=10.0-galera/sql/share --socket=/tmp/elenst-galera-1.sock --tmpdir=10.0-galera/data1/tmp --general-log=1 --wsrep_cluster_address=gcomm:// --server-id=1 --core --log-bin=master-bin --binlog-format=row --log-bin=master-bin --log-slave-updates
elenst 28032 3.9 0.9 715496 75844 pts/0 Sl Nov22 56:35 10.0-galera/sql/mysqld --no-defaults --basedir=10.0-galera --datadir=10.0-galera/data --log-error=10.0-galera/data/log.err --loose-lc-messages-dir=10.0-galera/sql/share --loose-language=10.0-galera/sql/share/english --port=3306 --socket=10.0-galera/data/tmp/mysql.sock --tmpdir=10.0-galera/data/tmp --loose-core --log-bin --binlog-format=row --log-slave-updates --server-id=100
elenst 28300 4.3 1.4 719300 117480 pts/0 Sl Nov22 59:29 10.0-galera/sql/mysqld --no-defaults --basedir=10.0-galera --datadir=10.0-galera/data2 --log-error=10.0-galera/data2/log.err --loose-lc-messages-dir=10.0-galera/sql/share --loose-language=10.0-galera/sql/share/english --port=3307 --socket=10.0-galera/data2/tmp/mysql.sock --tmpdir=10.0-galera/data2/tmp --loose-core --log-bin --binlog-format=row --log-slave-updates --server-id=200 --core --default-storage-engine=InnoDB --innodb_autoinc_lock_mode=2 --innodb_locks_unsafe_for_binlog=1 --innodb_flush_log_at_trx_commit=0
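For reference, the VSZ/RSS columns quoted throughout this report can be collected with a small script like the one below (a sketch assuming a Linux ps; sample_mem is a hypothetical helper, and the numbers above were taken from plain ps output):

```python
import os
import subprocess

def sample_mem(pid):
    """Return (vsz_kib, rss_kib) for a process, via Linux ps."""
    out = subprocess.check_output(
        ["ps", "-o", "vsz=,rss=", "-p", str(pid)], text=True)
    vsz, rss = out.split()
    return int(vsz), int(rss)

if __name__ == "__main__":
    # Demonstrate on this process; in practice one would poll the mysqld
    # pids (e.g. 27971 and 28300) at intervals and log the values.
    print(sample_mem(os.getpid()))
```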
Before that, I tried a 2-node cluster: M=>S1, S1<->S2, where M was a standalone server and S1 and S2 were Galera nodes ("=>" again stands for async replication, "<->" for Galera replication). I saw signs of the same memory growth as described here, but I stopped that test after S1 hit 1.5 GB, so I don't know whether it would have kept growing, and I didn't check the speed of the async replication at the time.