[MDEV-25818] RSYNC SST failed due to busy port Created: 2021-05-29  Updated: 2021-06-01  Resolved: 2021-05-31

Status: Closed
Project: MariaDB Server
Component/s: Galera SST
Affects Version/s: 10.2, 10.3, 10.4, 10.5, 10.6
Fix Version/s: 10.6.2, 10.2.39, 10.3.30, 10.4.20, 10.5.11

Type: Bug Priority: Major
Reporter: Julius Goryavsky Assignee: Julius Goryavsky
Resolution: Fixed Votes: 0
Labels: None


 Description   

After the busy-port detection code in the SST scripts was improved, the rsync SST script began to occasionally report a busy-port error. This happens especially often when running some tests in parallel or when restarting quickly after failures:

2021-05-25  7:26:53 0 [Note] WSREP: Running: 'wsrep_sst_rsync --role 'joiner' --address '127.0.0.1:16020' --datadir '/dev/shm/bb-10.4-merge/mysql-test/var/1/mysqld.5/data/' --defaults-file '/dev/shm/bb-10.4-merge/mysql-test/var/1/my.cnf' --defaults-group-suffix '.5' --parent '107939' --binlog 'mysqld-bin' --binlog-index 'mysqld-bin.index' --mysqld-args --defaults-group-suffix=.5 --defaults-file=/dev/shm/bb-10.4-merge/mysql-test/var/1/my.cnf --log-output=file --innodb --innodb-cmpmem --innodb-cmp-per-index --innodb-trx --innodb-locks --innodb-lock-waits --innodb-metrics --innodb-buffer-pool-stats --innodb-buffer-page --innodb-buffer-page-lru --innodb-sys-columns --innodb-sys-fields --innodb-sys-foreign --innodb-sys-foreign-cols --innodb-sys-indexes --innodb-sys-tables --innodb-sys-virtual --core-file --loose-debug-sync-timeout=300'
WSREP_SST: [ERROR] rsync or stunnel daemon port '16020'  has been taken by another program (20210525 07:26:53.410)
WSREP_SST: [INFO] Joiner cleanup. rsync PID: 109214 (20210525 07:26:53.412)
/dev/shm/bb-10.4-merge/scripts/wsrep_sst_rsync: line 41: kill: (109214) - No such process
WSREP_SST: [INFO] Joiner cleanup done. (20210525 07:26:53.415)
2021-05-25  7:26:53 0 [ERROR] WSREP: Failed to read 'ready <addr>' from: wsrep_sst_rsync --role 'joiner' --address '127.0.0.1:16020' --datadir '/dev/shm/bb-10.4-merge/mysql-test/var/1/mysqld.5/data/' --defaults-file '/dev/shm/bb-10.4-merge/mysql-test/var/1/my.cnf' --defaults-group-suffix '.5' --parent '107939' --binlog 'mysqld-bin' --binlog-index 'mysqld-bin.index' --mysqld-args --defaults-group-suffix=.5 --defaults-file=/dev/shm/bb-10.4-merge/mysql-test/var/1/my.cnf --log-output=file --innodb --innodb-cmpmem --innodb-cmp-per-index --innodb-trx --innodb-locks --innodb-lock-waits --innodb-metrics --innodb-buffer-pool-stats --innodb-buffer-page --innodb-buffer-page-lru --innodb-sys-columns --innodb-sys-fields --innodb-sys-foreign --innodb-sys-foreign-cols --innodb-sys-indexes --innodb-sys-tables --innodb-sys-virtual --core-file --loose-debug-sync-timeout=300
	Read: '(null)'
2021-05-25  7:26:53 0 [ERROR] WSREP: Process completed with error: wsrep_sst_rsync --role 'joiner' --address '127.0.0.1:16020' --datadir '/dev/shm/bb-10.4-merge/mysql-test/var/1/mysqld.5/data/' --defaults-file '/dev/shm/bb-10.4-merge/mysql-test/var/1/my.cnf' --defaults-group-suffix '.5' --parent '107939' --binlog 'mysqld-bin' --binlog-index 'mysqld-bin.index' --mysqld-args --defaults-group-suffix=.5 --defaults-file=/dev/shm/bb-10.4-merge/mysql-test/var/1/my.cnf --log-output=file --innodb --innodb-cmpmem --innodb-cmp-per-index --innodb-trx --innodb-locks --innodb-lock-waits --innodb-metrics --innodb-buffer-pool-stats --innodb-buffer-page --innodb-buffer-page-lru --innodb-sys-columns --innodb-sys-fields --innodb-sys-foreign --innodb-sys-foreign-cols --innodb-sys-indexes --innodb-sys-tables --innodb-sys-virtual --core-file --loose-debug-sync-timeout=300: 16 (Device or resource busy)
2021-05-25  7:26:53 2 [ERROR] WSREP: Failed to prepare for 'rsync' SST. Unrecoverable.
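A minimal Python sketch of what "busy port" means for the joiner. If any socket is still holding 127.0.0.1:16020, for example a daemon left over from a previous attempt, a fresh bind to that port fails with EADDRINUSE. This is only an illustration. The real wsrep_sst_rsync is a shell script that inspects socket state with utilities such as ss, and the helper names port_is_busy() and wait_for_free_port() are hypothetical.

```python
import errno
import socket
import time


def port_is_busy(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if some socket already holds host:port.

    Hypothetical helper: tries to bind a fresh TCP socket (without
    SO_REUSEADDR); EADDRINUSE means the port is taken.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError as e:
            if e.errno == errno.EADDRINUSE:
                return True
            raise
    return False


def wait_for_free_port(port: int, timeout: float = 10.0,
                       interval: float = 0.2,
                       host: str = "127.0.0.1") -> bool:
    """Poll until host:port is free; return False if timeout expires first."""
    deadline = time.monotonic() + timeout
    while port_is_busy(port, host):
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
    return True


if __name__ == "__main__":
    # Stand-in for a leftover daemon: a socket listening on an ephemeral port.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen()
    port = listener.getsockname()[1]
    print(port_is_busy(port))                      # busy while listener is open
    listener.close()
    print(wait_for_free_port(port, timeout=2.0))   # free again after close()
```

Polling like this is one way to tolerate quick restarts. According to the commit message in the comments, the actual fix takes a different approach: it gives rsync and stunnel more time to exit cleanly, so they release the port themselves.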



 Comments   
Comment by Julius Goryavsky [ 2021-05-30 ]

 MDEV-25818: RSYNC SST failed due to busy port
 
This commit reduces the likelihood of getting a busy port on
quick restarts with rsync SST (problem MDEV-25818) and fixes
a number of other flaws in SST scripts, adds new functionality,
and also synchronizes the xtrabackup-v2 script with the
mariabackup script (the latter applies only to the 10.2 branch):
 
 1) SST via rsync: rsync and stunnel were not always given enough time
    to exit cleanly after receiving SIGTERM. These utilities are now
    given more time to complete normally (via regular SIGTERM
    handling) before we fall back to "kill -9";
 2) SST via rsync: attempts to terminate an rsync or stunnel process
    (via the "kill" utility) are now made only if the process has not
    already terminated on its own;
 3) SST via rsync: if a combination of stunnel and rsync is used,
    then we need to wait for both utilities to finish or stop, not
    just one of them;
 4) The config file and pid file for stunnel are now deleted after
    successful completion of SST on the donor node;
 5) The config and pid files for rsync and stunnel should not be
    deleted on the joiner node unless these utilities succeed (or are
    successfully terminated);
 6) The config and pid files are now excluded from transfer via rsync;
 7) Spaces in paths are now valid for config files as well (when
    used with SST via rsync or mariabackup / xtrabackup[-v2]);
 8) SST via mariabackup: added preliminary verification of the keys
    and certificates used when establishing an SSL connection (to
    avoid long timeouts and improve diagnostics), by analogy with how
    it is done for xtrabackup-v2 (plus a check of the CA file). The
    check is skipped if openssl (or the diff utility) is not
    installed;
 9) Added the backup-threads=<n> configuration option, which passes
    "--parallel=<n>" to mariabackup / xtrabackup at the backup and
    move-back stages;
10) Added encrypt-threads and encrypt-chunk-size configuration
    options for xbcrypt management (when xbcrypt is used);
11) Small optimization: checking the socat version and adding
    a file with parameters for 2048-bit Diffie-Hellman (if necessary)
    is done only if the user has not specified "dhparam=" in the
    "sockopt" option value;
12) SST via rsync now supports "backup-threads" configuration option
    (in server-related sections or in the "[sst]");
13) Determining the number of available processors is now supported
    for FreeBSD + mariabackup/xtrabackup. Previously this could cause
    problems with "--compact" (rebuild indexes) or qpress on FreeBSD;
14) The check_pid() function no longer raises an error state in the
    rare cases when the pid file has been created but is empty, is
    deleted during the check, or contains zero;
15) Improved the templates used to check whether a requested socket
    is "listening" when using the ss utility;
16) Shortened some other templates for socket state utilities;
17) Temporary files created by mariabackup / xtrabackup are moved
    to a separate subdirectory inside tmpdir (so they don't get
    mixed with other temporary files, which can make debugging
    more difficult);
18) 10.2 only: the script for SST via xtrabackup-v2 has been brought
    into line with all the bugfixes made for mariabackup (it previously
    contained many flaws compared to the updated mariabackup script).
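Items 1-3 describe an escalation policy for stopping the joiner's helper processes:

- Send SIGTERM only to processes that are still running.
- Wait for all of them to exit, not just the first one.
- Fall back to SIGKILL ("kill -9") only after a grace period.

The sketch below expresses that policy in Python using subprocess.Popen objects. The function name stop_gracefully(), the grace period and the polling interval are illustrative assumptions. None of them are values taken from wsrep_sst_rsync.

```python
import signal
import subprocess
import sys
import time


def stop_gracefully(procs, grace=10.0, interval=0.1):
    """Stop every Popen in procs: SIGTERM first, SIGKILL after `grace` seconds.

    Hypothetical helper. Returns the processes that had to be force-killed.
    """
    # Item 2: only signal processes that have not already exited on their own.
    for p in procs:
        if p.poll() is None:
            p.send_signal(signal.SIGTERM)

    # Items 1 and 3: give *all* processes time to handle SIGTERM normally.
    deadline = time.monotonic() + grace
    while any(p.poll() is None for p in procs) and time.monotonic() < deadline:
        time.sleep(interval)

    # Fallback equivalent to "kill -9" for anything still running.
    killed = []
    for p in procs:
        if p.poll() is None:
            p.send_signal(signal.SIGKILL)
            p.wait()
            killed.append(p)
    return killed


if __name__ == "__main__":
    # Two stand-ins for rsync and stunnel; the second one ignores SIGTERM.
    obedient = subprocess.Popen([sys.executable, "-c",
                                 "import time; time.sleep(30)"])
    stubborn = subprocess.Popen(
        [sys.executable, "-c",
         "import signal, time; signal.signal(signal.SIGTERM, signal.SIG_IGN);"
         " print('ready', flush=True); time.sleep(30)"],
        stdout=subprocess.PIPE, text=True)
    stubborn.stdout.readline()  # wait until SIGTERM is actually being ignored
    killed = stop_gracefully([obedient, stubborn], grace=0.5)
    print(obedient.returncode, stubborn.returncode, len(killed))
    stubborn.stdout.close()
```

The demo waits for the "ready" line before stopping anything. Otherwise SIGTERM could arrive before the second child installs SIG_IGN, and both children would simply exit on SIGTERM.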
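Item 14 lists the pid-file edge cases that should not be treated as errors. A Python analogue of that tolerant check might look like the sketch below. read_live_pid() is a hypothetical name and is not the shell check_pid() function itself. Rejecting a PID of 0 matters because os.kill(0, 0), like "kill -0 0" in the shell, addresses the caller's whole process group and would therefore appear to succeed.

```python
import os
import tempfile
from typing import Optional


def read_live_pid(pid_file: str) -> Optional[int]:
    """Return the PID stored in pid_file if that process exists, else None.

    Hypothetical helper: missing, empty, vanished-mid-read, non-numeric
    and zero PID files all yield None instead of an error.
    """
    try:
        with open(pid_file) as f:
            content = f.read().strip()
    except FileNotFoundError:      # never created, or deleted during the check
        return None
    try:
        pid = int(content)
    except ValueError:             # empty file or garbage
        return None
    if pid <= 0:                   # 0 would address the whole process group
        return None
    try:
        os.kill(pid, 0)            # signal 0: existence check only
    except ProcessLookupError:
        return None
    except PermissionError:        # exists, but owned by another user
        pass
    return pid


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "example.pid")
        open(path, "w").close()                  # empty pid file
        print(read_live_pid(path))
        with open(path, "w") as f:
            f.write(f"{os.getpid()}\n")
        print(read_live_pid(path) == os.getpid())
```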
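Items 9 and 12 add a backup-threads=<n> option that becomes "--parallel=<n>" on the mariabackup / xtrabackup command line. The sketch below shows only that mapping, reading the value from an [sst] section with Python's configparser.

Several details are assumptions rather than the script's behaviour:

- the helper name parallel_args()
- the lookup in [sst] only (item 12 also mentions server-related sections)
- the rejection of values below 1

Note also that configparser cannot handle every my.cnf construct, for example !include directives.

```python
import configparser
from typing import List


def parallel_args(cnf_text: str) -> List[str]:
    """Map [sst] backup-threads=<n> to ["--parallel=<n>"]; [] if unset.

    Hypothetical helper; only the [sst] section is consulted here.
    """
    parser = configparser.ConfigParser(allow_no_value=True, strict=False,
                                       interpolation=None)
    parser.read_string(cnf_text)
    if not parser.has_option("sst", "backup-threads"):
        return []
    n = parser.getint("sst", "backup-threads")
    if n < 1:
        raise ValueError(f"backup-threads must be >= 1, got {n}")
    return [f"--parallel={n}"]


if __name__ == "__main__":
    cnf = """
[mysqld]
skip-name-resolve

[sst]
backup-threads = 4
"""
    print(parallel_args(cnf))
```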

10.2: https://github.com/MariaDB/server/commit/3bfbd805adf4c0504f230b673fa213ed97301e94
10.6: https://github.com/MariaDB/server/commit/87cd77599a00c1d806c8d703e21bc4578e3e5e79

Tests:
http://buildbot.askmonty.org/buildbot/grid?category=main&branch=bb-10.2-MDEV-25818
http://buildbot.askmonty.org/buildbot/grid?category=main&branch=bb-10.6-MDEV-25818-galera

Galera BB (10.2):
http://buildbot.askmonty.org/buildbot/grid?category=main&branch=bb-10.2-galera

Comment by Jan Lindström (Inactive) [ 2021-05-31 ]

ok to push.

Comment by Julius Goryavsky [ 2021-05-31 ]

fixed & merged, https://github.com/MariaDB/server/commit/2fb4407827ecd6cbf52e210a8d9370b4560ddd5b

Generated at Thu Feb 08 09:40:38 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.