[MDEV-23580] galera_3nodes.galera_ipv6_rsync_section MTR failed: WSREP_SST: [ERROR] rsync daemon port '16008' has been taken Created: 2020-08-25  Updated: 2021-10-05  Resolved: 2021-05-11

Status: Closed
Project: MariaDB Server
Component/s: Galera, Tests
Affects Version/s: 10.2.34, 10.3.25, 10.4.15, 10.5.6
Fix Version/s: 10.6.1, 10.2.39, 10.3.30, 10.4.20, 10.5.11

Type: Bug Priority: Critical
Reporter: Stepan Patryshev (Inactive) Assignee: Julius Goryavsky
Resolution: Fixed Votes: 0
Labels: None
Environment:

kvm-zyp-opensuse150-amd64


Issue Links:
Duplicate
duplicates MDEV-21770 `galera_3nodes.galera_ipv6_mariabacku... Closed
is duplicated by MDEV-24890 galera_3nodes.galera_ipv6_rsync MTR f... Closed
Problem/Incident
causes MDEV-23605 galera_3nodes.galera_vote_rejoin_mysq... Closed
causes MDEV-26172 wsrep_sst_receive_address does not pl... Open
Relates
relates to MDEV-15371 datadir must be specified in move pro... Closed
relates to MDEV-15639 sst mariabackup required datadir in m... Closed
relates to MDEV-20580 WSREP fails using stunnel Closed
relates to MDEV-24962 Galera SST innobackupex-move ignores ... Closed

 Description   

galera_3nodes.galera_ipv6_rsync_section failed on BB 10.2, BB 10.3, BB 10.4, and on BB 10.5: "WSREP_SST: [ERROR] rsync daemon port '16008' has been taken".

stdio.log:

10.2.34 0be70a1b773ce66ef1803fcce19522fd9c60c07d

galera_3nodes.galera_ipv6_rsync_section 'innodb' w2 [ fail ]
        Test ended at 2020-08-25 13:32:12
 
CURRENT_TEST: galera_3nodes.galera_ipv6_mariabackup_section
CURRENT_TEST: galera_3nodes.galera_ipv6_rsync_section
 
 
Failed to start mysqld.2
 
 
worker[2] > Restart  - using different config file
worker[2] > Restart  - using different config file
worker[2] > Restart  - using different config file
 - skipping '/dev/shm/var/2/log/galera_3nodes.galera_ipv6_rsync_section-innodb/'
 
Retrying test galera_3nodes.galera_ipv6_rsync_section, attempt(2/3)...
 
***Warnings generated in error logs during shutdown after running tests: galera_3nodes.galera_ipv6_rsync_section
 
WSREP_SST: [ERROR] rsync daemon port '16008' has been taken (20200825 13:31:57.975)
2020-08-25 13:31:58 140053909067520 [ERROR] WSREP: Failed to read 'ready <addr>' from: wsrep_sst_rsync --role 'joiner' --address '[::1]:16008' --datadir '/dev/shm/var/2/mysqld.2/data/' --defaults-file '/dev/shm/var/2/my.cnf' --defaults-group-suffix '.2' --parent '31002' --binlog-index 'mysqld-bin.index' --mysqld-args --defaults-group-suffix=.2 --defaults-file=/dev/shm/var/2/my.cnf --log-output=file --innodb --innodb-cmpmem --innodb-cmp-per-index --innodb-trx --innodb-locks --innodb-metrics --innodb-buffer-pool-stats --innodb-buffer-page --innodb-buffer-page-lru --innodb-sys-columns --innodb-sys-fields --innodb-sys-foreign --innodb-sys-foreign-cols --innodb-sys-indexes --innodb-sys-tables --innodb-sys-virtual --bind-address=:: --core-file --loose-debug-sync-timeout=300
2020-08-25 13:31:58 140053909067520 [ERROR] WSREP: Process completed with error: wsrep_sst_rsync --role 'joiner' --address '[::1]:16008' --datadir '/dev/shm/var/2/mysqld.2/data/' --defaults-file '/dev/shm/var/2/my.cnf' --defaults-group-suffix '.2' --parent '31002' --binlog-index 'mysqld-bin.index' --mysqld-args --defaults-group-suffix=.2 --defaults-file=/dev/shm/var/2/my.cnf --log-output=file --innodb --innodb-cmpmem --innodb-cmp-per-index --innodb-trx --innodb-locks --innodb-metrics --innodb-buffer-pool-stats --innodb-buffer-page --innodb-buffer-page-lru --innodb-sys-columns --innodb-sys-fields --innodb-sys-foreign --innodb-sys-foreign-cols --innodb-sys-indexes --innodb-sys-tables --innodb-sys-virtual --bind-address=:: --core-file --loose-debug-sync-timeout=300: 16 (Device or resource busy)
2020-08-25 13:31:58 140054126405376 [ERROR] WSREP: Failed to prepare for 'rsync' SST. Unrecoverable.
2020-08-25 13:31:58 140054126405376 [ERROR] Aborting
Warning: Memory not freed: 1427632



 Comments   
Comment by Julius Goryavsky [ 2020-10-21 ]

This is a duplicate of MDEV-21770, corrected here: https://github.com/MariaDB/server/commit/ac3c19de6967abbaaa661f72526b233d11bd6868, tests re-enabled here: https://github.com/MariaDB/server/commit/967f11a68af9c8855621b00d839c91d8895ccbf4

Comment by Stepan Patryshev (Inactive) [ 2020-12-16 ]

Reopened it since it failed on BB 10.3 with the same error.
stdio.log:

10.3.28 71806bf37c45d5490d8382cba6a5cca6879fa0db kvm-deb-bionic-amd64

galera_3nodes.galera_ipv6_rsync_section 'innodb' w2 [ fail ]
        Test ended at 2020-12-15 14:56:12
 
CURRENT_TEST: galera_3nodes.galera_ipv6_rsync_section
 
 
Failed to start mysqld.3
mysqltest failed but provided no output
 
 
 - skipping '/dev/shm/var/2/log/galera_3nodes.galera_ipv6_rsync_section-innodb/'
 
Retrying test galera_3nodes.galera_ipv6_rsync_section, attempt(2/3)...
 
worker[2] > Restart  - not started
worker[2] > Restart  - not started
worker[2] > Restart  - not started
***Warnings generated in error logs during shutdown after running tests: galera_3nodes.galera_ipv6_rsync_section
 
WSREP_SST: [ERROR] rsync daemon port '16031' has been taken (20201215 14:53:09.389)
2020-12-15 14:53:09 0 [ERROR] WSREP: Failed to read 'ready <addr>' from: wsrep_sst_rsync --role 'joiner' --address '[::1]:16031' --datadir '/dev/shm/var/2/mysqld.3/data/' --defaults-file '/dev/shm/var/2/my.cnf' --defaults-group-suffix '.3' --parent '30703' --binlog-index 'mysqld-bin.index' --mysqld-args --defaults-group-suffix=.3 --defaults-file=/dev/shm/var/2/my.cnf --log-output=file --innodb --innodb-cmpmem --innodb-cmp-per-index --innodb-trx --innodb-locks --innodb-lock-waits --innodb-metrics --innodb-buffer-pool-stats --innodb-buffer-page --innodb-buffer-page-lru --innodb-sys-columns --innodb-sys-fields --innodb-sys-foreign --innodb-sys-foreign-cols --innodb-sys-indexes --innodb-sys-tables --innodb-sys-virtual --core-file --loose-debug-sync-timeout=300
2020-12-15 14:53:09 0 [ERROR] WSREP: Process completed with error: wsrep_sst_rsync --role 'joiner' --address '[::1]:16031' --datadir '/dev/shm/var/2/mysqld.3/data/' --defaults-file '/dev/shm/var/2/my.cnf' --defaults-group-suffix '.3' --parent '30703' --binlog-index 'mysqld-bin.index' --mysqld-args --defaults-group-suffix=.3 --defaults-file=/dev/shm/var/2/my.cnf --log-output=file --innodb --innodb-cmpmem --innodb-cmp-per-index --innodb-trx --innodb-locks --innodb-lock-waits --innodb-metrics --innodb-buffer-pool-stats --innodb-buffer-page --innodb-buffer-page-lru --innodb-sys-columns --innodb-sys-fields --innodb-sys-foreign --innodb-sys-foreign-cols --innodb-sys-indexes --innodb-sys-tables --innodb-sys-virtual --core-file --loose-debug-sync-timeout=300: 16 (Device or resource busy)
2020-12-15 14:53:09 2 [ERROR] WSREP: Failed to prepare for 'rsync' SST. Unrecoverable.
2020-12-15 14:53:09 2 [ERROR] Aborting
 
worker[2] mysql-test-run: WARNING: Process [mysqld.3 - pid: 387, winpid: 387, exit: 256] died after mysql-test-run waited 11.7 seconds for /dev/shm/var/2/run/mysqld.3.pid to be created.
galera_3nodes.galera_ipv6_rsync_section 'innodb' w2 [ retry-fail ]
        Test ended at 2020-12-15 14:56:28
 
CURRENT_TEST: galera_3nodes.galera_ipv6_rsync_section
 
 
Failed to start mysqld.3
mysqltest failed but provided no output
 
 
 - skipping '/dev/shm/var/2/log/galera_3nodes.galera_ipv6_rsync_section-innodb/'
 
Test galera_3nodes.galera_ipv6_rsync_section has failed 2 times, no more retries!
 
***Warnings generated in error logs during shutdown after running tests: galera_3nodes.galera_ipv6_rsync_section
 
WSREP_SST: [ERROR] rsync daemon port '16031' has been taken (20201215 14:56:14.222)
2020-12-15 14:56:14 0 [ERROR] WSREP: Failed to read 'ready <addr>' from: wsrep_sst_rsync --role 'joiner' --address '[::1]:16031' --datadir '/dev/shm/var/2/mysqld.3/data/' --defaults-file '/dev/shm/var/2/my.cnf' --defaults-group-suffix '.3' --parent '390' --binlog-index 'mysqld-bin.index' --mysqld-args --defaults-group-suffix=.3 --defaults-file=/dev/shm/var/2/my.cnf --log-output=file --innodb --innodb-cmpmem --innodb-cmp-per-index --innodb-trx --innodb-locks --innodb-lock-waits --innodb-metrics --innodb-buffer-pool-stats --innodb-buffer-page --innodb-buffer-page-lru --innodb-sys-columns --innodb-sys-fields --innodb-sys-foreign --innodb-sys-foreign-cols --innodb-sys-indexes --innodb-sys-tables --innodb-sys-virtual --core-file --loose-debug-sync-timeout=300
2020-12-15 14:56:14 0 [ERROR] WSREP: Process completed with error: wsrep_sst_rsync --role 'joiner' --address '[::1]:16031' --datadir '/dev/shm/var/2/mysqld.3/data/' --defaults-file '/dev/shm/var/2/my.cnf' --defaults-group-suffix '.3' --parent '390' --binlog-index 'mysqld-bin.index' --mysqld-args --defaults-group-suffix=.3 --defaults-file=/dev/shm/var/2/my.cnf --log-output=file --innodb --innodb-cmpmem --innodb-cmp-per-index --innodb-trx --innodb-locks --innodb-lock-waits --innodb-metrics --innodb-buffer-pool-stats --innodb-buffer-page --innodb-buffer-page-lru --innodb-sys-columns --innodb-sys-fields --innodb-sys-foreign --innodb-sys-foreign-cols --innodb-sys-indexes --innodb-sys-tables --innodb-sys-virtual --core-file --loose-debug-sync-timeout=300: 16 (Device or resource busy)
2020-12-15 14:56:14 2 [ERROR] WSREP: Failed to prepare for 'rsync' SST. Unrecoverable.
2020-12-15 14:56:14 2 [ERROR] Aborting
Warning: Memory not freed: 1381120

Comment by Stepan Patryshev (Inactive) [ 2020-12-16 ]

It failed also on BB 10.6, but without proper output.
stdio.log:

10.6.0 295f3e4cfb4a8f132f36d53475efc92f2487aa97 kvm-rpm-rhel8-amd64

galera_3nodes.galera_ipv6_rsync_section 'innodb' w1 [ fail ]
        Test ended at 2020-11-23 21:01:56
 
CURRENT_TEST: galera_3nodes.galera_ipv6_rsync_section
 
 
Failed to start mysqld.2
mysqltest failed but provided no output
 
 
 - saving '/dev/shm/var/1/log/galera_3nodes.galera_ipv6_rsync_section-innodb/' to '/dev/shm/var/log/galera_3nodes.galera_ipv6_rsync_section-innodb/'
 
Retrying test galera_3nodes.galera_ipv6_rsync_section, attempt(2/3)...
 
worker[1] > Restart  - not started
worker[1] > Restart  - not started
worker[1] > Restart  - not started
***Warnings generated in error logs during shutdown after running tests: galera_3nodes.galera_ipv6_rsync_section
 
2020-11-23 21:01:56 0 [ERROR] Aborting

Comment by Julius Goryavsky [ 2021-05-10 ]

https://github.com/MariaDB/server/commit/e8f73c4c41ed3d062e4feb6990760c113dc84351

This commit contains a large set of further bug fixes and improvements to SST scripts for Galera, continuing the work that was started in MDEV-24962 to make SST scripts work smoothly in different network configurations (especially using ipv6) and with different environment settings:

 1) The ipv6 addresses were incorrectly handled in the SST script
    for rsync (incorrect address substitution for establishing a
    connection, incorrect address substitution for bind, and so on);
 2) Checking the locality of the ip-address in SST scripts did not
    support ipv6 addresses (such as "[::1]"), which were falsely
    identified as non-local ip, which further did not allow running
    two SSTs on different local addresses on the same machine.
    On the other hand, this bug masked some other errors (related
    to handling ipv6 addresses);
 3) The code for checking the locality of the ip address was different
    in the SST scripts for rsync and for mysqldump, with individual
    flaws. This code is now made common and moved to wsrep_sst_common;
 4) Waiting for the start of the transport channel (socat, nc, rsync,
    stunnel) in the wait_for_listen() and check_pid_and_port() functions
    did not process ipv6 addresses correctly in all cases (not for all
    branches);
 5) Waiting for the start of the transport channel (socat, nc, rsync,
    stunnel) in the wait_for_listen() and check_pid_and_port() functions
    for some code branches could give a false positive result due to
    the textual match of prefixes in the port number and/or PID of
    the process;
 6) Waiting for the start of the transport channel (socat, nc, rsync,
    stunnel) was supported through different utilities in SST scripts
    for mariabackup and for rsync, and with various minor flaws in
    the code. Now the code is still different in these scripts, but
    it supports a common set of utilities (lsof, ss, sockstat) and
    is synchronized across patterns that used to check the output
    of  these utilities;
 7) In SST via mariabackup, the signal about readiness to receive data
    is sometimes sent too early - immediately after listen(), and not
    after accept() (which are called by socat or netcat utility).
 8) Checking availability of the some options of some utilities was
    done using the grep pattern, which easily gives false positives;
 9) Common name (CN) for local addresses, if not explicitly specified,
    is now always replaced to "localhost" to avoid the need to generate
    many separate certificates for local addresses of one machine and
    not to depend on which the local address is currently used in test
    (ipv4 or ipv6, etc.);
10) In tests galera_sst_mariabackup_encrypt_with_key_server and
    galera_sst_rsync_encrypt_with_key_server the correct certificate
    is selected to avoid commonname (CN) mismatch problems;
11) Further refactoring to protect against spaces in file names.
12) Further general refactoring to eliminate bash-specific constructs
    or to improve code readability;
13) The code for setting options for the nc (netcat) utility was
    different in different scripts for SST - now it is made identical.
14) Fixed a long time ago broken encryption with xbcrypt combined with
    mariabackup and Added mariabackup support for encryption in encrypt=1
    mode using openssl (instead of xbcrypt utility), which may be enabled
    by the new configuration file option "encypt-format=openssl" placed
    in the [sst] or in the [xtrabackup] section (and enabled by default
    if user prefers [sst] section) - this will allow us to use and to
    test this encryption mode without installing third-party utilities
    from Percona.

Comment by Jan Lindström (Inactive) [ 2021-05-10 ]

ok to push.

Comment by Julius Goryavsky [ 2021-05-11 ]

Fixed:
10.2: https://github.com/MariaDB/server/commit/8fef2b8667f30e4562ac006f992f859d1defdc0e
10.3: https://github.com/MariaDB/server/commit/7e8a89387b1faf8383bf2782d779be83c1141464
further migration of changes is possible within the framework of an ordinary merge

Generated at Thu Feb 08 09:23:28 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.