Details

    Description

      After the busy-port detection in the SST scripts was improved, the rsync SST script sometimes began to report a busy-port error. This happens especially often when several tests run in parallel or when a node restarts quickly after a failure:

      2021-05-25  7:26:53 0 [Note] WSREP: Running: 'wsrep_sst_rsync --role 'joiner' --address '127.0.0.1:16020' --datadir '/dev/shm/bb-10.4-merge/mysql-test/var/1/mysqld.5/data/' --defaults-file '/dev/shm/bb-10.4-merge/mysql-test/var/1/my.cnf' --defaults-group-suffix '.5' --parent '107939' --binlog 'mysqld-bin' --binlog-index 'mysqld-bin.index' --mysqld-args --defaults-group-suffix=.5 --defaults-file=/dev/shm/bb-10.4-merge/mysql-test/var/1/my.cnf --log-output=file --innodb --innodb-cmpmem --innodb-cmp-per-index --innodb-trx --innodb-locks --innodb-lock-waits --innodb-metrics --innodb-buffer-pool-stats --innodb-buffer-page --innodb-buffer-page-lru --innodb-sys-columns --innodb-sys-fields --innodb-sys-foreign --innodb-sys-foreign-cols --innodb-sys-indexes --innodb-sys-tables --innodb-sys-virtual --core-file --loose-debug-sync-timeout=300'
      WSREP_SST: [ERROR] rsync or stunnel daemon port '16020'  has been taken by another program (20210525 07:26:53.410)
      WSREP_SST: [INFO] Joiner cleanup. rsync PID: 109214 (20210525 07:26:53.412)
      /dev/shm/bb-10.4-merge/scripts/wsrep_sst_rsync: line 41: kill: (109214) - No such process
      WSREP_SST: [INFO] Joiner cleanup done. (20210525 07:26:53.415)
      2021-05-25  7:26:53 0 [ERROR] WSREP: Failed to read 'ready <addr>' from: wsrep_sst_rsync --role 'joiner' --address '127.0.0.1:16020' --datadir '/dev/shm/bb-10.4-merge/mysql-test/var/1/mysqld.5/data/' --defaults-file '/dev/shm/bb-10.4-merge/mysql-test/var/1/my.cnf' --defaults-group-suffix '.5' --parent '107939' --binlog 'mysqld-bin' --binlog-index 'mysqld-bin.index' --mysqld-args --defaults-group-suffix=.5 --defaults-file=/dev/shm/bb-10.4-merge/mysql-test/var/1/my.cnf --log-output=file --innodb --innodb-cmpmem --innodb-cmp-per-index --innodb-trx --innodb-locks --innodb-lock-waits --innodb-metrics --innodb-buffer-pool-stats --innodb-buffer-page --innodb-buffer-page-lru --innodb-sys-columns --innodb-sys-fields --innodb-sys-foreign --innodb-sys-foreign-cols --innodb-sys-indexes --innodb-sys-tables --innodb-sys-virtual --core-file --loose-debug-sync-timeout=300
      	Read: '(null)'
      2021-05-25  7:26:53 0 [ERROR] WSREP: Process completed with error: wsrep_sst_rsync --role 'joiner' --address '127.0.0.1:16020' --datadir '/dev/shm/bb-10.4-merge/mysql-test/var/1/mysqld.5/data/' --defaults-file '/dev/shm/bb-10.4-merge/mysql-test/var/1/my.cnf' --defaults-group-suffix '.5' --parent '107939' --binlog 'mysqld-bin' --binlog-index 'mysqld-bin.index' --mysqld-args --defaults-group-suffix=.5 --defaults-file=/dev/shm/bb-10.4-merge/mysql-test/var/1/my.cnf --log-output=file --innodb --innodb-cmpmem --innodb-cmp-per-index --innodb-trx --innodb-locks --innodb-lock-waits --innodb-metrics --innodb-buffer-pool-stats --innodb-buffer-page --innodb-buffer-page-lru --innodb-sys-columns --innodb-sys-fields --innodb-sys-foreign --innodb-sys-foreign-cols --innodb-sys-indexes --innodb-sys-tables --innodb-sys-virtual --core-file --loose-debug-sync-timeout=300: 16 (Device or resource busy)
      2021-05-25  7:26:53 2 [ERROR] WSREP: Failed to prepare for 'rsync' SST. Unrecoverable.
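
The error in the log above is raised when the joiner finds that the requested port already has a listener. As a rough illustration of that kind of check, here is a minimal sketch that assumes the `ss` utility from iproute2 is the source of socket state. The function name `port_is_listening_in` is hypothetical, and this is not the code used by wsrep_sst_rsync:

```shell
#!/bin/sh
# Hypothetical helper (not taken from wsrep_sst_rsync): reads `ss -nlt`-style
# output on stdin and succeeds if a LISTEN socket is bound to the given port.
port_is_listening_in() {
    port="$1"
    # Field 4 of `ss -nlt` output is "Local Address:Port"; the port is the
    # text after the last colon, which also covers IPv6 forms like [::]:22.
    awk -v p="$port" '
        $1 == "LISTEN" { n = split($4, a, ":"); if (a[n] == p) found = 1 }
        END { exit (found ? 0 : 1) }
    '
}

# Usage against the live system (assumes ss is installed):
#   ss -nlt | port_is_listening_in 16020 && echo "port 16020 is busy"
```

Reading the socket list from stdin keeps the parsing separate from the call to `ss`, so the matching logic can be exercised with canned output.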
      

        Activity

          sysprg Julius Goryavsky added a comment - edited

           MDEV-25818: RSYNC SST failed due to busy port
           
          This commit reduces the likelihood of getting a busy port on
          quick restarts with rsync SST (problem MDEV-25818) and fixes
          a number of other flaws in SST scripts, adds new functionality,
          and also synchronizes the xtrabackup-v2 script with the
          mariabackup script (the latter applies only to the 10.2 branch):
           
           1) SST via rsync: rsync and stunnel do not always get enough
              time to complete by correctly handling SIGTERM. These utilities
              are now given more time to complete normally (via normal SIGTERM
              processing) before we move on to using "kill -9";
           2) SST via rsync: attempts to terminate an rsync or stunnel process
              (via "kill" utility) are only made if it did not terminate on
              its own;
           3) SST via rsync: if a combination of stunnel and rsync is used,
              then we need to wait for both utilities to finish or stop, not
              just one of them;
           4) The config file and pid file for stunnel are now deleted after
              successful completion of SST on the donor node;
           5) The configs and pid files from rsync and stunnel should not be
              deleted unless these utilities succeed (or are successfully
              terminated) on the joiner node;
           6) The configs and pid files are now excluded from transfer via rsync;
           7) Spaces in paths are now valid for config files as well (when
              used with SST via rsync or mariabackup / xtrabackup[-v2]);
           8) SST via mariabackup: added preliminary verification of keys and
              certificates that are used when establishing a connection using
              SSL (to avoid long timeouts and improve diagnostics) - by analogy
              with how it is done for the xtrabackup-v2 (plus check for CA file),
              while that check is skipped if the user does not have openssl
              installed (or does not have diff utility);
           9) Added backup-threads=<n> configuration option which adds
              "--parallel=<n>" for mariabackup / xtrabackup at backup and
              move-back stages;
          10) Added encrypt-threads and encrypt-chunk-size configuration
              options for xbcrypt management (when xbcrypt is used);
          11) Small optimization: checking the socat version and adding
              a file with parameters for 2048-bit Diffie-Hellman (if necessary)
              is done only if the user has not specified "dhparam=" in the
              "sockopt" option value;
          12) SST via rsync now supports "backup-threads" configuration option
              (in server-related sections or in the "[sst]");
          13) Determining the number of available processors is now supported
              for FreeBSD + mariabackup/xtrabackup: before that we might have
              problems with "--compact" (rebuild indexes) or qpress on FreeBSD;
          14) The check_pid() function should not raise an error state in
              the rare cases when the pid file was created, but it is empty,
              or if it is deleted right during the check, or when zero is read
              from the pid file;
          15) Improved templates that are used to check if a requested socket
              is "listening" when using the ss utility;
          16) Shortened some other templates for socket state utilities;
          17) Temporary files created by mariabackup / xtrabackup are moved
              to a separate subdirectory inside tmpdir (so they don't get
              mixed with other temporary files, which can make debugging
              more difficult);
          18) 10.2 only: the script for SST via xtrabackup-v2 has been brought
              into full compliance with all the bugfixes made for mariabackup (as
              it previously contained many flaws compared to the updated script
              for mariabackup).
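
Items 1 and 2 in the list above describe a "SIGTERM first, kill -9 only as a last resort" policy. A minimal sketch of that policy follows, under the assumption that the process id is already known. The function name `terminate_gracefully` and the default timeout are hypothetical and are not taken from the commit:

```shell
#!/bin/sh
# Hypothetical helper, not the code from the commit: stop a process with
# SIGTERM, wait up to $2 seconds for it to exit, then fall back to SIGKILL.
terminate_gracefully() {
    pid="$1"
    timeout="${2:-10}"

    # Item 2: do nothing if the process has already terminated on its own.
    kill -0 "$pid" 2>/dev/null || return 0

    # Item 1: ask politely first and give the process time to clean up.
    kill -TERM "$pid" 2>/dev/null
    waited=0
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$timeout" ]; then
            # Last resort, equivalent to "kill -9".
            kill -KILL "$pid" 2>/dev/null
            break
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

For item 3, the same helper would be called once for the rsync pid and once for the stunnel pid, so that the script continues only after both have stopped. If the target is a direct child of the calling shell, a `wait` on its pid afterwards reaps it.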
          

          10.2: https://github.com/MariaDB/server/commit/3bfbd805adf4c0504f230b673fa213ed97301e94
          10.6: https://github.com/MariaDB/server/commit/87cd77599a00c1d806c8d703e21bc4578e3e5e79

          Tests:
          http://buildbot.askmonty.org/buildbot/grid?category=main&branch=bb-10.2-MDEV-25818
          http://buildbot.askmonty.org/buildbot/grid?category=main&branch=bb-10.6-MDEV-25818-galera

          Galera BB (10.2):
          http://buildbot.askmonty.org/buildbot/grid?category=main&branch=bb-10.2-galera

          jplindst Jan Lindström (Inactive) added a comment

          ok to push.

          sysprg Julius Goryavsky added a comment

          fixed & merged, https://github.com/MariaDB/server/commit/2fb4407827ecd6cbf52e210a8d9370b4560ddd5b

          People

            sysprg Julius Goryavsky