[MDEV-9903] wsrep_sst_rsync wrongly claims that rsync daemon port is taken
Created: 2016-04-12  Updated: 2016-11-07  Resolved: 2016-10-24

Status: Closed
Project: MariaDB Server
Component/s: Galera SST
Affects Version/s: 10.1.13
Fix Version/s: 10.2.3

Type: Bug
Priority: Critical
Reporter: Markus Ueberall
Assignee: Nirbhay Choubey (Inactive)
Resolution: Fixed
Votes: 0
Labels: contribution, foundation, galera
Environment: Ubuntu Xenial, Ubuntu Trusty

Sprint: 10.1.18

 Description   

The following wsrep_sst_rsync code fragment does not distinguish between network interfaces when checking whether the rsync daemon port is available or not:

    local port_info=$(lsof -i :$rsync_port -Pn 2>/dev/null | \
        grep "(LISTEN)")
    local is_rsync=$(echo $port_info | \
        grep -w '^rsync[[:space:]]\+'"$rsync_pid" 2>/dev/null)
 
    if [ -n "$port_info" -a -z "$is_rsync" ]; then
        wsrep_log_error "rsync daemon port '$rsync_port' has been taken"
        exit 16 # EBUSY
    fi

This makes it impossible for a node to join a cluster if another process already listens on that port on any interface of the host, even one other than the address passed to the script (in the following, nnn.nnn.nnn.nnn is a public IPv4 address):

[2016-04-12 13:26:31] root@vserver02:/var/log# wsrep_sst_rsync --role 'joiner' --address 'nnn.nnn.nnn.nnn' --datadir '/var/lib/mysql/'   --parent '9438' --binlog '/var/log/mysql/mariadb-bin'
WSREP_SST: [ERROR] rsync daemon port '4444' has been taken (20160412 13:26:51.651)
WSREP_SST: [INFO] Joiner cleanup. rsync PID: 16190 (20160412 13:26:51.654)
WSREP_SST: [INFO] Joiner cleanup done. (20160412 13:26:52.166)
[2016-04-12 13:26:52] root@vserver02:/var/log# netstat -tulpn | grep 4444
tcp        0      0 172.16.0.4:4444         0.0.0.0:*               LISTEN      14912/stunnel4  
tcp        0      0 172.16.0.2:4444         0.0.0.0:*               LISTEN      14912/stunnel4  
tcp        0      0 172.16.0.1:4444         0.0.0.0:*               LISTEN      14912/stunnel4  

Interestingly, this does not seem to be a problem for the donor:

[2016-04-12T13:46:25] root@vserver04:/etc/stunnel# lsof -i :4444 -Pn
COMMAND    PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
stunnel4 10705 root   20u  IPv4  43870      0t0  TCP 172.16.0.2:4444 (LISTEN)
stunnel4 10705 root   21u  IPv4  43871      0t0  TCP 172.16.0.3:4444 (LISTEN)
stunnel4 10705 root   22u  IPv4  43872      0t0  TCP 172.16.0.4:4444 (LISTEN)
[2016-04-12T13:46:27] root@vserver04:/etc/stunnel# wsrep_sst_rsync --role 'donor' --address 'nnn.nnn.nnn.nnn:4444/rsync_sst' --socket '/var/run/mysqld/mysqld.sock' --datadir '/var/lib/mysql/'    --binlog '/var/log/mysql/mariadb-bin' --gtid '592c386a-ff50-11e5-9f2e-d69a5c76ba34:5' --gtid-domain-id '0'
flush tables
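
The interface-specific check that this report asks for could be sketched roughly as follows. This is only an illustration, not the change that went into the fix: the helper name port_conflicts and its argument order are hypothetical, and the filter assumes the lsof output layout shown in the listings above (NAME column followed by "(LISTEN)"). IPv6 listeners, which lsof prints in bracketed form, are not handled.

```shell
# Hypothetical helper, not part of wsrep_sst_rsync.
# Usage: lsof -i :PORT -Pn | port_conflicts ADDR PORT RSYNC_PID
# Exit status 0 means another process already listens on ADDR:PORT
# (or on the wildcard address *:PORT); 1 means no conflicting listener.
port_conflicts() {
    awk -v addr="$1" -v port="$2" -v pid="$3" '
        # Skip the header and every socket that is not listening.
        $NF != "(LISTEN)" { next }
        {
            name = $(NF - 1)   # lsof NAME column, e.g. 172.16.0.2:4444
            # Listeners bound to other interfaces do not conflict.
            if (name != (addr ":" port) && name != ("*:" port)) next
            # Our own rsync daemon is expected to hold the port.
            if ($1 == "rsync" && $2 == pid) next
            found = 1
        }
        END { exit !found }
    '
}
```

With the joiner listing above, feeding the lsof output for port 4444 into port_conflicts together with the joiner's own address would report no conflict, because stunnel4 is bound only to the 172.16.0.x addresses. A wildcard listener (*:4444) is still treated as a conflict, since it covers every interface.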



 Comments   
Comment by Sergey Vojtovich [ 2016-10-24 ]

Raising priority to critical since this task has been around for a while.
