[MDEV-14394] Nodes can't join with socat Address already in use Created: 2017-11-14  Updated: 2020-08-25  Resolved: 2017-12-18

Status: Closed
Project: MariaDB Server
Component/s: Galera SST
Affects Version/s: 10.1.28
Fix Version/s: 10.1.30

Type: Bug Priority: Critical
Reporter: Claudio Nanni Assignee: Sergei Golubchik
Resolution: Fixed Votes: 1
Labels: None
Environment:

Ubuntu Xenial


Issue Links:
Relates
relates to MDEV-10442 "Address already in use" on restart Closed
Sprint: 10.1.30

 Description   

Hi,
I am experiencing the same problem as MDEV-10442: nodes can't join the cluster on Ubuntu Xenial.

mysqld stops with:

2017-11-14 16:11:29 140304279660288 [Note] WSREP: Member 0.0 (Galera_Node_02) requested state transfer from '*any*'. Selected 1.0 (Galera_Node_01)(SYNCED) as donor.
2017-11-14 16:11:29 140304279660288 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 786790589)
2017-11-14 16:11:29 140322124806912 [Note] WSREP: Requesting state transfer: success, donor: 1
2017-11-14 16:11:29 140322124806912 [Note] WSREP: GCache history reset: old(00000000-0000-0000-0000-000000000000:0) -> new(709f570f-c7d5-11e7-b709-e25129d5bf65:13)
WSREP_SST: [INFO] Evaluating timeout -k 110 100 socat -u TCP-LISTEN:4444,reuseaddr stdio | xbstream -x; RC=( ${PIPESTATUS[@]} ) (20171114 16:11:29.639)
2017/11/14 16:11:29 socat[26364] E bind(6, {AF=2 0.0.0.0:4444}, 16): Address already in use
WSREP_SST: [ERROR] Error while getting data from donor node:  exit codes: 1 0 (20171114 16:11:29.645)
WSREP_SST: [ERROR] Cleanup after exit with status:32 (20171114 16:11:29.647)
2017-11-14 16:11:29 140304250304256 [ERROR] WSREP: Process completed with error: wsrep_sst_xtrabackup-v2 --role 'joiner' --address '10.30.2.88' --datadir '/mnt/data/mysql/'   --parent '26044' --binlog 'mysqld-bin' : 32 (Broken pipe)
2017-11-14 16:11:29 140304250304256 [ERROR] WSREP: Failed to read uuid:seqno and wsrep_gtid_domain_id from joiner script.
2017-11-14 16:11:29 140322125121792 [ERROR] WSREP: SST failed: 32 (Broken pipe)
2017-11-14 16:11:29 140322125121792 [ERROR] Aborting

After mysqld aborts, socat, xbstream and wsrep_sst_xtrabackup-v2 keep running, reparented to 'init'.
As soon as you kill all the SST processes, systemd restarts mysqld and the problem comes back: mysqld stops and the SST processes keep running.
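
To check whether a leftover process is still holding the SST port before mysqld is restarted, a small probe along these lines may help. This is a minimal sketch, assuming Linux and Python 3; the function name `can_bind` is hypothetical and not part of MariaDB or the SST scripts. It sets SO_REUSEADDR to mirror socat's `reuseaddr` option from the log, which on Linux does not help while another socket is still listening on the same port.

```python
import errno
import socket


def can_bind(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if a TCP socket could bind host:port right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        # Mirror socat's `reuseaddr` option seen in the joiner log.
        probe.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        try:
            probe.bind((host, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                # Same condition socat reports as "Address already in use".
                return False
            raise
    return True
```

If `can_bind(4444)` returns False on the joiner while mysqld is down, something (possibly an orphaned socat from a previous SST attempt) is still listening on the port.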

10.1.28 does not seem to contain the fix from MDEV-10442:

In sql/wsrep_utils.cc:

}
 
    err_ = posix_spawnattr_setflags (&attr, POSIX_SPAWN_SETSIGDEF  |
                                            POSIX_SPAWN_SETSIGMASK |
            /* start a new process group */ POSIX_SPAWN_SETPGROUP  |
                                            POSIX_SPAWN_USEVFORK);
    if (err_)
    {
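
As the comment in the snippet says, POSIX_SPAWN_SETPGROUP starts the spawned process in a new process group. The following is a minimal Python sketch of what that means in practice, assuming a Unix system and Python 3.8 or later. The function names are hypothetical, and this is neither the server's code nor necessarily the change made for MDEV-10442. It only illustrates that a child spawned this way leads its own group, so the whole group can be signalled at once with `killpg`.

```python
import os
import signal
import sys


def spawn_in_own_group(seconds: int = 30) -> int:
    """Spawn a sleeping child as the leader of a new process group."""
    argv = [sys.executable, "-c", f"import time; time.sleep({seconds})"]
    # setpgroup=0 corresponds to POSIX_SPAWN_SETPGROUP with pgroup 0:
    # the child's process group ID becomes its own PID.
    return os.posix_spawn(sys.executable, argv, dict(os.environ), setpgroup=0)


def terminate_group(pgid: int) -> int:
    """Send SIGTERM to every process in the group, then reap the leader."""
    os.killpg(pgid, signal.SIGTERM)
    _, status = os.waitpid(pgid, 0)
    return status
```

Because the child is no longer in the parent's process group, signals sent to the parent's group do not reach it. A cleanup step has to target the child's group explicitly, for example `terminate_group(spawn_in_own_group())`.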



 Comments   
Comment by Andrii Nikitin (Inactive) [ 2017-11-15 ]

As you already found out, it was decided to fix MDEV-10442 only in 10.2.
I will reassign this to serg to decide whether fixing it in 10.1 as well is too risky, or whether it should be closed as Duplicate/Will not fix.

Comment by Sergei Golubchik [ 2017-11-16 ]

Let's fix it. I don't see why not.
