[MDEV-13687] MariaDB 10.2.8 galera SST fails Created: 2017-08-31  Updated: 2017-09-06  Resolved: 2017-09-06

Status: Closed
Project: MariaDB Server
Component/s: Galera, Galera SST
Affects Version/s: 10.2
Fix Version/s: N/A

Type: Bug
Priority: Critical
Reporter: Michael Xu
Assignee: Andrii Nikitin (Inactive)
Resolution: Not a Bug
Votes: 0
Labels: None
Environment:

CentOS 7.x x86_64 w/ MariaDB 10.2.8



 Description   

Two brand-new servers running the latest CentOS 7 and MariaDB 10.2.8.

MBA2 - 192.168.1.249
MBA1 - 192.168.1.250

1) Start the MariaDB service on MBA2, create the SST user, and grant it root access
2) Stop the MariaDB service on MBA2
3) Run galera_new_cluster
4) Start the MariaDB service again; now I can see:

MariaDB [(none)]> SHOW STATUS LIKE 'wsrep_clu%';
+--------------------------+--------------------------------------+
| Variable_name            | Value                                |
+--------------------------+--------------------------------------+
| wsrep_cluster_conf_id    | 1                                    |
| wsrep_cluster_size       | 1                                    |
| wsrep_cluster_state_uuid | 9ba441c2-8e29-11e7-b7e0-2f564ca18fa8 |
| wsrep_cluster_status     | Primary                              |
+--------------------------+--------------------------------------+

5) Starting the MariaDB service on MBA1 fails with the following error messages, no matter which wsrep_sst_method I choose.

  • mariabackup

.
.
.
2017-08-31 16:57:59 140565635680384 [Warning] WSREP: access file(/data/mysql/clst/gvwstate.dat) failed(No such file or directory)
2017-08-31 16:57:59 140565635680384 [Note] WSREP: restore pc from disk failed
2017-08-31 16:57:59 140565635680384 [Note] WSREP: GMCast version 0
2017-08-31 16:57:59 140565635680384 [Note] WSREP: (7d22ec57, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
2017-08-31 16:57:59 140565635680384 [Note] WSREP: (7d22ec57, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
2017-08-31 16:57:59 140565635680384 [Note] WSREP: EVS version 0
2017-08-31 16:57:59 140565635680384 [Note] WSREP: gcomm: connecting to group 'Galera', peer '192.168.1.249:'
2017-08-31 16:57:59 140565635680384 [Note] WSREP: (7d22ec57, 'tcp://0.0.0.0:4567') connection established to 9ba3ac27 tcp://192.168.1.249:4567
2017-08-31 16:57:59 140565635680384 [Note] WSREP: (7d22ec57, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: 
2017-08-31 16:57:59 140565635680384 [Note] WSREP: declaring 9ba3ac27 at tcp://192.168.1.249:4567 stable
2017-08-31 16:57:59 140565635680384 [Note] WSREP: Node 9ba3ac27 state prim
2017-08-31 16:57:59 140565635680384 [Note] WSREP: view(view_id(PRIM,7d22ec57,8) memb {
        7d22ec57,0
        9ba3ac27,0
} joined {
} left {
} partitioned {
})
2017-08-31 16:57:59 140565635680384 [Note] WSREP: save pc into disk
2017-08-31 16:58:00 140565635680384 [Note] WSREP: gcomm: connected
2017-08-31 16:58:00 140565635680384 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636
2017-08-31 16:58:00 140565635680384 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
2017-08-31 16:58:00 140565635680384 [Note] WSREP: Opened channel 'Galera'
2017-08-31 16:58:00 140563189843712 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 2
2017-08-31 16:58:00 140565635680384 [Note] WSREP: Waiting for SST to complete.
2017-08-31 16:58:00 140563189843712 [Note] WSREP: STATE_EXCHANGE: sent state UUID: 7d6f76a4-8e2a-11e7-8ba4-f25dcf018c49
2017-08-31 16:58:00 140563189843712 [Note] WSREP: STATE EXCHANGE: sent state msg: 7d6f76a4-8e2a-11e7-8ba4-f25dcf018c49
2017-08-31 16:58:00 140563189843712 [Note] WSREP: STATE EXCHANGE: got state msg: 7d6f76a4-8e2a-11e7-8ba4-f25dcf018c49 from 0 (MBA1)
2017-08-31 16:58:00 140563189843712 [Note] WSREP: STATE EXCHANGE: got state msg: 7d6f76a4-8e2a-11e7-8ba4-f25dcf018c49 from 1 (MBA2)
2017-08-31 16:58:00 140563189843712 [Note] WSREP: Quorum results:
        version    = 4,
        component  = PRIMARY,
        conf_id    = 7,
        members    = 1/2 (joined/total),
        act_id     = 0,
        last_appl. = -1,
        protocols  = 0/7/3 (gcs/repl/appl),
        group UUID = 9ba441c2-8e29-11e7-b7e0-2f564ca18fa8
2017-08-31 16:58:00 140563189843712 [Note] WSREP: Flow-control interval: [23, 23]
2017-08-31 16:58:00 140563189843712 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 0)
2017-08-31 16:58:00 140565427668736 [Note] WSREP: State transfer required: 
        Group state: 9ba441c2-8e29-11e7-b7e0-2f564ca18fa8:0
        Local state: 00000000-0000-0000-0000-000000000000:-1
2017-08-31 16:58:00 140565427668736 [Note] WSREP: New cluster view: global state: 9ba441c2-8e29-11e7-b7e0-2f564ca18fa8:0, view# 8: Primary, number of nodes: 2, my index: 0, protocol version 3
2017-08-31 16:58:00 140565427668736 [Warning] WSREP: Gap in state sequence. Need state transfer.
2017-08-31 16:58:00 140563181451008 [Note] WSREP: Running: 'wsrep_sst_mariabackup --role 'joiner' --address '192.168.1.250' --datadir '/data/mysql/data/'   --parent '19302' --binlog '/data/mysql/tran/mysql-bin' '
WSREP_SST: [INFO] Streaming with xbstream (20170831 16:58:00.349)
WSREP_SST: [INFO] Using socat as streamer (20170831 16:58:00.350)
WSREP_SST: [INFO] Evaluating timeout -k 110 100 socat -u TCP-LISTEN:4444,reuseaddr stdio | mbstream -x; RC=( ${PIPESTATUS[@]} ) (20170831 16:58:00.375)
2017-08-31 16:58:00 140565427668736 [Note] WSREP: Prepared SST request: mariabackup|192.168.1.250:4444/xtrabackup_sst//1
2017-08-31 16:58:00 140565427668736 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2017-08-31 16:58:00 140565427668736 [Note] WSREP: REPL Protocols: 7 (3, 2)
2017-08-31 16:58:00 140565427668736 [Note] WSREP: Assign initial position for certification: 0, protocol version: 3
2017-08-31 16:58:00 140565452982016 [Note] WSREP: Service thread queue flushed.
2017-08-31 16:58:00 140565427668736 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (9ba441c2-8e29-11e7-b7e0-2f564ca18fa8): 1 (Operation not permitted)
         at galera/src/replicator_str.cpp:prepare_for_IST():482. IST will be unavailable.
2017-08-31 16:58:00 140563189843712 [Note] WSREP: Member 0.0 (MBA1) requested state transfer from '*any*'. Selected 1.0 (MBA2)(SYNCED) as donor.
2017-08-31 16:58:00 140563189843712 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 0)
2017-08-31 16:58:00 140565427668736 [Note] WSREP: Requesting state transfer: success, donor: 1
2017-08-31 16:58:00 140565427668736 [Note] WSREP: GCache history reset: old(00000000-0000-0000-0000-000000000000:0) -> new(9ba441c2-8e29-11e7-b7e0-2f564ca18fa8:0)
2017-08-31 16:58:01 140563189843712 [Warning] WSREP: 1.0 (MBA2): State transfer to 0.0 (MBA1) failed: -2 (No such file or directory)
2017-08-31 16:58:01 140563189843712 [ERROR] WSREP: gcs/src/gcs_group.cpp:gcs_group_handle_join_msg():736: Will never receive state. Need to abort.
2017-08-31 16:58:01 140563189843712 [Note] WSREP: gcomm: terminating thread
2017-08-31 16:58:01 140563189843712 [Note] WSREP: gcomm: joining thread
2017-08-31 16:58:01 140563189843712 [Note] WSREP: gcomm: closing backend
2017-08-31 16:58:03 140563189843712 [Note] WSREP: (7d22ec57, 'tcp://0.0.0.0:4567') turning message relay requesting off
2017-08-31 16:58:04 140563189843712 [Note] WSREP: (7d22ec57, 'tcp://0.0.0.0:4567') connection to peer 9ba3ac27 with addr tcp://192.168.1.249:4567 timed out, no messages seen in PT3S
2017-08-31 16:58:04 140563189843712 [Note] WSREP: (7d22ec57, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://192.168.1.249:4567 
2017-08-31 16:58:05 140563189843712 [Note] WSREP: (7d22ec57, 'tcp://0.0.0.0:4567') reconnecting to 9ba3ac27 (tcp://192.168.1.249:4567), attempt 0
2017-08-31 16:58:06 140563189843712 [Note] WSREP: evs::proto(7d22ec57, LEAVING, view_id(REG,7d22ec57,8)) suspecting node: 9ba3ac27
2017-08-31 16:58:06 140563189843712 [Note] WSREP: evs::proto(7d22ec57, LEAVING, view_id(REG,7d22ec57,8)) suspected node without join message, declaring inactive
2017-08-31 16:58:06 140563189843712 [Note] WSREP: view(view_id(NON_PRIM,7d22ec57,8) memb {
        7d22ec57,0
} joined {
} left {
} partitioned {
        9ba3ac27,0
})
2017-08-31 16:58:06 140563189843712 [Note] WSREP: view((empty))
2017-08-31 16:58:06 140563189843712 [Note] WSREP: gcomm: closed
2017-08-31 16:58:06 140563189843712 [Note] WSREP: /usr/sbin/mysqld: Terminated.

  • mysqldump

.
.
.
2017-08-31 17:09:00 140583527037056 [Warning] WSREP: access file(/data/mysql/clst/gvwstate.dat) failed(No such file or directory)
2017-08-31 17:09:00 140583527037056 [Note] WSREP: restore pc from disk failed
2017-08-31 17:09:00 140583527037056 [Note] WSREP: GMCast version 0
2017-08-31 17:09:00 140583527037056 [Note] WSREP: (07410f0a, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
2017-08-31 17:09:00 140583527037056 [Note] WSREP: (07410f0a, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
2017-08-31 17:09:00 140583527037056 [Note] WSREP: EVS version 0
2017-08-31 17:09:00 140583527037056 [Note] WSREP: gcomm: connecting to group 'Galera', peer '192.168.1.249:'
2017-08-31 17:09:00 140583527037056 [Note] WSREP: (07410f0a, 'tcp://0.0.0.0:4567') connection established to 9ba3ac27 tcp://192.168.1.249:4567
2017-08-31 17:09:00 140583527037056 [Note] WSREP: (07410f0a, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: 
2017-08-31 17:09:01 140583527037056 [Note] WSREP: declaring 9ba3ac27 at tcp://192.168.1.249:4567 stable
2017-08-31 17:09:01 140583527037056 [Note] WSREP: Node 9ba3ac27 state prim
2017-08-31 17:09:01 140583527037056 [Note] WSREP: view(view_id(PRIM,07410f0a,32) memb {
        07410f0a,0
        9ba3ac27,0
} joined {
} left {
} partitioned {
})
2017-08-31 17:09:01 140583527037056 [Note] WSREP: save pc into disk
2017-08-31 17:09:01 140583527037056 [Note] WSREP: gcomm: connected
2017-08-31 17:09:01 140583527037056 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636
2017-08-31 17:09:01 140583527037056 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
2017-08-31 17:09:01 140583527037056 [Note] WSREP: Opened channel 'Galera'
2017-08-31 17:09:01 140577005881088 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 2
2017-08-31 17:09:01 140577005881088 [Note] WSREP: STATE_EXCHANGE: sent state UUID: 078d99d8-8e2c-11e7-92cc-97ad3b2774c3
2017-08-31 17:09:01 140583527037056 [Note] Reading of all Master_info entries succeded
2017-08-31 17:09:01 140583527037056 [Note] Added new Master_info '' to hash table
2017-08-31 17:09:01 140583527037056 [Note] /usr/sbin/mysqld: ready for connections.
Version: '10.2.8-MariaDB-log'  socket: '/data/mysql/vars/mysql.sock'  port: 3306  MariaDB Server
2017-08-31 17:09:01 140577005881088 [Note] WSREP: STATE EXCHANGE: sent state msg: 078d99d8-8e2c-11e7-92cc-97ad3b2774c3
2017-08-31 17:09:01 140577005881088 [Note] WSREP: STATE EXCHANGE: got state msg: 078d99d8-8e2c-11e7-92cc-97ad3b2774c3 from 0 (MBA1)
2017-08-31 17:09:01 140577005881088 [Note] WSREP: STATE EXCHANGE: got state msg: 078d99d8-8e2c-11e7-92cc-97ad3b2774c3 from 1 (MBA2)
2017-08-31 17:09:01 140577005881088 [Note] WSREP: Quorum results:
        version    = 4,
        component  = PRIMARY,
        conf_id    = 31,
        members    = 1/2 (joined/total),
        act_id     = 0,
        last_appl. = -1,
        protocols  = 0/7/3 (gcs/repl/appl),
        group UUID = 9ba441c2-8e29-11e7-b7e0-2f564ca18fa8
2017-08-31 17:09:01 140577005881088 [Note] WSREP: Flow-control interval: [23, 23]
2017-08-31 17:09:01 140577005881088 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 0)
2017-08-31 17:09:01 140583073695488 [Note] WSREP: State transfer required: 
        Group state: 9ba441c2-8e29-11e7-b7e0-2f564ca18fa8:0
        Local state: 00000000-0000-0000-0000-000000000000:-1
2017-08-31 17:09:01 140583073695488 [Note] WSREP: New cluster view: global state: 9ba441c2-8e29-11e7-b7e0-2f564ca18fa8:0, view# 32: Primary, number of nodes: 2, my index: 0, protocol version 3
2017-08-31 17:09:01 140583073695488 [Warning] WSREP: Gap in state sequence. Need state transfer.
2017-08-31 17:09:03 140583073695488 [Note] WSREP: Prepared SST request: mysqldump|192.168.1.250:3306
2017-08-31 17:09:03 140583073695488 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2017-08-31 17:09:03 140583073695488 [Note] WSREP: REPL Protocols: 7 (3, 2)
2017-08-31 17:09:03 140583073695488 [Note] WSREP: Assign initial position for certification: 0, protocol version: 3
2017-08-31 17:09:03 140577022666496 [Note] WSREP: Service thread queue flushed.
2017-08-31 17:09:03 140583073695488 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (9ba441c2-8e29-11e7-b7e0-2f564ca18fa8): 1 (Operation not permitted)
         at galera/src/replicator_str.cpp:prepare_for_IST():482. IST will be unavailable.
2017-08-31 17:09:03 140577005881088 [Warning] WSREP: Member 0.0 (MBA1) requested state transfer from '192.168.1.249', but it is impossible to select State Transfer donor: No route to host
2017-08-31 17:09:03 140583073695488 [ERROR] WSREP: Requesting state transfer failed: -113(No route to host)
2017-08-31 17:09:03 140583073695488 [ERROR] WSREP: State transfer request failed unrecoverably: 113 (No route to host). Most likely it is due to inability to communicate with the cluster primary component. Restart required.
2017-08-31 17:09:03 140583073695488 [Note] WSREP: Closing send monitor...
2017-08-31 17:09:03 140583073695488 [Note] WSREP: Closed send monitor.
2017-08-31 17:09:03 140583073695488 [Note] WSREP: gcomm: terminating thread
2017-08-31 17:09:03 140583073695488 [Note] WSREP: gcomm: joining thread
2017-08-31 17:09:03 140583073695488 [Note] WSREP: gcomm: closing backend
2017-08-31 17:09:04 140583073695488 [Note] WSREP: (07410f0a, 'tcp://0.0.0.0:4567') turning message relay requesting off
2017-08-31 17:09:06 140583073695488 [Note] WSREP: (07410f0a, 'tcp://0.0.0.0:4567') connection to peer 9ba3ac27 with addr tcp://192.168.1.249:4567 timed out, no messages seen in PT3S
2017-08-31 17:09:06 140583073695488 [Note] WSREP: (07410f0a, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://192.168.1.249:4567 
2017-08-31 17:09:08 140583073695488 [Note] WSREP: (07410f0a, 'tcp://0.0.0.0:4567') reconnecting to 9ba3ac27 (tcp://192.168.1.249:4567), attempt 0
2017-08-31 17:09:08 140583073695488 [Note] WSREP: evs::proto(07410f0a, LEAVING, view_id(REG,07410f0a,32)) suspecting node: 9ba3ac27
2017-08-31 17:09:08 140583073695488 [Note] WSREP: evs::proto(07410f0a, LEAVING, view_id(REG,07410f0a,32)) suspected node without join message, declaring inactive
2017-08-31 17:09:08 140583073695488 [Note] WSREP: view(view_id(NON_PRIM,07410f0a,32) memb {
        07410f0a,0
} joined {
} left {
} partitioned {
        9ba3ac27,0
})
2017-08-31 17:09:08 140583073695488 [Note] WSREP: view((empty))
2017-08-31 17:09:08 140583073695488 [Note] WSREP: gcomm: closed
2017-08-31 17:09:08 140577005881088 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
2017-08-31 17:09:08 140577005881088 [Note] WSREP: Flow-control interval: [16, 16]
2017-08-31 17:09:08 140577005881088 [Note] WSREP: Received NON-PRIMARY.
2017-08-31 17:09:08 140577005881088 [Note] WSREP: Shifting PRIMARY -> OPEN (TO: 0)
2017-08-31 17:09:08 140577005881088 [Note] WSREP: Received self-leave message.
2017-08-31 17:09:08 140577005881088 [Note] WSREP: Flow-control interval: [0, 0]
2017-08-31 17:09:08 140577005881088 [Note] WSREP: Received SELF-LEAVE. Closing connection.
2017-08-31 17:09:08 140577005881088 [Note] WSREP: Shifting OPEN -> CLOSED (TO: 0)
2017-08-31 17:09:08 140577005881088 [Note] WSREP: RECV thread exiting 0: Success
2017-08-31 17:09:08 140583073695488 [Note] WSREP: recv_thread() joined.
2017-08-31 17:09:08 140583073695488 [Note] WSREP: Closing replication queue.
2017-08-31 17:09:08 140583073695488 [Note] WSREP: Closing slave action queue.
2017-08-31 17:09:08 140583073695488 [Note] WSREP: /usr/sbin/mysqld: Terminated.

  • rsync

.
.
.
2017-08-31 17:11:15 139844106639488 [Warning] WSREP: access file(/data/mysql/clst/gvwstate.dat) failed(No such file or directory)
2017-08-31 17:11:15 139844106639488 [Note] WSREP: restore pc from disk failed
2017-08-31 17:11:15 139844106639488 [Note] WSREP: GMCast version 0
2017-08-31 17:11:15 139844106639488 [Note] WSREP: (5753d2a1, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
2017-08-31 17:11:15 139844106639488 [Note] WSREP: (5753d2a1, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
2017-08-31 17:11:15 139844106639488 [Note] WSREP: EVS version 0
2017-08-31 17:11:15 139844106639488 [Note] WSREP: gcomm: connecting to group 'Galera', peer '192.168.1.249:'
2017-08-31 17:11:15 139844106639488 [Note] WSREP: (5753d2a1, 'tcp://0.0.0.0:4567') connection established to 9ba3ac27 tcp://192.168.1.249:4567
2017-08-31 17:11:15 139844106639488 [Note] WSREP: (5753d2a1, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: 
2017-08-31 17:11:15 139844106639488 [Note] WSREP: declaring 9ba3ac27 at tcp://192.168.1.249:4567 stable
2017-08-31 17:11:15 139844106639488 [Note] WSREP: Node 9ba3ac27 state prim
2017-08-31 17:11:15 139844106639488 [Note] WSREP: view(view_id(PRIM,5753d2a1,36) memb {
        5753d2a1,0
        9ba3ac27,0
} joined {
} left {
} partitioned {
})
2017-08-31 17:11:15 139844106639488 [Note] WSREP: save pc into disk
2017-08-31 17:11:15 139844106639488 [Note] WSREP: gcomm: connected
2017-08-31 17:11:15 139844106639488 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636
2017-08-31 17:11:15 139844106639488 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
2017-08-31 17:11:15 139844106639488 [Note] WSREP: Opened channel 'Galera'
2017-08-31 17:11:15 139843881977600 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 2
2017-08-31 17:11:15 139844106639488 [Note] WSREP: Waiting for SST to complete.
2017-08-31 17:11:15 139843881977600 [Note] WSREP: STATE_EXCHANGE: sent state UUID: 57a057d2-8e2c-11e7-907c-bea4d0033439
2017-08-31 17:11:15 139843881977600 [Note] WSREP: STATE EXCHANGE: sent state msg: 57a057d2-8e2c-11e7-907c-bea4d0033439
2017-08-31 17:11:15 139843881977600 [Note] WSREP: STATE EXCHANGE: got state msg: 57a057d2-8e2c-11e7-907c-bea4d0033439 from 0 (MBA1)
2017-08-31 17:11:15 139843881977600 [Note] WSREP: STATE EXCHANGE: got state msg: 57a057d2-8e2c-11e7-907c-bea4d0033439 from 1 (MBA2)
2017-08-31 17:11:15 139843881977600 [Note] WSREP: Quorum results:
        version    = 4,
        component  = PRIMARY,
        conf_id    = 35,
        members    = 1/2 (joined/total),
        act_id     = 0,
        last_appl. = -1,
        protocols  = 0/7/3 (gcs/repl/appl),
        group UUID = 9ba441c2-8e29-11e7-b7e0-2f564ca18fa8
2017-08-31 17:11:15 139843881977600 [Note] WSREP: Flow-control interval: [23, 23]
2017-08-31 17:11:15 139843881977600 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 0)
2017-08-31 17:11:15 139843873449728 [Note] WSREP: State transfer required: 
        Group state: 9ba441c2-8e29-11e7-b7e0-2f564ca18fa8:0
        Local state: 00000000-0000-0000-0000-000000000000:-1
2017-08-31 17:11:15 139843873449728 [Note] WSREP: New cluster view: global state: 9ba441c2-8e29-11e7-b7e0-2f564ca18fa8:0, view# 36: Primary, number of nodes: 2, my index: 0, protocol version 3
2017-08-31 17:11:15 139843873449728 [Warning] WSREP: Gap in state sequence. Need state transfer.
2017-08-31 17:11:15 139841652123392 [Note] WSREP: Running: 'wsrep_sst_rsync --role 'joiner' --address '192.168.1.250' --datadir '/data/mysql/data/'   --parent '2418' --binlog '/data/mysql/tran/mysql-bin' '
2017-08-31 17:11:15 139843873449728 [Note] WSREP: Prepared SST request: rsync|192.168.1.250:4444/rsync_sst
2017-08-31 17:11:15 139843873449728 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2017-08-31 17:11:15 139843873449728 [Note] WSREP: REPL Protocols: 7 (3, 2)
2017-08-31 17:11:15 139843873449728 [Note] WSREP: Assign initial position for certification: 0, protocol version: 3
2017-08-31 17:11:15 139843923941120 [Note] WSREP: Service thread queue flushed.
2017-08-31 17:11:15 139843873449728 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (9ba441c2-8e29-11e7-b7e0-2f564ca18fa8): 1 (Operation not permitted)
         at galera/src/replicator_str.cpp:prepare_for_IST():482. IST will be unavailable.
2017-08-31 17:11:15 139843881977600 [Warning] WSREP: Member 0.0 (MBA1) requested state transfer from '192.168.1.249', but it is impossible to select State Transfer donor: No route to host
2017-08-31 17:11:15 139843873449728 [ERROR] WSREP: Requesting state transfer failed: -113(No route to host)
2017-08-31 17:11:15 139843873449728 [ERROR] WSREP: State transfer request failed unrecoverably: 113 (No route to host). Most likely it is due to inability to communicate with the cluster primary component. Restart required.
2017-08-31 17:11:15 139843873449728 [Note] WSREP: Closing send monitor...
2017-08-31 17:11:15 139843873449728 [Note] WSREP: Closed send monitor.
2017-08-31 17:11:15 139843873449728 [Note] WSREP: gcomm: terminating thread
2017-08-31 17:11:15 139843873449728 [Note] WSREP: gcomm: joining thread
2017-08-31 17:11:15 139843873449728 [Note] WSREP: gcomm: closing backend
2017-08-31 17:11:18 139843873449728 [Note] WSREP: (5753d2a1, 'tcp://0.0.0.0:4567') turning message relay requesting off
2017-08-31 17:11:19 139843873449728 [Note] WSREP: (5753d2a1, 'tcp://0.0.0.0:4567') connection to peer 9ba3ac27 with addr tcp://192.168.1.249:4567 timed out, no messages seen in PT3S
2017-08-31 17:11:19 139843873449728 [Note] WSREP: (5753d2a1, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://192.168.1.249:4567 
2017-08-31 17:11:20 139843873449728 [Note] WSREP: (5753d2a1, 'tcp://0.0.0.0:4567') reconnecting to 9ba3ac27 (tcp://192.168.1.249:4567), attempt 0
2017-08-31 17:11:21 139843873449728 [Note] WSREP: evs::proto(5753d2a1, LEAVING, view_id(REG,5753d2a1,36)) suspecting node: 9ba3ac27
2017-08-31 17:11:21 139843873449728 [Note] WSREP: evs::proto(5753d2a1, LEAVING, view_id(REG,5753d2a1,36)) suspected node without join message, declaring inactive
2017-08-31 17:11:21 139843873449728 [Note] WSREP: view(view_id(NON_PRIM,5753d2a1,36) memb {
        5753d2a1,0
} joined {
} left {
} partitioned {
        9ba3ac27,0
})
2017-08-31 17:11:21 139843873449728 [Note] WSREP: view((empty))
2017-08-31 17:11:21 139843873449728 [Note] WSREP: gcomm: closed
2017-08-31 17:11:21 139843881977600 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
2017-08-31 17:11:21 139843881977600 [Note] WSREP: Flow-control interval: [16, 16]
2017-08-31 17:11:21 139843881977600 [Note] WSREP: Received NON-PRIMARY.
2017-08-31 17:11:21 139843881977600 [Note] WSREP: Shifting PRIMARY -> OPEN (TO: 0)
2017-08-31 17:11:21 139843881977600 [Note] WSREP: Received self-leave message.
2017-08-31 17:11:21 139843881977600 [Note] WSREP: Flow-control interval: [0, 0]
2017-08-31 17:11:21 139843881977600 [Note] WSREP: Received SELF-LEAVE. Closing connection.
2017-08-31 17:11:21 139843881977600 [Note] WSREP: Shifting OPEN -> CLOSED (TO: 0)
2017-08-31 17:11:21 139843881977600 [Note] WSREP: RECV thread exiting 0: Success
2017-08-31 17:11:21 139843873449728 [Note] WSREP: recv_thread() joined.
2017-08-31 17:11:21 139843873449728 [Note] WSREP: Closing replication queue.
2017-08-31 17:11:21 139843873449728 [Note] WSREP: Closing slave action queue.
2017-08-31 17:11:21 139843873449728 [Note] WSREP: /usr/sbin/mysqld: Terminated.
WSREP_SST: [ERROR] Parent mysqld process (PID:2418) terminated unexpectedly. (20170831 17:11:21.934)
WSREP_SST: [INFO] Joiner cleanup. rsync PID: 2466 (20170831 17:11:21.936)
WSREP_SST: [INFO] Joiner cleanup done. (20170831 17:11:22.439)
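
The three joiner logs above differ mainly in their [Warning] and [ERROR] lines. A minimal sketch for pulling those lines out of an error log, assuming the line layout shown in the excerpts (timestamp, thread id, bracketed level). The function name important_lines is hypothetical, and lines without that layout (WSREP_SST script output, indented continuation lines) are skipped:

```python
import re

# Matches MariaDB error-log lines such as:
# 2017-08-31 16:58:01 140563189843712 [ERROR] WSREP: ...
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \d+ "
    r"\[(?P<level>Note|Warning|ERROR)\] (?P<msg>.*)$"
)


def important_lines(log_text, levels=("Warning", "ERROR")):
    """Return (timestamp, level, message) tuples for the chosen levels."""
    found = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match and match.group("level") in levels:
            found.append(
                (match.group("ts"), match.group("level"), match.group("msg"))
            )
    return found
```

Feeding it the text of each log in turn gives a short list per SST method that can be compared side by side.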

galera.cnf

[galera]
wsrep_on                        = 1
wsrep_cluster_address           = "gcomm://192.168.1.249"
wsrep_cluster_name              = Galera
wsrep_gtid_domain_id            = 192
wsrep_node_address              = "192.168.1.250"
wsrep_node_name                 = MBA1
wsrep_sst_donor                 = 192.168.1.249
wsrep_sst_method                = rsync
wsrep_sst_auth                  = sst:changeme
wsrep_data_home_dir             = /data/mysql/clst
wsrep_auto_increment_control    = 1
wsrep_provider                  = /usr/lib64/galera/libgalera_smm.so
wsrep_slave_threads             = 4
wsrep_provider_options          = "gcache.size=2048M"
wsrep_debug                     = 0
wsrep_gtid_mode                 = 1
wsrep_max_ws_rows               = 0
wsrep_max_ws_size               = 2147483647
wsrep_recover                   = 0
wsrep_replicate_myisam          = 0
wsrep_sst_donor_rejects_queries = 0
wsrep_sync_wait                 = 0

And I don't see any network connection problems.

[root@MBA1 clst]# telnet 192.168.1.249 3306
Trying 192.168.1.249...
Connected to 192.168.1.249.
Escape character is '^]'.
\
5.5.5-10.2.8-MariaDB-log0]bWk6h-��-��CJ(+=1|?e`3{mysql_native_password^]
telnet> quit
Connection closed.
[root@MBA1 clst]# telnet 192.168.1.249 4567
Trying 192.168.1.249...
Connected to 192.168.1.249.
Escape character is '^]'.
$�Ӕ     )��j��,纑J�Ǿ�^]
telnet> quit
Connection closed.
[root@MBA1 clst]# ping 192.168.1.249  
PING 192.168.1.249 (192.168.1.249) 56(84) bytes of data.
64 bytes from 192.168.1.249: icmp_seq=1 ttl=64 time=0.248 ms
64 bytes from 192.168.1.249: icmp_seq=2 ttl=64 time=0.278 ms
64 bytes from 192.168.1.249: icmp_seq=3 ttl=64 time=0.285 ms
^C
--- 192.168.1.249 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2000ms
rtt min/avg/max/mdev = 0.248/0.270/0.285/0.020 ms
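
The telnet tests above cover ports 3306 and 4567 only. A minimal sketch that repeats the same TCP check for the ports a Galera node conventionally uses. The port list reflects the usual defaults and is an assumption here, since the report shows no non-default port settings; the helper names are hypothetical:

```python
import socket

# Conventional default ports for a Galera node (assumed, not taken from
# the configuration in this report).
GALERA_PORTS = {
    3306: "MySQL client connections / mysqldump SST",
    4567: "Galera group communication",
    4568: "incremental state transfer (IST)",
    4444: "state snapshot transfer (SST)",
}


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_ports(host, ports=GALERA_PORTS, timeout=3.0):
    """Return a {port: reachable} mapping for the given host."""
    return {port: port_open(host, port, timeout) for port in ports}
```

For example, check_ports("192.168.1.249") would be run from MBA1. A closed result on 4444 or 4568 is not conclusive by itself, because those ports are normally only listened on by a joiner while a transfer is in progress (the mariabackup log shows socat opening TCP-LISTEN:4444 on the joiner).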



 Comments   
Comment by Andrii Nikitin (Inactive) [ 2017-08-31 ]

It looks like error 113 (EHOSTUNREACH = "No route to host") is returned if the node (192.168.1.249) is not ready to be a donor:

https://github.com/codership/galera/blob/edce1a80587dd3366180abfa1f4de653a203863a/gcs/src/gcs_group.cpp#L1206

/*!
 * Selects and returns the index of state transfer donor, if available.
 * Updates donor and joiner status if state transfer is possible
 *
 * @return
 *         donor index or negative error code:
 *         -EHOSTUNREACH if reqiested donor is not available
 *         -EAGAIN       if there were no nodes in the proper state.
 */
static int
group_select_donor (gcs_group_t* group,

Could you check the output of the following SQL commands on 192.168.1.249 and retry the SST if the node looks valid:

show variables like "wsrep%";
show status like "wsrep%";

See also this related issue regarding the somewhat confusing error message: https://github.com/codership/galera/issues/108
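
The return codes described in the quoted source comment can be illustrated with a toy model. This is a sketch only, not the Galera implementation: the function select_donor, the state strings, and the rule that a requested donor is matched by node name are assumptions made for illustration (the last one follows how wsrep_sst_donor is documented, as a list of node names):

```python
import errno


def select_donor(nodes, joiner_idx, requested=None):
    """Toy model of donor selection.

    nodes      -- list of (node_name, state) tuples, e.g. ("MBA2", "SYNCED")
    joiner_idx -- index of the node asking for a state transfer
    requested  -- donor name asked for by the joiner, or None for any donor

    Returns the donor index, or a negative errno value on failure.
    """
    if requested is not None:
        for idx, (name, state) in enumerate(nodes):
            if idx != joiner_idx and name == requested:
                # Assumed: a known donor in the wrong state is "try again".
                return idx if state == "SYNCED" else -errno.EAGAIN
        # No member carries the requested name.
        return -errno.EHOSTUNREACH

    for idx, (name, state) in enumerate(nodes):
        if idx != joiner_idx and state == "SYNCED":
            return idx
    # No node is in a state that allows it to act as donor.
    return -errno.EAGAIN
```

With nodes = [("MBA1", "PRIMARY"), ("MBA2", "SYNCED")], calling select_donor(nodes, 0) picks index 1, which resembles the '*any*' request in the mariabackup log. Calling select_donor(nodes, 0, "192.168.1.249") returns -errno.EHOSTUNREACH in this model, because no member is named "192.168.1.249"; on Linux that value is -113, the code seen in the mysqldump and rsync logs.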

Comment by Michael Xu [ 2017-08-31 ]

STATUS:

+------------------------------+--------------------------------------+
| Variable_name                | Value                                |
+------------------------------+--------------------------------------+
| wsrep_apply_oooe             | 0.000000                             |
| wsrep_apply_oool             | 0.000000                             |
| wsrep_apply_window           | 0.000000                             |
| wsrep_causal_reads           | 0                                    |
| wsrep_cert_deps_distance     | 0.000000                             |
| wsrep_cert_index_size        | 0                                    |
| wsrep_cert_interval          | 0.000000                             |
| wsrep_cluster_conf_id        | 39                                   |
| wsrep_cluster_size           | 1                                    |
| wsrep_cluster_state_uuid     | 9ba441c2-8e29-11e7-b7e0-2f564ca18fa8 |
| wsrep_cluster_status         | Primary                              |
| wsrep_commit_oooe            | 0.000000                             |
| wsrep_commit_oool            | 0.000000                             |
| wsrep_commit_window          | 0.000000                             |
| wsrep_connected              | ON                                   |
| wsrep_desync_count           | 0                                    |
| wsrep_evs_delayed            |                                      |
| wsrep_evs_evict_list         |                                      |
| wsrep_evs_repl_latency       | 0/0/0/0/0                            |
| wsrep_evs_state              | OPERATIONAL                          |
| wsrep_flow_control_paused    | 0.000000                             |
| wsrep_flow_control_paused_ns | 0                                    |
| wsrep_flow_control_recv      | 0                                    |
| wsrep_flow_control_sent      | 0                                    |
| wsrep_gcomm_uuid             | 9ba3ac27-8e29-11e7-afdd-dfbbe16177bc |
| wsrep_incoming_addresses     | 192.168.1.249:3306                  |
| wsrep_last_committed         | 0                                    |
| wsrep_local_bf_aborts        | 0                                    |
| wsrep_local_cached_downto    | 18446744073709551615                 |
| wsrep_local_cert_failures    | 0                                    |
| wsrep_local_commits          | 0                                    |
| wsrep_local_index            | 0                                    |
| wsrep_local_recv_queue       | 0                                    |
| wsrep_local_recv_queue_avg   | 0.000000                             |
| wsrep_local_recv_queue_max   | 1                                    |
| wsrep_local_recv_queue_min   | 0                                    |
| wsrep_local_replays          | 0                                    |
| wsrep_local_send_queue       | 0                                    |
| wsrep_local_send_queue_avg   | 0.500000                             |
| wsrep_local_send_queue_max   | 2                                    |
| wsrep_local_send_queue_min   | 0                                    |
| wsrep_local_state            | 4                                    |
| wsrep_local_state_comment    | Synced                               |
| wsrep_local_state_uuid       | 9ba441c2-8e29-11e7-b7e0-2f564ca18fa8 |
| wsrep_protocol_version       | 7                                    |
| wsrep_provider_name          | Galera                               |
| wsrep_provider_vendor        | Codership Oy <info@codership.com>    |
| wsrep_provider_version       | 25.3.20(r3703)                       |
| wsrep_ready                  | ON                                   |
| wsrep_received               | 82                                   |
| wsrep_received_bytes         | 7644                                 |
| wsrep_repl_data_bytes        | 0                                    |
| wsrep_repl_keys              | 0                                    |
| wsrep_repl_keys_bytes        | 0                                    |
| wsrep_repl_other_bytes       | 0                                    |
| wsrep_replicated             | 0                                    |
| wsrep_replicated_bytes       | 0                                    |
| wsrep_thread_count           | 5                                    |
+------------------------------+--------------------------------------+

VARIABLES:

+---------------------------------+--------------------------------------------------------------+
| Variable_name                   | Value                                                        |
+---------------------------------+--------------------------------------------------------------+
| wsrep_osu_method                | TOI                                                          |
| wsrep_auto_increment_control    | ON                                                           |
| wsrep_causal_reads              | OFF                                                          |
| wsrep_certify_nonpk             | ON                                                           |
| wsrep_cluster_address           | gcomm://                                                     |
| wsrep_cluster_name              | Galera                                                       |
| wsrep_convert_lock_to_trx       | OFF                                                          |
| wsrep_data_home_dir             | /data/mysql/clst                                             |
| wsrep_dbug_option               |                                                              |
| wsrep_debug                     | OFF                                                          |
| wsrep_desync                    | OFF                                                          |
| wsrep_dirty_reads               | OFF                                                          |
| wsrep_drupal_282555_workaround  | OFF                                                          |
| wsrep_forced_binlog_format      | NONE                                                         |
| wsrep_gtid_domain_id            | 192                                                          |
| wsrep_gtid_mode                 | ON                                                           |
| wsrep_load_data_splitting       | ON                                                           |
| wsrep_log_conflicts             | OFF                                                          |
| wsrep_max_ws_rows               | 0                                                            |
| wsrep_max_ws_size               | 2147483647                                                   |
| wsrep_mysql_replication_bundle  | 0                                                            |
| wsrep_node_address              | 192.168.1.249                                                |
| wsrep_node_incoming_address     | AUTO                                                         |
| wsrep_node_name                 | MBA2                                                         |
| wsrep_notify_cmd                |                                                              |
| wsrep_on                        | ON                                                           |
| wsrep_patch_version             | wsrep_25.19                                                  |
| wsrep_provider                  | /usr/lib64/galera/libgalera_smm.so                           |
| wsrep_provider_options          | base_dir = /data/mysql/clst; base_host = 192.168.1.249; ...  |
| wsrep_recover                   | OFF                                                          |
| wsrep_replicate_myisam          | OFF                                                          |
| wsrep_restart_slave             | OFF                                                          |
| wsrep_retry_autocommit          | 1                                                            |
| wsrep_slave_fk_checks           | ON                                                           |
| wsrep_slave_uk_checks           | OFF                                                          |
| wsrep_slave_threads             | 4                                                            |
| wsrep_sst_auth                  | ********                                                     |
| wsrep_sst_donor                 |                                                              |
| wsrep_sst_donor_rejects_queries | OFF                                                          |
| wsrep_sst_method                | mysqldump                                                    |
| wsrep_sst_receive_address       | AUTO                                                         |
| wsrep_start_position            | 00000000-0000-0000-0000-000000000000:-1                      |
| wsrep_sync_wait                 | 0                                                            |
+---------------------------------+--------------------------------------------------------------+

Comment by Andrii Nikitin (Inactive) [ 2017-08-31 ]

Just in case, could you also run this on MBA2:

ps auxw | grep mysqld

If only one mysqld process is listed there, please also send the error log from MBA2.

Comment by Michael Xu [ 2017-08-31 ]

PROCESS:

# ps auxw | grep mysqld
mysql    16368  0.1  7.3 9508896 584840 ?      Ssl  16:51   0:11 /usr/sbin/mysqld --wsrep_start_position=00000000-0000-0000-0000-000000000000:-1
root     19607  0.0  0.0 112644   968 pts/0    S+   18:30   0:00 grep --color=auto mysqld

ERROR LOG:

2017-08-31 17:09:00 139797813253888 [Note] WSREP: (9ba3ac27, 'tcp://0.0.0.0:4567') connection established to 07410f0a tcp://192.168.1.250:4567
2017-08-31 17:09:00 139797813253888 [Note] WSREP: (9ba3ac27, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: 
2017-08-31 17:09:01 139797813253888 [Note] WSREP: declaring 07410f0a at tcp://192.168.1.250:4567 stable
2017-08-31 17:09:01 139797813253888 [Note] WSREP: Node 9ba3ac27 state prim
2017-08-31 17:09:01 139797813253888 [Note] WSREP: view(view_id(PRIM,07410f0a,32) memb {
        07410f0a,0
        9ba3ac27,0
} joined {
} left {
} partitioned {
})
2017-08-31 17:09:01 139797813253888 [Note] WSREP: save pc into disk
2017-08-31 17:09:01 139797804861184 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 1, memb_num = 2
2017-08-31 17:09:01 139797804861184 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
2017-08-31 17:09:01 139797804861184 [Note] WSREP: STATE EXCHANGE: sent state msg: 078d99d8-8e2c-11e7-92cc-97ad3b2774c3
2017-08-31 17:09:01 139797804861184 [Note] WSREP: STATE EXCHANGE: got state msg: 078d99d8-8e2c-11e7-92cc-97ad3b2774c3 from 0 (MBA1)
2017-08-31 17:09:01 139797804861184 [Note] WSREP: STATE EXCHANGE: got state msg: 078d99d8-8e2c-11e7-92cc-97ad3b2774c3 from 1 (MBA2)
2017-08-31 17:09:01 139797804861184 [Note] WSREP: Quorum results:
        version    = 4,
        component  = PRIMARY,
        conf_id    = 31,
        members    = 1/2 (joined/total),
        act_id     = 0,
        last_appl. = 0,
        protocols  = 0/7/3 (gcs/repl/appl),
        group UUID = 9ba441c2-8e29-11e7-b7e0-2f564ca18fa8
2017-08-31 17:09:01 139797804861184 [Note] WSREP: Flow-control interval: [23, 23]
2017-08-31 17:09:01 139799349511936 [Note] WSREP: New cluster view: global state: 9ba441c2-8e29-11e7-b7e0-2f564ca18fa8:0, view# 32: Primary, number of nodes: 2, my index: 1, protocol version 3
2017-08-31 17:09:01 139799349511936 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2017-08-31 17:09:01 139799349511936 [Note] WSREP: REPL Protocols: 7 (3, 2)
2017-08-31 17:09:01 139799349511936 [Note] WSREP: Assign initial position for certification: 0, protocol version: 3
2017-08-31 17:09:01 139797821646592 [Note] WSREP: Service thread queue flushed.
2017-08-31 17:09:03 139797804861184 [Warning] WSREP: Member 0.0 (MBA1) requested state transfer from '192.168.1.249', but it is impossible to select State Transfer donor: No route to host
2017-08-31 17:09:04 139797813253888 [Note] WSREP: (9ba3ac27, 'tcp://0.0.0.0:4567') turning message relay requesting off
2017-08-31 17:09:04 139797813253888 [Note] WSREP: forgetting 07410f0a (tcp://192.168.1.250:4567)
2017-08-31 17:09:04 139797813253888 [Note] WSREP: Node 9ba3ac27 state prim
2017-08-31 17:09:04 139797813253888 [Note] WSREP: view(view_id(PRIM,9ba3ac27,33) memb {
        9ba3ac27,0
} joined {
} left {
} partitioned {
        07410f0a,0
})
2017-08-31 17:09:04 139797813253888 [Note] WSREP: save pc into disk
2017-08-31 17:09:04 139797813253888 [Note] WSREP: forgetting 07410f0a (tcp://192.168.1.250:4567)
2017-08-31 17:09:04 139797804861184 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 1
2017-08-31 17:09:04 139797804861184 [Note] WSREP: STATE_EXCHANGE: sent state UUID: 0957f9a2-8e2c-11e7-8095-57a70d24513c
2017-08-31 17:09:04 139797804861184 [Note] WSREP: STATE EXCHANGE: sent state msg: 0957f9a2-8e2c-11e7-8095-57a70d24513c
2017-08-31 17:09:04 139797804861184 [Note] WSREP: STATE EXCHANGE: got state msg: 0957f9a2-8e2c-11e7-8095-57a70d24513c from 0 (MBA2)
2017-08-31 17:09:04 139797804861184 [Note] WSREP: Quorum results:
        version    = 4,
        component  = PRIMARY,
        conf_id    = 32,
        members    = 1/1 (joined/total),
        act_id     = 0,
        last_appl. = 0,
        protocols  = 0/7/3 (gcs/repl/appl),
        group UUID = 9ba441c2-8e29-11e7-b7e0-2f564ca18fa8
2017-08-31 17:09:04 139797804861184 [Note] WSREP: Flow-control interval: [16, 16]
2017-08-31 17:09:04 139799349511936 [Note] WSREP: New cluster view: global state: 9ba441c2-8e29-11e7-b7e0-2f564ca18fa8:0, view# 33: Primary, number of nodes: 1, my index: 0, protocol version 3
2017-08-31 17:09:04 139799349511936 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2017-08-31 17:09:04 139799349511936 [Note] WSREP: REPL Protocols: 7 (3, 2)
2017-08-31 17:09:04 139799349511936 [Note] WSREP: Assign initial position for certification: 0, protocol version: 3
2017-08-31 17:09:04 139797821646592 [Note] WSREP: Service thread queue flushed.
2017-08-31 17:09:08 139797813253888 [Note] WSREP: (9ba3ac27, 'tcp://0.0.0.0:4567') connection established to 07410f0a tcp://192.168.1.250:4567
2017-08-31 17:09:08 139797813253888 [Warning] WSREP: discarding established (time wait) 07410f0a (tcp://192.168.1.250:4567) 
2017-08-31 17:09:09 139797813253888 [Note] WSREP:  cleaning up 07410f0a (tcp://192.168.1.250:4567)
2017-08-31 17:09:43 139797813253888 [Note] WSREP: (9ba3ac27, 'tcp://0.0.0.0:4567') connection established to 20b5c7a8 tcp://192.168.1.250:4567
2017-08-31 17:09:43 139797813253888 [Note] WSREP: (9ba3ac27, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: 
2017-08-31 17:09:44 139797813253888 [Note] WSREP: declaring 20b5c7a8 at tcp://192.168.1.250:4567 stable
2017-08-31 17:09:44 139797813253888 [Note] WSREP: Node 9ba3ac27 state prim
2017-08-31 17:09:44 139797813253888 [Note] WSREP: view(view_id(PRIM,20b5c7a8,34) memb {
        20b5c7a8,0
        9ba3ac27,0
} joined {
} left {
} partitioned {
})
2017-08-31 17:09:44 139797813253888 [Note] WSREP: save pc into disk
2017-08-31 17:09:44 139797804861184 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 1, memb_num = 2
2017-08-31 17:09:44 139797804861184 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
2017-08-31 17:09:44 139797804861184 [Note] WSREP: STATE EXCHANGE: sent state msg: 214e9bd5-8e2c-11e7-98dd-b70e9a67107b
2017-08-31 17:09:44 139797804861184 [Note] WSREP: STATE EXCHANGE: got state msg: 214e9bd5-8e2c-11e7-98dd-b70e9a67107b from 0 (MBA1)
2017-08-31 17:09:44 139797804861184 [Note] WSREP: STATE EXCHANGE: got state msg: 214e9bd5-8e2c-11e7-98dd-b70e9a67107b from 1 (MBA2)
2017-08-31 17:09:44 139797804861184 [Note] WSREP: Quorum results:
        version    = 4,
        component  = PRIMARY,
        conf_id    = 33,
        members    = 1/2 (joined/total),
        act_id     = 0,
        last_appl. = 0,
        protocols  = 0/7/3 (gcs/repl/appl),
        group UUID = 9ba441c2-8e29-11e7-b7e0-2f564ca18fa8
2017-08-31 17:09:44 139797804861184 [Note] WSREP: Flow-control interval: [23, 23]
2017-08-31 17:09:44 139799349511936 [Note] WSREP: New cluster view: global state: 9ba441c2-8e29-11e7-b7e0-2f564ca18fa8:0, view# 34: Primary, number of nodes: 2, my index: 1, protocol version 3
2017-08-31 17:09:44 139799349511936 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2017-08-31 17:09:44 139799349511936 [Note] WSREP: REPL Protocols: 7 (3, 2)
2017-08-31 17:09:44 139799349511936 [Note] WSREP: Assign initial position for certification: 0, protocol version: 3
2017-08-31 17:09:44 139797821646592 [Note] WSREP: Service thread queue flushed.
2017-08-31 17:09:46 139797804861184 [Warning] WSREP: Member 0.0 (MBA1) requested state transfer from '192.168.1.249', but it is impossible to select State Transfer donor: No route to host
2017-08-31 17:09:47 139797813253888 [Note] WSREP: (9ba3ac27, 'tcp://0.0.0.0:4567') turning message relay requesting off
2017-08-31 17:09:47 139797813253888 [Note] WSREP: forgetting 20b5c7a8 (tcp://192.168.1.250:4567)
2017-08-31 17:09:47 139797813253888 [Note] WSREP: Node 9ba3ac27 state prim
2017-08-31 17:09:47 139797813253888 [Note] WSREP: view(view_id(PRIM,9ba3ac27,35) memb {
        9ba3ac27,0
} joined {
} left {
} partitioned {
        20b5c7a8,0
})
2017-08-31 17:09:47 139797813253888 [Note] WSREP: save pc into disk
2017-08-31 17:09:47 139797813253888 [Note] WSREP: forgetting 20b5c7a8 (tcp://192.168.1.250:4567)
2017-08-31 17:09:47 139797804861184 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 1
2017-08-31 17:09:47 139797804861184 [Note] WSREP: STATE_EXCHANGE: sent state UUID: 2318fec7-8e2c-11e7-9c0f-e62cb0f756f1
2017-08-31 17:09:47 139797804861184 [Note] WSREP: STATE EXCHANGE: sent state msg: 2318fec7-8e2c-11e7-9c0f-e62cb0f756f1
2017-08-31 17:09:47 139797804861184 [Note] WSREP: STATE EXCHANGE: got state msg: 2318fec7-8e2c-11e7-9c0f-e62cb0f756f1 from 0 (MBA2)
2017-08-31 17:09:47 139797804861184 [Note] WSREP: Quorum results:
        version    = 4,
        component  = PRIMARY,
        conf_id    = 34,
        members    = 1/1 (joined/total),
        act_id     = 0,
        last_appl. = 0,
        protocols  = 0/7/3 (gcs/repl/appl),
        group UUID = 9ba441c2-8e29-11e7-b7e0-2f564ca18fa8
2017-08-31 17:09:47 139797804861184 [Note] WSREP: Flow-control interval: [16, 16]
2017-08-31 17:09:47 139799349511936 [Note] WSREP: New cluster view: global state: 9ba441c2-8e29-11e7-b7e0-2f564ca18fa8:0, view# 35: Primary, number of nodes: 1, my index: 0, protocol version 3
2017-08-31 17:09:47 139799349511936 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2017-08-31 17:09:47 139799349511936 [Note] WSREP: REPL Protocols: 7 (3, 2)
2017-08-31 17:09:47 139799349511936 [Note] WSREP: Assign initial position for certification: 0, protocol version: 3
2017-08-31 17:09:47 139797821646592 [Note] WSREP: Service thread queue flushed.
2017-08-31 17:09:51 139797813253888 [Note] WSREP: (9ba3ac27, 'tcp://0.0.0.0:4567') connection established to 20b5c7a8 tcp://192.168.1.250:4567
2017-08-31 17:09:51 139797813253888 [Warning] WSREP: discarding established (time wait) 20b5c7a8 (tcp://192.168.1.250:4567) 
2017-08-31 17:09:53 139797813253888 [Note] WSREP:  cleaning up 20b5c7a8 (tcp://192.168.1.250:4567)
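
The warning that repeats in the log above quotes the donor string each joiner asked for. A minimal sketch for pulling those requests out of an error log follows. The function name `sst_donor_requests` is made up for illustration, and the pattern assumes the warning wording shown in this log. As far as I know, the quoted string is the donor the joiner asked for, and Galera resolves it against node names (`wsrep_node_name`, which is MBA2 on this node), so comparing the two may be a useful check.

```shell
# Hypothetical helper: read a MariaDB error log on stdin and print, once each,
# which donor every joining member requested for state transfer.
sst_donor_requests() {
    sed -n "s/.*WSREP: Member [0-9.]* (\([^)]*\)) requested state transfer from '\([^']*\)'.*/\1 requested donor \2/p" | sort -u
}
```

Usage would be something like `sst_donor_requests < error.log`, where `error.log` stands in for wherever the server writes its error log.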

Comment by Andrii Nikitin (Inactive) [ 2017-08-31 ]

OK, one last quick idea: from 192.168.1.249, try pinging both 192.168.1.250 and 192.168.1.249 itself.
Additionally (optionally), you can put `set -x` at the start of the SST scripts on both nodes, retry the operation, and send the error logs again; they will then contain detailed SST script messages.
Otherwise I should be able to try the same environment in the next few days and see whether the problem occurs there.

Comment by Michael Xu [ 2017-09-01 ]

ping from 249:

$ ping 192.168.52.250
PING 192.168.52.250 (192.168.52.250) 56(84) bytes of data.
64 bytes from 192.168.52.250: icmp_seq=1 ttl=64 time=0.295 ms
64 bytes from 192.168.52.250: icmp_seq=2 ttl=64 time=0.268 ms
64 bytes from 192.168.52.250: icmp_seq=3 ttl=64 time=0.257 ms
^C
 
$ ping 192.168.52.249
PING 192.168.52.249 (192.168.52.249) 56(84) bytes of data.
64 bytes from 192.168.52.249: icmp_seq=1 ttl=64 time=0.022 ms
64 bytes from 192.168.52.249: icmp_seq=2 ttl=64 time=0.038 ms
64 bytes from 192.168.52.249: icmp_seq=3 ttl=64 time=0.040 ms
^C

I don't see any further information after adding 'set -x' at the top of the SST scripts.

on joiner side:

2017-09-01 10:40:25 140550470408320 [Note] WSREP: Read nil XID from storage engines, skipping position init
2017-09-01 10:40:25 140550470408320 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib64/galera/libgalera_smm.so'
2017-09-01 10:40:25 140550257452800 [Note] Event Scheduler: scheduler thread started with id 7
2017-09-01 10:40:25 140550470408320 [Note] WSREP: wsrep_load(): Galera 25.3.20(r3703) by Codership Oy <info@codership.com> loaded successfully.
2017-09-01 10:40:25 140550470408320 [Note] WSREP: CRC-32C: using hardware acceleration.
2017-09-01 10:40:25 140550470408320 [Warning] WSREP: Could not open state file for reading: '/data/mysql/clst/grastate.dat'
2017-09-01 10:40:25 140550470408320 [Note] WSREP: Found saved state: 00000000-0000-0000-0000-000000000000:-1, safe_to_bootsrap: 1
2017-09-01 10:40:25 140550470408320 [Note] WSREP: Passing config to GCS: base_dir = /data/mysql/clst; base_host = 192.168.1.250; base_port = 4567; cert.log_conflicts = no; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /data/mysql/clst; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /data/mysql/clst/galera.cache; gcache.page_size = 128M; gcache.recover = no; gcache.size = 2048M; gcomm.thread_prio = ; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0; pc.announce_timeout = PT3S; pc.checksum = false
2017-09-01 10:40:25 140550470408320 [Note] WSREP: GCache history reset: old(00000000-0000-0000-0000-000000000000:0) -> new(00000000-0000-0000-0000-000000000000:-1)
2017-09-01 10:40:25 140550470408320 [Note] WSREP: Assign initial position for certification: -1, protocol version: -1
2017-09-01 10:40:25 140550470408320 [Note] WSREP: Start replication
2017-09-01 10:40:25 140550470408320 [Note] WSREP: Setting initial position to 00000000-0000-0000-0000-000000000000:-1
2017-09-01 10:40:25 140550470408320 [Note] WSREP: protonet asio version 0
2017-09-01 10:40:25 140550470408320 [Note] WSREP: Using CRC-32C for message checksums.
2017-09-01 10:40:25 140550470408320 [Note] WSREP: backend: asio
2017-09-01 10:40:25 140550470408320 [Note] WSREP: gcomm thread scheduling priority set to other:0 
2017-09-01 10:40:25 140550470408320 [Warning] WSREP: access file(/data/mysql/clst/gvwstate.dat) failed(No such file or directory)
2017-09-01 10:40:25 140550470408320 [Note] WSREP: restore pc from disk failed
2017-09-01 10:40:25 140550470408320 [Note] WSREP: GMCast version 0
2017-09-01 10:40:25 140550470408320 [Note] WSREP: (e8c757e2, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
2017-09-01 10:40:25 140550470408320 [Note] WSREP: (e8c757e2, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
2017-09-01 10:40:25 140550470408320 [Note] WSREP: EVS version 0
2017-09-01 10:40:25 140550470408320 [Note] WSREP: gcomm: connecting to group 'Galera', peer '192.168.1.249:'
2017-09-01 10:40:25 140550470408320 [Note] WSREP: (e8c757e2, 'tcp://0.0.0.0:4567') connection established to 7ada09f8 tcp://192.168.1.249:4567
2017-09-01 10:40:25 140550470408320 [Note] WSREP: (e8c757e2, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: 
2017-09-01 10:40:26 140550470408320 [Note] WSREP: declaring 7ada09f8 at tcp://192.168.1.249:4567 stable
2017-09-01 10:40:26 140550470408320 [Note] WSREP: Node 7ada09f8 state prim
2017-09-01 10:40:26 140550470408320 [Note] WSREP: view(view_id(PRIM,7ada09f8,2) memb {
        7ada09f8,0
        e8c757e2,0
} joined {
} left {
} partitioned {
})
2017-09-01 10:40:26 140550470408320 [Note] WSREP: save pc into disk
2017-09-01 10:40:26 140550470408320 [Note] WSREP: gcomm: connected
2017-09-01 10:40:26 140550470408320 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636
2017-09-01 10:40:26 140550470408320 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
2017-09-01 10:40:26 140550470408320 [Note] WSREP: Opened channel 'Galera'
2017-09-01 10:40:26 140541463357184 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 1, memb_num = 2
2017-09-01 10:40:26 140541463357184 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
2017-09-01 10:40:26 140541463357184 [Note] WSREP: STATE EXCHANGE: sent state msg: e9143e60-8ebe-11e7-995f-0722e5338c01
2017-09-01 10:40:26 140541463357184 [Note] WSREP: STATE EXCHANGE: got state msg: e9143e60-8ebe-11e7-995f-0722e5338c01 from 0 (MBA2)
2017-09-01 10:40:26 140541463357184 [Note] WSREP: STATE EXCHANGE: got state msg: e9143e60-8ebe-11e7-995f-0722e5338c01 from 1 (MBA1)
2017-09-01 10:40:26 140541463357184 [Note] WSREP: Quorum results:
        version    = 4,
        component  = PRIMARY,
        conf_id    = 1,
        members    = 1/2 (joined/total),
        act_id     = 0,
        last_appl. = -1,
        protocols  = 0/7/3 (gcs/repl/appl),
        group UUID = 9ba441c2-8e29-11e7-b7e0-2f564ca18fa8
2017-09-01 10:40:26 140541463357184 [Note] WSREP: Flow-control interval: [23, 23]
2017-09-01 10:40:26 140541463357184 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 0)
2017-09-01 10:40:26 140550470408320 [Note] Reading of all Master_info entries succeded
2017-09-01 10:40:26 140550470408320 [Note] Added new Master_info '' to hash table
2017-09-01 10:40:26 140550470408320 [Note] /usr/sbin/mysqld: ready for connections.
Version: '10.2.8-MariaDB-log'  socket: '/data/mysql/vars/mysql.sock'  port: 3306  MariaDB Server
2017-09-01 10:40:26 140550256776960 [Note] WSREP: State transfer required: 
        Group state: 9ba441c2-8e29-11e7-b7e0-2f564ca18fa8:0
        Local state: 00000000-0000-0000-0000-000000000000:-1
2017-09-01 10:40:26 140550256776960 [Note] WSREP: New cluster view: global state: 9ba441c2-8e29-11e7-b7e0-2f564ca18fa8:0, view# 2: Primary, number of nodes: 2, my index: 1, protocol version 3
2017-09-01 10:40:26 140550256776960 [Warning] WSREP: Gap in state sequence. Need state transfer.
2017-09-01 10:40:28 140550256776960 [Note] WSREP: Prepared SST request: mysqldump|192.168.1.250:3306
2017-09-01 10:40:28 140550256776960 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2017-09-01 10:40:28 140550256776960 [Note] WSREP: REPL Protocols: 7 (3, 2)
2017-09-01 10:40:28 140550256776960 [Note] WSREP: Assign initial position for certification: 0, protocol version: 3
2017-09-01 10:40:28 140544374187776 [Note] WSREP: Service thread queue flushed.
2017-09-01 10:40:28 140550256776960 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (9ba441c2-8e29-11e7-b7e0-2f564ca18fa8): 1 (Operation not permitted)
         at galera/src/replicator_str.cpp:prepare_for_IST():482. IST will be unavailable.
2017-09-01 10:40:28 140541463357184 [Warning] WSREP: Member 1.0 (MBA1) requested state transfer from '192.168.1.249', but it is impossible to select State Transfer donor: No route to host
2017-09-01 10:40:28 140550256776960 [ERROR] WSREP: Requesting state transfer failed: -113(No route to host)
2017-09-01 10:40:28 140550256776960 [ERROR] WSREP: State transfer request failed unrecoverably: 113 (No route to host). Most likely it is due to inability to communicate with the cluster primary component. Restart required.
2017-09-01 10:40:28 140550256776960 [Note] WSREP: Closing send monitor...
2017-09-01 10:40:28 140550256776960 [Note] WSREP: Closed send monitor.
2017-09-01 10:40:28 140550256776960 [Note] WSREP: gcomm: terminating thread
2017-09-01 10:40:28 140550256776960 [Note] WSREP: gcomm: joining thread
2017-09-01 10:40:28 140550256776960 [Note] WSREP: gcomm: closing backend
2017-09-01 10:40:29 140550256776960 [Note] WSREP: (e8c757e2, 'tcp://0.0.0.0:4567') turning message relay requesting off
2017-09-01 10:40:32 140550256776960 [Note] WSREP: (e8c757e2, 'tcp://0.0.0.0:4567') connection to peer 7ada09f8 with addr tcp://192.168.1.249:4567 timed out, no messages seen in PT3S
2017-09-01 10:40:32 140550256776960 [Note] WSREP: (e8c757e2, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://192.168.1.249:4567 
2017-09-01 10:40:33 140550256776960 [Note] WSREP: (e8c757e2, 'tcp://0.0.0.0:4567') reconnecting to 7ada09f8 (tcp://192.168.1.249:4567), attempt 0
2017-09-01 10:40:34 140550256776960 [Note] WSREP: evs::proto(e8c757e2, LEAVING, view_id(REG,7ada09f8,2)) suspecting node: 7ada09f8
2017-09-01 10:40:34 140550256776960 [Note] WSREP: evs::proto(e8c757e2, LEAVING, view_id(REG,7ada09f8,2)) suspected node without join message, declaring inactive
2017-09-01 10:40:34 140550256776960 [Note] WSREP: view(view_id(NON_PRIM,7ada09f8,2) memb {
        e8c757e2,0
} joined {
} left {
} partitioned {
        7ada09f8,0
})
2017-09-01 10:40:34 140550256776960 [Note] WSREP: view((empty))
2017-09-01 10:40:34 140550256776960 [Note] WSREP: gcomm: closed
2017-09-01 10:40:34 140541463357184 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
2017-09-01 10:40:34 140541463357184 [Note] WSREP: Flow-control interval: [16, 16]
2017-09-01 10:40:34 140541463357184 [Note] WSREP: Received NON-PRIMARY.
2017-09-01 10:40:34 140541463357184 [Note] WSREP: Shifting PRIMARY -> OPEN (TO: 0)
2017-09-01 10:40:34 140541463357184 [Note] WSREP: Received self-leave message.
2017-09-01 10:40:34 140541463357184 [Note] WSREP: Flow-control interval: [0, 0]
2017-09-01 10:40:34 140541463357184 [Note] WSREP: Received SELF-LEAVE. Closing connection.
2017-09-01 10:40:34 140541463357184 [Note] WSREP: Shifting OPEN -> CLOSED (TO: 0)
2017-09-01 10:40:34 140541463357184 [Note] WSREP: RECV thread exiting 0: Success
2017-09-01 10:40:34 140550256776960 [Note] WSREP: recv_thread() joined.
2017-09-01 10:40:34 140550256776960 [Note] WSREP: Closing replication queue.
2017-09-01 10:40:34 140550256776960 [Note] WSREP: Closing slave action queue.
2017-09-01 10:40:34 140550256776960 [Note] WSREP: /usr/sbin/mysqld: Terminated.

on donor side:

2017-09-01 10:40:25 140348458268416 [Note] WSREP: (7ada09f8, 'tcp://0.0.0.0:4567') connection established to e8c757e2 tcp://192.168.1.250:4567
2017-09-01 10:40:25 140348458268416 [Note] WSREP: (7ada09f8, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: 
2017-09-01 10:40:26 140348458268416 [Note] WSREP: declaring e8c757e2 at tcp://192.168.1.250:4567 stable
2017-09-01 10:40:26 140348458268416 [Note] WSREP: Node 7ada09f8 state prim
2017-09-01 10:40:26 140348458268416 [Note] WSREP: view(view_id(PRIM,7ada09f8,2) memb {
        7ada09f8,0
        e8c757e2,0
} joined {
} left {
} partitioned {
})
2017-09-01 10:40:26 140348458268416 [Note] WSREP: save pc into disk
2017-09-01 10:40:26 140348449875712 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 2
2017-09-01 10:40:26 140348449875712 [Note] WSREP: STATE_EXCHANGE: sent state UUID: e9143e60-8ebe-11e7-995f-0722e5338c01
2017-09-01 10:40:26 140348449875712 [Note] WSREP: STATE EXCHANGE: sent state msg: e9143e60-8ebe-11e7-995f-0722e5338c01
2017-09-01 10:40:26 140348449875712 [Note] WSREP: STATE EXCHANGE: got state msg: e9143e60-8ebe-11e7-995f-0722e5338c01 from 0 (MBA2)
2017-09-01 10:40:26 140348449875712 [Note] WSREP: STATE EXCHANGE: got state msg: e9143e60-8ebe-11e7-995f-0722e5338c01 from 1 (MBA1)
2017-09-01 10:40:26 140348449875712 [Note] WSREP: Quorum results:
        version    = 4,
        component  = PRIMARY,
        conf_id    = 1,
        members    = 1/2 (joined/total),
        act_id     = 0,
        last_appl. = 0,
        protocols  = 0/7/3 (gcs/repl/appl),
        group UUID = 9ba441c2-8e29-11e7-b7e0-2f564ca18fa8
2017-09-01 10:40:26 140348449875712 [Note] WSREP: Flow-control interval: [23, 23]
2017-09-01 10:40:26 140354767853312 [Note] WSREP: New cluster view: global state: 9ba441c2-8e29-11e7-b7e0-2f564ca18fa8:0, view# 2: Primary, number of nodes: 2, my index: 0, protocol version 3
2017-09-01 10:40:26 140354767853312 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2017-09-01 10:40:26 140354767853312 [Note] WSREP: REPL Protocols: 7 (3, 2)
2017-09-01 10:40:26 140354767853312 [Note] WSREP: Assign initial position for certification: 0, protocol version: 3
2017-09-01 10:40:26 140349079000832 [Note] WSREP: Service thread queue flushed.
2017-09-01 10:40:28 140348449875712 [Warning] WSREP: Member 1.0 (MBA1) requested state transfer from '192.168.1.249', but it is impossible to select State Transfer donor: No route to host
2017-09-01 10:40:28 140348458268416 [Note] WSREP: (7ada09f8, 'tcp://0.0.0.0:4567') turning message relay requesting off
2017-09-01 10:40:29 140348458268416 [Note] WSREP: forgetting e8c757e2 (tcp://192.168.1.250:4567)
2017-09-01 10:40:29 140348458268416 [Note] WSREP: Node 7ada09f8 state prim
2017-09-01 10:40:29 140348458268416 [Note] WSREP: view(view_id(PRIM,7ada09f8,3) memb {
        7ada09f8,0
} joined {
} left {
} partitioned {
        e8c757e2,0
})
2017-09-01 10:40:29 140348458268416 [Note] WSREP: save pc into disk
2017-09-01 10:40:29 140348458268416 [Note] WSREP: forgetting e8c757e2 (tcp://192.168.1.250:4567)
2017-09-01 10:40:29 140348449875712 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 1
2017-09-01 10:40:29 140348449875712 [Note] WSREP: STATE_EXCHANGE: sent state UUID: eb2a6bcc-8ebe-11e7-8c0c-6707477a3222
2017-09-01 10:40:29 140348449875712 [Note] WSREP: STATE EXCHANGE: sent state msg: eb2a6bcc-8ebe-11e7-8c0c-6707477a3222
2017-09-01 10:40:29 140348449875712 [Note] WSREP: STATE EXCHANGE: got state msg: eb2a6bcc-8ebe-11e7-8c0c-6707477a3222 from 0 (MBA2)
2017-09-01 10:40:29 140348449875712 [Note] WSREP: Quorum results:
        version    = 4,
        component  = PRIMARY,
        conf_id    = 2,
        members    = 1/1 (joined/total),
        act_id     = 0,
        last_appl. = 0,
        protocols  = 0/7/3 (gcs/repl/appl),
        group UUID = 9ba441c2-8e29-11e7-b7e0-2f564ca18fa8
2017-09-01 10:40:29 140348449875712 [Note] WSREP: Flow-control interval: [16, 16]
2017-09-01 10:40:29 140354767853312 [Note] WSREP: New cluster view: global state: 9ba441c2-8e29-11e7-b7e0-2f564ca18fa8:0, view# 3: Primary, number of nodes: 1, my index: 0, protocol version 3
2017-09-01 10:40:29 140354767853312 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2017-09-01 10:40:29 140354767853312 [Note] WSREP: REPL Protocols: 7 (3, 2)
2017-09-01 10:40:29 140354767853312 [Note] WSREP: Assign initial position for certification: 0, protocol version: 3
2017-09-01 10:40:29 140349079000832 [Note] WSREP: Service thread queue flushed.
2017-09-01 10:40:33 140348458268416 [Note] WSREP: (7ada09f8, 'tcp://0.0.0.0:4567') connection established to e8c757e2 tcp://192.168.1.250:4567
2017-09-01 10:40:33 140348458268416 [Warning] WSREP: discarding established (time wait) e8c757e2 (tcp://192.168.1.250:4567) 
2017-09-01 10:40:34 140348458268416 [Note] WSREP:  cleaning up e8c757e2 (tcp://192.168.1.250:4567)
2017-09-01 10:41:08 140348458268416 [Note] WSREP: (7ada09f8, 'tcp://0.0.0.0:4567') connection established to 026f82d0 tcp://192.168.1.250:4567
2017-09-01 10:41:08 140348458268416 [Note] WSREP: (7ada09f8, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: 
2017-09-01 10:41:09 140348458268416 [Note] WSREP: declaring 026f82d0 at tcp://192.168.1.250:4567 stable
2017-09-01 10:41:09 140348458268416 [Note] WSREP: Node 7ada09f8 state prim
2017-09-01 10:41:09 140348458268416 [Note] WSREP: view(view_id(PRIM,026f82d0,4) memb {
        026f82d0,0
        7ada09f8,0
} joined {
} left {
} partitioned {
})
2017-09-01 10:41:09 140348458268416 [Note] WSREP: save pc into disk
2017-09-01 10:41:09 140348449875712 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 1, memb_num = 2
2017-09-01 10:41:09 140348449875712 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
2017-09-01 10:41:09 140348449875712 [Note] WSREP: STATE EXCHANGE: sent state msg: 030859f8-8ebf-11e7-bd99-329c325d6078
2017-09-01 10:41:09 140348449875712 [Note] WSREP: STATE EXCHANGE: got state msg: 030859f8-8ebf-11e7-bd99-329c325d6078 from 0 (MBA1)
2017-09-01 10:41:09 140348449875712 [Note] WSREP: STATE EXCHANGE: got state msg: 030859f8-8ebf-11e7-bd99-329c325d6078 from 1 (MBA2)
2017-09-01 10:41:09 140348449875712 [Note] WSREP: Quorum results:
        version    = 4,
        component  = PRIMARY,
        conf_id    = 3,
        members    = 1/2 (joined/total),
        act_id     = 0,
        last_appl. = 0,
        protocols  = 0/7/3 (gcs/repl/appl),
        group UUID = 9ba441c2-8e29-11e7-b7e0-2f564ca18fa8
2017-09-01 10:41:09 140348449875712 [Note] WSREP: Flow-control interval: [23, 23]
2017-09-01 10:41:09 140354767853312 [Note] WSREP: New cluster view: global state: 9ba441c2-8e29-11e7-b7e0-2f564ca18fa8:0, view# 4: Primary, number of nodes: 2, my index: 1, protocol version 3
2017-09-01 10:41:09 140354767853312 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2017-09-01 10:41:09 140354767853312 [Note] WSREP: REPL Protocols: 7 (3, 2)
2017-09-01 10:41:09 140354767853312 [Note] WSREP: Assign initial position for certification: 0, protocol version: 3
2017-09-01 10:41:09 140349079000832 [Note] WSREP: Service thread queue flushed.
2017-09-01 10:41:11 140348449875712 [Warning] WSREP: Member 0.0 (MBA1) requested state transfer from '192.168.1.249', but it is impossible to select State Transfer donor: No route to host
2017-09-01 10:41:11 140348458268416 [Note] WSREP: (7ada09f8, 'tcp://0.0.0.0:4567') turning message relay requesting off
2017-09-01 10:41:12 140348458268416 [Note] WSREP: forgetting 026f82d0 (tcp://192.168.1.250:4567)
2017-09-01 10:41:12 140348458268416 [Note] WSREP: Node 7ada09f8 state prim
2017-09-01 10:41:12 140348458268416 [Note] WSREP: view(view_id(PRIM,7ada09f8,5) memb {
        7ada09f8,0
} joined {
} left {
} partitioned {
        026f82d0,0
})
2017-09-01 10:41:12 140348458268416 [Note] WSREP: save pc into disk
2017-09-01 10:41:12 140348458268416 [Note] WSREP: forgetting 026f82d0 (tcp://192.168.1.250:4567)
2017-09-01 10:41:12 140348449875712 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 1
2017-09-01 10:41:12 140348449875712 [Note] WSREP: STATE_EXCHANGE: sent state UUID: 04d2bdbe-8ebf-11e7-b042-56b5bf6029fc
2017-09-01 10:41:12 140348449875712 [Note] WSREP: STATE EXCHANGE: sent state msg: 04d2bdbe-8ebf-11e7-b042-56b5bf6029fc
2017-09-01 10:41:12 140348449875712 [Note] WSREP: STATE EXCHANGE: got state msg: 04d2bdbe-8ebf-11e7-b042-56b5bf6029fc from 0 (MBA2)
2017-09-01 10:41:12 140348449875712 [Note] WSREP: Quorum results:
        version    = 4,
        component  = PRIMARY,
        conf_id    = 4,
        members    = 1/1 (joined/total),
        act_id     = 0,
        last_appl. = 0,
        protocols  = 0/7/3 (gcs/repl/appl),
        group UUID = 9ba441c2-8e29-11e7-b7e0-2f564ca18fa8
2017-09-01 10:41:12 140348449875712 [Note] WSREP: Flow-control interval: [16, 16]
2017-09-01 10:41:12 140354767853312 [Note] WSREP: New cluster view: global state: 9ba441c2-8e29-11e7-b7e0-2f564ca18fa8:0, view# 5: Primary, number of nodes: 1, my index: 0, protocol version 3
2017-09-01 10:41:12 140354767853312 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2017-09-01 10:41:12 140354767853312 [Note] WSREP: REPL Protocols: 7 (3, 2)
2017-09-01 10:41:12 140354767853312 [Note] WSREP: Assign initial position for certification: 0, protocol version: 3
2017-09-01 10:41:12 140349079000832 [Note] WSREP: Service thread queue flushed.
2017-09-01 10:41:16 140348458268416 [Note] WSREP: (7ada09f8, 'tcp://0.0.0.0:4567') connection established to 026f82d0 tcp://192.168.1.250:4567
2017-09-01 10:41:16 140348458268416 [Warning] WSREP: discarding established (time wait) 026f82d0 (tcp://192.168.1.250:4567) 
2017-09-01 10:41:17 140348458268416 [Note] WSREP:  cleaning up 026f82d0 (tcp://192.168.1.250:4567)
2017-09-01 10:41:52 140348458268416 [Note] WSREP: (7ada09f8, 'tcp://0.0.0.0:4567') connection established to 1c2320d5 tcp://192.168.1.250:4567
2017-09-01 10:41:52 140348458268416 [Note] WSREP: (7ada09f8, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: 
2017-09-01 10:41:52 140348458268416 [Note] WSREP: declaring 1c2320d5 at tcp://192.168.1.250:4567 stable
2017-09-01 10:41:52 140348458268416 [Note] WSREP: Node 7ada09f8 state prim
2017-09-01 10:41:52 140348458268416 [Note] WSREP: view(view_id(PRIM,1c2320d5,6) memb {
        1c2320d5,0
        7ada09f8,0
} joined {
} left {
} partitioned {
})
2017-09-01 10:41:52 140348458268416 [Note] WSREP: save pc into disk
2017-09-01 10:41:52 140348449875712 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 1, memb_num = 2
2017-09-01 10:41:52 140348449875712 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
2017-09-01 10:41:53 140348449875712 [Note] WSREP: STATE EXCHANGE: sent state msg: 1cbbf696-8ebf-11e7-88fc-1f95963fa3e2
2017-09-01 10:41:53 140348449875712 [Note] WSREP: STATE EXCHANGE: got state msg: 1cbbf696-8ebf-11e7-88fc-1f95963fa3e2 from 0 (MBA1)
2017-09-01 10:41:53 140348449875712 [Note] WSREP: STATE EXCHANGE: got state msg: 1cbbf696-8ebf-11e7-88fc-1f95963fa3e2 from 1 (MBA2)
2017-09-01 10:41:53 140348449875712 [Note] WSREP: Quorum results:
        version    = 4,
        component  = PRIMARY,
        conf_id    = 5,
        members    = 1/2 (joined/total),
        act_id     = 0,
        last_appl. = 0,
        protocols  = 0/7/3 (gcs/repl/appl),
        group UUID = 9ba441c2-8e29-11e7-b7e0-2f564ca18fa8
2017-09-01 10:41:53 140348449875712 [Note] WSREP: Flow-control interval: [23, 23]
2017-09-01 10:41:53 140354767853312 [Note] WSREP: New cluster view: global state: 9ba441c2-8e29-11e7-b7e0-2f564ca18fa8:0, view# 6: Primary, number of nodes: 2, my index: 1, protocol version 3
2017-09-01 10:41:53 140354767853312 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2017-09-01 10:41:53 140354767853312 [Note] WSREP: REPL Protocols: 7 (3, 2)
2017-09-01 10:41:53 140354767853312 [Note] WSREP: Assign initial position for certification: 0, protocol version: 3
2017-09-01 10:41:53 140349079000832 [Note] WSREP: Service thread queue flushed.
2017-09-01 10:41:55 140348449875712 [Warning] WSREP: Member 0.0 (MBA1) requested state transfer from '192.168.1.249', but it is impossible to select State Transfer donor: No route to host
2017-09-01 10:41:55 140348458268416 [Note] WSREP: (7ada09f8, 'tcp://0.0.0.0:4567') turning message relay requesting off
2017-09-01 10:41:56 140348458268416 [Note] WSREP: forgetting 1c2320d5 (tcp://192.168.1.250:4567)
2017-09-01 10:41:56 140348458268416 [Note] WSREP: Node 7ada09f8 state prim
2017-09-01 10:41:56 140348458268416 [Note] WSREP: view(view_id(PRIM,7ada09f8,7) memb {
        7ada09f8,0
} joined {
} left {
} partitioned {
        1c2320d5,0
})
2017-09-01 10:41:56 140348458268416 [Note] WSREP: save pc into disk
2017-09-01 10:41:56 140348458268416 [Note] WSREP: forgetting 1c2320d5 (tcp://192.168.1.250:4567)
2017-09-01 10:41:56 140348449875712 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 1
2017-09-01 10:41:56 140348449875712 [Note] WSREP: STATE_EXCHANGE: sent state UUID: 1e865fa2-8ebf-11e7-adea-6e3f17bcd7da
2017-09-01 10:41:56 140348449875712 [Note] WSREP: STATE EXCHANGE: sent state msg: 1e865fa2-8ebf-11e7-adea-6e3f17bcd7da
2017-09-01 10:41:56 140348449875712 [Note] WSREP: STATE EXCHANGE: got state msg: 1e865fa2-8ebf-11e7-adea-6e3f17bcd7da from 0 (MBA2)
2017-09-01 10:41:56 140348449875712 [Note] WSREP: Quorum results:
        version    = 4,
        component  = PRIMARY,
        conf_id    = 6,
        members    = 1/1 (joined/total),
        act_id     = 0,
        last_appl. = 0,
        protocols  = 0/7/3 (gcs/repl/appl),
        group UUID = 9ba441c2-8e29-11e7-b7e0-2f564ca18fa8
2017-09-01 10:41:56 140348449875712 [Note] WSREP: Flow-control interval: [16, 16]
2017-09-01 10:41:56 140354767853312 [Note] WSREP: New cluster view: global state: 9ba441c2-8e29-11e7-b7e0-2f564ca18fa8:0, view# 7: Primary, number of nodes: 1, my index: 0, protocol version 3
2017-09-01 10:41:56 140354767853312 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2017-09-01 10:41:56 140354767853312 [Note] WSREP: REPL Protocols: 7 (3, 2)
2017-09-01 10:41:56 140354767853312 [Note] WSREP: Assign initial position for certification: 0, protocol version: 3
2017-09-01 10:41:56 140349079000832 [Note] WSREP: Service thread queue flushed.
2017-09-01 10:42:00 140348458268416 [Note] WSREP: (7ada09f8, 'tcp://0.0.0.0:4567') connection established to 1c2320d5 tcp://192.168.1.250:4567
2017-09-01 10:42:00 140348458268416 [Warning] WSREP: discarding established (time wait) 1c2320d5 (tcp://192.168.1.250:4567) 
2017-09-01 10:42:01 140348458268416 [Note] WSREP:  cleaning up 1c2320d5 (tcp://192.168.1.250:4567)

I also built two more new servers using 10.1.26, and everything works fine there.

Comment by Andrii Nikitin (Inactive) [ 2017-09-06 ]

I confirm that I see the same behavior with a similar configuration:

Sep 06 10:18:53 c7a mysqld[2200]: 2017-09-06 10:18:53 140405433689856 [Warning] WSREP: Member 0.0 (MBA0) requested state transfer from '192.168.1.251', but it is impossible to select State Transfer donor: No route to host
Sep 06 10:18:53 c7a mysqld[2200]: 2017-09-06 10:18:53 140407652124416 [ERROR] WSREP: Requesting state transfer failed: -113(No route to host)
Sep 06 10:18:53 c7a mysqld[2200]: 2017-09-06 10:18:53 140407652124416 [ERROR] WSREP: State transfer request failed unrecoverably: 113 (No route to host). Most likely it is due to inability to communicate with the cluster primary compon

But then I commented out the 'wsrep_sst_donor' line in the cnf file and the node was able to connect.
I then uncommented that line and the node is still able to re-join, unless I remove galera.cache and grastate.dat. If I remove those files and attempt to start with 'wsrep_sst_donor' uncommented, I see the errors again:

Sep 06 10:42:51 c7a mysqld[21434]: 2017-09-06 10:42:51 139753489438464 [Warning] WSREP: Member 0.0 (MBA0) requested state transfer from '192.168.1.251', but it is impossible to select State Transfer donor: No route to host
Sep 06 10:42:51 c7a mysqld[21434]: 2017-09-06 10:42:51 139755753666304 [ERROR] WSREP: Requesting state transfer failed: -113(No route to host)
Sep 06 10:42:51 c7a mysqld[21434]: 2017-09-06 10:42:51 139755753666304 [ERROR] WSREP: State transfer request failed unrecoverably: 113 (No route to host). Most likely it is due to inability to communicate with the cluster primary compo

mxu, could you confirm that you are able to set up the cluster after removing the wsrep_sst_donor parameter from the cnf file?

Comment by Andrii Nikitin (Inactive) [ 2017-09-06 ]

mxu, as per the documentation, wsrep_sst_donor should contain a node name, not a node IP address as in the provided configuration file.
And indeed, when I change wsrep_sst_donor accordingly (in my case "wsrep_sst_donor = MBA1"), the node is able to join the cluster without any problem.
So I am closing this as 'not a bug'.
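
For reference, a minimal sketch of what the corrected settings might look like for the reporter's two nodes. The node names MBA1 and MBA2 are taken from the state exchange messages in the log above; the use of the [mysqld] section and the assumption that each node's wsrep_node_name matches those names are illustrative, since the reporter's cnf file is not reproduced here.

```ini
# Donor node (192.168.1.249): assumed to be named MBA2, as the log suggests
[mysqld]
wsrep_node_name = MBA2

# Joiner node (192.168.1.250): refer to the donor by its node name,
# not by its IP address
[mysqld]
wsrep_node_name = MBA1
wsrep_sst_donor = MBA2
```

The two fragments belong in the cnf files of the respective nodes. If wsrep_node_name is not set explicitly it defaults to the server's hostname, so the donor value has to match whatever name the donor actually reports to the cluster.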

Generated at Thu Feb 08 08:07:32 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.