[MDEV-9554] Rsync SST donor stuck in DONOR/DESYNCED state when joiner crashes Created: 2016-02-13  Updated: 2021-08-06  Resolved: 2019-09-19

Status: Closed
Project: MariaDB Server
Component/s: Galera, Galera SST, Scripts & Clients, wsrep
Affects Version/s: 10.1.11
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Geoff Montee (Inactive) Assignee: Jan Lindström (Inactive)
Resolution: Fixed Votes: 2
Labels: galera

Attachments: File patch.diff    
Issue Links:
Relates
relates to MDEV-15442 xtrabackup-v2/mariabackup SST donor s... Closed
relates to MDEV-10021 Joiner node crashes badly after succe... Closed

 Description   

A Galera Cluster node crashed while receiving an rsync SST. After that, the donor node became stuck in the DONOR/DESYNCED status, even when the joiner returned to the cluster.

First, node2 joined the cluster at 17:32 and requested an SST from node1.

2016-02-11 17:32:10 140284524488448 [Note] WSREP: Quorum results:
	version    = 3,
	component  = PRIMARY,
	conf_id    = 140,
	members    = 4/5 (joined/total),
	act_id     = 107996827,
	last_appl. = 107996697,
	protocols  = 0/7/3 (gcs/repl/appl),
	group UUID = b36c4dd2-7dbc-11e5-857f-c70794853074
2016-02-11 17:32:10 140284524488448 [Note] WSREP: Flow-control interval: [36, 36]
2016-02-11 17:32:10 140391984752384 [Note] WSREP: New cluster view: global state: b36c4dd2-7dbc-11e5-857f-c70794853074:107996827, view# 141: Primary, number of nodes: 5, my index: 3, protocol version 3
2016-02-11 17:32:10 140284524488448 [Note] WSREP: Member 1.0 (node2) requested state transfer from 'node1'. Selected 3.0 (node1)(SYNCED) as donor.

Then node1 agreed to be donor for the SST:

2016-02-11 17:32:10 140284524488448 [Note] WSREP: Shifting SYNCED -> DONOR/DESYNCED (TO: 107996829)
2016-02-11 17:32:10 140391984752384 [Note] WSREP: REPL Protocols: 7 (3, 2)
2016-02-11 17:32:10 140284591601408 [Note] WSREP: Service thread queue flushed.
2016-02-11 17:32:10 140391984752384 [Note] WSREP: Assign initial position for certification: 107996827, protocol version: 3
2016-02-11 17:32:10 140284591601408 [Note] WSREP: Service thread queue flushed.
2016-02-11 17:32:11 139530053068544 [Note] WSREP: Running: 'wsrep_sst_rsync --role 'donor' --address '10.191.33.152:4444/rsync_sst' --socket '/tmp/mysql.sock' --datadir '/data/mariadb/'     '' --gtid 'b36c4dd2-7dbc-11e5-857f-c70794853074:107996829' --gtid-domain-id '0''
2016-02-11 17:32:11 140391983253248 [Note] WSREP: sst_donor_thread signaled with 0
2016-02-11 17:32:11 139530053068544 [Note] WSREP: Flushing tables for SST...
2016-02-11 17:32:11 139530053068544 [Note] WSREP: Provider paused at b36c4dd2-7dbc-11e5-857f-c70794853074:107996846 (137640)
2016-02-11 17:32:11 139530053068544 [Note] WSREP: Tables flushed.

But before the SST could complete, node2 died:

2016-02-11 17:32:13 140284537075456 [Note] WSREP: (868cb19d, 'tcp://0.0.0.0:4567') turning message relay requesting off
2016-02-11 17:40:03 140284537075456 [Note] WSREP: (868cb19d, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://10.191.33.152:4567 
2016-02-11 17:40:03 140284537075456 [Note] WSREP: declaring 00ec119f at tcp://10.191.33.150:4567 stable
2016-02-11 17:40:03 140284537075456 [Note] WSREP: declaring 56787688 at tcp://10.191.47.207:4567 stable
2016-02-11 17:40:03 140284537075456 [Note] WSREP: declaring a6816777 at tcp://10.191.44.142:4567 stable
2016-02-11 17:40:03 140284537075456 [Note] WSREP: forgetting 49fb877e (tcp://10.191.33.152:4567)
2016-02-11 17:40:03 140284537075456 [Note] WSREP: (868cb19d, 'tcp://0.0.0.0:4567') turning message relay requesting off
2016-02-11 17:40:04 140284537075456 [Note] WSREP: Node 00ec119f state prim
2016-02-11 17:40:04 140284537075456 [Note] WSREP: view(view_id(PRIM,00ec119f,1688) memb {
	00ec119f,0
	56787688,0
	868cb19d,0
	a6816777,0
} joined {
} left {
} partitioned {
	49fb877e,0
})
2016-02-11 17:40:04 140284537075456 [Note] WSREP: save pc into disk
2016-02-11 17:40:04 140284537075456 [Note] WSREP: forgetting 49fb877e (tcp://10.191.33.152:4567)

But then node2 came back a little while later:

2016-02-11 18:18:57 140284524488448 [Note] WSREP: Quorum results:
	version    = 3,
	component  = PRIMARY,
	conf_id    = 142,
	members    = 4/5 (joined/total),
	act_id     = 108088703,
	last_appl. = 107996828,
	protocols  = 0/7/3 (gcs/repl/appl),
	group UUID = b36c4dd2-7dbc-11e5-857f-c70794853074
2016-02-11 18:18:57 140284524488448 [Note] WSREP: Flow-control interval: [36, 36]
2016-02-11 18:18:57 140284524488448 [Note] WSREP: Member 4.0 (node2) requested state transfer from 'node1'. Selected 0.0 (node3)(SYNCED) as donor.

As you can see from the above, node2 asked node1 to be donor again. However, node1 already thought it was a donor, so node2 instead chose node3 to be donor.

Very soon after, the SST was complete:

2016-02-11 18:18:57 140284524488448 [Note] WSREP: 0.0 (node3): State transfer to 4.0 (node2) complete.
2016-02-11 18:18:57 140284524488448 [Note] WSREP: Member 0.0 (node3) synced with group.
2016-02-11 18:19:00 140284537075456 [Note] WSREP: (868cb19d, 'tcp://0.0.0.0:4567') turning message relay requesting off
2016-02-11 18:19:51 140284524488448 [Note] WSREP: 4.0 (node2): State transfer from 0.0 (node3) complete.
2016-02-11 18:19:51 140284524488448 [Note] WSREP: Member 4.0 (node2) synced with group.

On node1, there is also a dead wsrep_sst_rsync process that is still running:

34399 ? S 5:42 /bin/bash -ue /opt/mariadb-10.1.11//bin/wsrep_sst_rsync --role donor --address 10.191.33.152:4444/rsync_sst --socket /tmp/mysql.sock --datadir /data/mariadb/ --gtid b36c4dd2-7dbc-11e5-857f-c70794853074:107996829 --gtid-domain-id 0

The joiner actually shows the SST completing at 17:39:

2016-02-11 17:32:10 140223497369344 [Note] WSREP: Running: 'wsrep_sst_rsync --role 'joiner' --address '10.191.33.152' --datadir '/data/mariadb/'   --parent '60031'  '' '
2016-02-11 17:32:10 140332152313600 [Note] WSREP: Prepared SST request: rsync|10.191.33.152:4444/rsync_sst
2016-02-11 17:32:10 140332152313600 [Note] WSREP: REPL Protocols: 7 (3, 2)
2016-02-11 17:32:10 140223589644032 [Note] WSREP: Service thread queue flushed.
2016-02-11 17:32:10 140332152313600 [Note] WSREP: Assign initial position for certification: 107996827, protocol version: 3
2016-02-11 17:32:10 140223589644032 [Note] WSREP: Service thread queue flushed.
2016-02-11 17:32:10 140332152313600 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (b36c4dd2-7dbc-11e5-857f-c70794853074): 1 (Operation not permitted)
	 at galera/src/replicator_str.cpp:prepare_for_IST():482. IST will be unavailable.
2016-02-11 17:32:10 140223526725376 [Note] WSREP: Member 1.0 (node2) requested state transfer from 'node1'. Selected 3.0 (node1)(SYNCED) as donor.
2016-02-11 17:32:10 140223526725376 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 107996829)
2016-02-11 17:32:10 140332152313600 [Note] WSREP: Requesting state transfer: success, donor: 3
2016-02-11 17:32:13 140223535118080 [Note] WSREP: (49fb877e, 'tcp://0.0.0.0:4567') turning message relay requesting off
WSREP_SST: [INFO] Joiner cleanup. rsync PID: 60077 (20160211 17:39:20.293)
WSREP_SST: [INFO] Joiner cleanup done. (20160211 17:39:20.797)
2016-02-11 17:39:20 140332152572096 [Note] WSREP: SST complete, seqno: 107996846

Then the joiner restarted, but failed to join the cluster:

2016-02-11 17:39:20 140332152572096 [Note] Using unique option prefix 'myisam_recover' is error-prone and can break in the future. Please use the full name 'myisam-recover-options' instead.
2016-02-11 17:39:20 7fa1a01ac8c0 InnoDB: Warning: Using innodb_additional_mem_pool_size is DEPRECATED. This option may be removed in future releases, together with the option innodb_use_sys_malloc and with the InnoDB's internal memory allocator.
2016-02-11 17:39:20 140332152572096 [Note] InnoDB: Using mutexes to ref count buffer pool pages
2016-02-11 17:39:20 140332152572096 [Note] InnoDB: The InnoDB memory heap is disabled
2016-02-11 17:39:20 140332152572096 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2016-02-11 17:39:20 140332152572096 [Note] InnoDB: Memory barrier is not used
2016-02-11 17:39:20 140332152572096 [Note] InnoDB: Compressed tables use zlib 1.2.7
2016-02-11 17:39:20 140332152572096 [Note] InnoDB: Using SSE crc32 instructions
2016-02-11 17:39:20 140332152572096 [Note] InnoDB: Initializing buffer pool, size = 640.0G
2016-02-11 17:39:43 140332152572096 [Note] InnoDB: Completed initialization of buffer pool
2016-02-11 17:39:46 140332152572096 [Note] InnoDB: Highest supported file format is Barracuda.
2016-02-11 17:39:46 140332152572096 [Note] InnoDB: Log scan progressed past the checkpoint lsn 2415243635981
2016-02-11 17:39:46 140332152572096 [Note] InnoDB: Database was not shutdown normally!
2016-02-11 17:39:46 140332152572096 [Note] InnoDB: Starting crash recovery.
2016-02-11 17:39:46 140332152572096 [Note] InnoDB: Reading tablespace information from the .ibd files...
2016-02-11 17:39:46 140332152572096 [Note] InnoDB: Restoring possible half-written data pages 
2016-02-11 17:39:46 140332152572096 [Note] InnoDB: from the doublewrite buffer...
InnoDB: Doing recovery: scanned up to log sequence number 2415248878592
InnoDB: Doing recovery: scanned up to log sequence number 2415254121472
InnoDB: Doing recovery: scanned up to log sequence number 2415259364352
InnoDB: Doing recovery: scanned up to log sequence number 2415264607232
InnoDB: Doing recovery: scanned up to log sequence number 2415269850112
InnoDB: Doing recovery: scanned up to log sequence number 2415271879980
2016-02-11 17:39:49 140332152572096 [Note] InnoDB: Starting an apply batch of log records to the database...
InnoDB: Progress in percent: 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 
InnoDB: Apply batch completed
2016-02-11 17:40:03 140332152572096 [Note] InnoDB: 128 rollback segment(s) are active.
2016-02-11 17:40:03 140332152572096 [Note] InnoDB: Waiting for purge to start
2016-02-11 17:40:03 140332152572096 [Note] InnoDB:  Percona XtraDB (http://www.percona.com) 5.6.26-76.0 started; log sequence number 2415271879980
2016-02-11 17:40:03 139469109827328 [Note] InnoDB: Dumping buffer pool(s) not yet started
2016-02-11 17:40:03 140332152572096 [Note] Server socket created on IP: '::'.
2016-02-11 17:40:03 140332151813888 [Note] /opt/mariadb/bin/mysqld: Normal shutdown
 
2016-02-11 17:40:03 140332151813888 [Note] WSREP: Stop replication
2016-02-11 17:40:03 140332151813888 [Note] WSREP: Closing send monitor...
2016-02-11 17:40:03 140332151813888 [Note] WSREP: Closed send monitor.
2016-02-11 17:40:03 140332151813888 [Note] WSREP: gcomm: terminating thread
2016-02-11 17:40:03 140332151813888 [Note] WSREP: gcomm: joining thread
2016-02-11 17:40:03 140332151813888 [Note] WSREP: gcomm: closing backend
2016-02-11 17:40:03 140332152572096 [Note] WSREP: Signalling provider to continue.
2016-02-11 17:40:03 140332151564032 [Note] Event Scheduler: scheduler thread started with id 4
2016-02-11 17:40:03 140332152572096 [Note] WSREP: SST received: b36c4dd2-7dbc-11e5-857f-c70794853074:107996846
2016-02-11 17:40:03 140332152572096 [Note] /opt/mariadb/bin/mysqld: ready for connections.
Version: '10.1.11-MariaDB-log'  socket: '/tmp/mysql.sock'  port: 3306  Source distribution
2016-02-11 17:40:03 140332151813888 [Note] WSREP: view(view_id(NON_PRIM,00ec119f,1687) memb {
	49fb877e,0
} joined {
} left {
} partitioned {
	00ec119f,0
	56787688,0
	868cb19d,0
	a6816777,0
})
2016-02-11 17:40:03 140332151813888 [Note] WSREP: view((empty))
2016-02-11 17:40:03 140223526725376 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
2016-02-11 17:40:03 140332151813888 [Note] WSREP: gcomm: closed
2016-02-11 17:40:03 140332152313600 [Warning] WSREP: 0x7fa19b056040 down context(s) not set
2016-02-11 17:40:03 140332152313600 [Warning] WSREP: Sending JOIN failed: -107 (Transport endpoint is not connected). Will retry in new primary component.

Then the node threw a bunch of warnings and shut down:

2016-02-11 17:40:03 140223526725376 [Note] WSREP: Flow-control interval: [16, 16]
2016-02-11 17:40:03 140223526725376 [Note] WSREP: Received NON-PRIMARY.
2016-02-11 17:40:03 140223526725376 [Note] WSREP: Shifting JOINER -> OPEN (TO: 108007812)
2016-02-11 17:40:03 140223526725376 [Note] WSREP: Received self-leave message.
2016-02-11 17:40:03 140223526725376 [Note] WSREP: Flow-control interval: [0, 0]
2016-02-11 17:40:03 140223526725376 [Note] WSREP: Received SELF-LEAVE. Closing connection.
2016-02-11 17:40:03 140223526725376 [Note] WSREP: Shifting OPEN -> CLOSED (TO: 108007812)
2016-02-11 17:40:03 140223526725376 [Note] WSREP: RECV thread exiting 0: Success
2016-02-11 17:40:03 140332151813888 [Note] WSREP: recv_thread() joined.
2016-02-11 17:40:03 140332151813888 [Note] WSREP: Closing replication queue.
2016-02-11 17:40:03 140332151813888 [Note] WSREP: Closing slave action queue.
2016-02-11 17:40:03 140223589644032 [Warning] WSREP: Failed to report last committed 107996922, -77 (File descriptor in bad state)
2016-02-11 17:40:03 140223589644032 [Warning] WSREP: Failed to report last committed 107997050, -77 (File descriptor in bad state)
2016-02-11 17:40:04 140223589644032 [Warning] WSREP: Failed to report last committed 107997176, -77 (File descriptor in bad state)
2016-02-11 17:40:04 140223589644032 [Warning] WSREP: Failed to report last committed 107997306, -77 (File descriptor in bad state)
2016-02-11 17:40:04 140223589644032 [Warning] WSREP: Failed to report last committed 107997435, -77 (File descriptor in bad state)
2016-02-11 17:40:04 140223589644032 [Warning] WSREP: Failed to report last committed 107997560, -77 (File descriptor in bad state)
2016-02-11 17:40:04 140223589644032 [Warning] WSREP: Failed to report last committed 107997689, -77 (File descriptor in bad state)
2016-02-11 17:40:04 140223589644032 [Warning] WSREP: Failed to report last committed 107997817, -77 (File descriptor in bad state)
2016-02-11 17:40:04 140223589644032 [Warning] WSREP: Failed to report last committed 107997947, -77 (File descriptor in bad state)
2016-02-11 17:40:04 140223589644032 [Warning] WSREP: Failed to report last committed 107998075, -77 (File descriptor in bad state)
2016-02-11 17:40:04 140223589644032 [Warning] WSREP: Failed to report last committed 107998203, -77 (File descriptor in bad state)
2016-02-11 17:40:04 140223589644032 [Warning] WSREP: Failed to report last committed 107998331, -77 (File descriptor in bad state)
2016-02-11 17:40:04 140223589644032 [Warning] WSREP: Failed to report last committed 107998459, -77 (File descriptor in bad state)
2016-02-11 17:40:04 140223589644032 [Warning] WSREP: Failed to report last committed 107998585, -77 (File descriptor in bad state)
2016-02-11 17:40:04 140223589644032 [Warning] WSREP: Failed to report last committed 107998715, -77 (File descriptor in bad state)
2016-02-11 17:40:04 140223589644032 [Warning] WSREP: Failed to report last committed 107998842, -77 (File descriptor in bad state)
2016-02-11 17:40:04 140223589644032 [Warning] WSREP: Failed to report last committed 107998970, -77 (File descriptor in bad state)
2016-02-11 17:40:04 140223589644032 [Warning] WSREP: Failed to report last committed 107999099, -77 (File descriptor in bad state)
2016-02-11 17:40:04 140223589644032 [Warning] WSREP: Failed to report last committed 107999226, -77 (File descriptor in bad state)
2016-02-11 17:40:04 140223589644032 [Warning] WSREP: Failed to report last committed 107999355, -77 (File descriptor in bad state)
2016-02-11 17:40:04 140223589644032 [Warning] WSREP: Failed to report last committed 107999483, -77 (File descriptor in bad state)
2016-02-11 17:40:04 140223589644032 [Warning] WSREP: Failed to report last committed 107999605, -77 (File descriptor in bad state)
2016-02-11 17:40:04 140223589644032 [Warning] WSREP: Failed to report last committed 107999739, -77 (File descriptor in bad state)
2016-02-11 17:40:04 140223589644032 [Warning] WSREP: Failed to report last committed 107999864, -77 (File descriptor in bad state)
2016-02-11 17:40:04 140223589644032 [Warning] WSREP: Failed to report last committed 107999995, -77 (File descriptor in bad state)
2016-02-11 17:40:04 140223589644032 [Warning] WSREP: Failed to report last committed 108000122, -77 (File descriptor in bad state)
2016-02-11 17:40:04 140223589644032 [Warning] WSREP: Failed to report last committed 108000250, -77 (File descriptor in bad state)
2016-02-11 17:40:04 140223589644032 [Warning] WSREP: Failed to report last committed 108000377, -77 (File descriptor in bad state)
2016-02-11 17:40:04 140223589644032 [Warning] WSREP: Failed to report last committed 108000505, -77 (File descriptor in bad state)
2016-02-11 17:40:04 140223589644032 [Warning] WSREP: Failed to report last committed 108000635, -77 (File descriptor in bad state)
2016-02-11 17:40:04 140223589644032 [Warning] WSREP: Failed to report last committed 108000761, -77 (File descriptor in bad state)
2016-02-11 17:40:05 140223589644032 [Warning] WSREP: Failed to report last committed 108000891, -77 (File descriptor in bad state)
2016-02-11 17:40:05 140223589644032 [Warning] WSREP: Failed to report last committed 108001019, -77 (File descriptor in bad state)
2016-02-11 17:40:05 140223589644032 [Warning] WSREP: Failed to report last committed 108001147, -77 (File descriptor in bad state)
2016-02-11 17:40:05 140223589644032 [Warning] WSREP: Failed to report last committed 108001272, -77 (File descriptor in bad state)
2016-02-11 17:40:05 140223589644032 [Warning] WSREP: Failed to report last committed 108001402, -77 (File descriptor in bad state)
2016-02-11 17:40:05 140223589644032 [Warning] WSREP: Failed to report last committed 108001529, -77 (File descriptor in bad state)
2016-02-11 17:40:05 140223589644032 [Warning] WSREP: Failed to report last committed 108001657, -77 (File descriptor in bad state)
2016-02-11 17:40:05 140223589644032 [Warning] WSREP: Failed to report last committed 108001780, -77 (File descriptor in bad state)
2016-02-11 17:40:05 140223589644032 [Warning] WSREP: Failed to report last committed 108001915, -77 (File descriptor in bad state)
2016-02-11 17:40:05 140223589644032 [Warning] WSREP: Failed to report last committed 108002041, -77 (File descriptor in bad state)
2016-02-11 17:40:05 140223589644032 [Warning] WSREP: Failed to report last committed 108002170, -77 (File descriptor in bad state)
2016-02-11 17:40:05 140223589644032 [Warning] WSREP: Failed to report last committed 108002299, -77 (File descriptor in bad state)
2016-02-11 17:40:05 140223589644032 [Warning] WSREP: Failed to report last committed 108002427, -77 (File descriptor in bad state)
2016-02-11 17:40:05 140223589644032 [Warning] WSREP: Failed to report last committed 108002554, -77 (File descriptor in bad state)
2016-02-11 17:40:05 140223589644032 [Warning] WSREP: Failed to report last committed 108002682, -77 (File descriptor in bad state)
2016-02-11 17:40:05 140223589644032 [Warning] WSREP: Failed to report last committed 108002809, -77 (File descriptor in bad state)
2016-02-11 17:40:05 140223589644032 [Warning] WSREP: Failed to report last committed 108002933, -77 (File descriptor in bad state)
2016-02-11 17:40:05 140223589644032 [Warning] WSREP: Failed to report last committed 108003064, -77 (File descriptor in bad state)
2016-02-11 17:40:05 140223589644032 [Warning] WSREP: Failed to report last committed 108003195, -77 (File descriptor in bad state)
2016-02-11 17:40:05 140223589644032 [Warning] WSREP: Failed to report last committed 108003323, -77 (File descriptor in bad state)
2016-02-11 17:40:05 140223589644032 [Warning] WSREP: Failed to report last committed 108003448, -77 (File descriptor in bad state)
2016-02-11 17:40:05 140223589644032 [Warning] WSREP: Failed to report last committed 108003578, -77 (File descriptor in bad state)
2016-02-11 17:40:05 140223589644032 [Warning] WSREP: Failed to report last committed 108003707, -77 (File descriptor in bad state)
2016-02-11 17:40:05 140223589644032 [Warning] WSREP: Failed to report last committed 108003835, -77 (File descriptor in bad state)
2016-02-11 17:40:05 140223589644032 [Warning] WSREP: Failed to report last committed 108003962, -77 (File descriptor in bad state)
2016-02-11 17:40:05 140223589644032 [Warning] WSREP: Failed to report last committed 108004091, -77 (File descriptor in bad state)
2016-02-11 17:40:05 140223589644032 [Warning] WSREP: Failed to report last committed 108004219, -77 (File descriptor in bad state)
2016-02-11 17:40:05 140223589644032 [Warning] WSREP: Failed to report last committed 108004347, -77 (File descriptor in bad state)
2016-02-11 17:40:05 140223589644032 [Warning] WSREP: Failed to report last committed 108004473, -77 (File descriptor in bad state)
2016-02-11 17:40:05 140223589644032 [Warning] WSREP: Failed to report last committed 108004597, -77 (File descriptor in bad state)
2016-02-11 17:40:06 140223589644032 [Warning] WSREP: Failed to report last committed 108004729, -77 (File descriptor in bad state)
2016-02-11 17:40:06 140223589644032 [Warning] WSREP: Failed to report last committed 108004859, -77 (File descriptor in bad state)
2016-02-11 17:40:06 140223589644032 [Warning] WSREP: Failed to report last committed 108004981, -77 (File descriptor in bad state)
2016-02-11 17:40:06 140223589644032 [Warning] WSREP: Failed to report last committed 108005115, -77 (File descriptor in bad state)
2016-02-11 17:40:06 140223589644032 [Warning] WSREP: Failed to report last committed 108005238, -77 (File descriptor in bad state)
2016-02-11 17:40:06 140223589644032 [Warning] WSREP: Failed to report last committed 108005364, -77 (File descriptor in bad state)
2016-02-11 17:40:06 140223589644032 [Warning] WSREP: Failed to report last committed 108005498, -77 (File descriptor in bad state)
2016-02-11 17:40:06 140223589644032 [Warning] WSREP: Failed to report last committed 108005627, -77 (File descriptor in bad state)
2016-02-11 17:40:06 140223589644032 [Warning] WSREP: Failed to report last committed 108005755, -77 (File descriptor in bad state)
2016-02-11 17:40:06 140223589644032 [Warning] WSREP: Failed to report last committed 108005881, -77 (File descriptor in bad state)
2016-02-11 17:40:06 140223589644032 [Warning] WSREP: Failed to report last committed 108006010, -77 (File descriptor in bad state)
2016-02-11 17:40:06 140223589644032 [Warning] WSREP: Failed to report last committed 108006138, -77 (File descriptor in bad state)
2016-02-11 17:40:06 140223589644032 [Warning] WSREP: Failed to report last committed 108006267, -77 (File descriptor in bad state)
2016-02-11 17:40:06 140223589644032 [Warning] WSREP: Failed to report last committed 108006394, -77 (File descriptor in bad state)
2016-02-11 17:40:06 140223589644032 [Warning] WSREP: Failed to report last committed 108006522, -77 (File descriptor in bad state)
2016-02-11 17:40:06 140223589644032 [Warning] WSREP: Failed to report last committed 108006651, -77 (File descriptor in bad state)
2016-02-11 17:40:06 140223589644032 [Warning] WSREP: Failed to report last committed 108006779, -77 (File descriptor in bad state)
2016-02-11 17:40:06 140223589644032 [Warning] WSREP: Failed to report last committed 108006903, -77 (File descriptor in bad state)
2016-02-11 17:40:06 140223589644032 [Warning] WSREP: Failed to report last committed 108007034, -77 (File descriptor in bad state)
2016-02-11 17:40:06 140223589644032 [Warning] WSREP: Failed to report last committed 108007161, -77 (File descriptor in bad state)
2016-02-11 17:40:06 140223589644032 [Warning] WSREP: Failed to report last committed 108007291, -77 (File descriptor in bad state)
2016-02-11 17:40:06 140223589644032 [Warning] WSREP: Failed to report last committed 108007418, -77 (File descriptor in bad state)
2016-02-11 17:40:06 140223589644032 [Warning] WSREP: Failed to report last committed 108007542, -77 (File descriptor in bad state)
2016-02-11 17:40:06 140223589644032 [Warning] WSREP: Failed to report last committed 108007675, -77 (File descriptor in bad state)
2016-02-11 17:40:06 140223595144960 [Note] WSREP: New cluster view: global state: b36c4dd2-7dbc-11e5-857f-c70794853074:108007812, view# -1: non-Primary, number of nodes: 1, my index: 0, protocol version 3
2016-02-11 17:40:06 140223595144960 [Note] WSREP: New cluster view: global state: b36c4dd2-7dbc-11e5-857f-c70794853074:108007812, view# -1: non-Primary, number of nodes: 0, my index: -1, protocol version 3
2016-02-11 17:40:07 140223595144960 [Note] WSREP: applier thread exiting (code:0)
2016-02-11 17:40:07 140332151314176 [Note] WSREP: applier thread exiting (code:6)
2016-02-11 17:40:07 140223594895104 [Note] WSREP: applier thread exiting (code:6)
2016-02-11 17:40:07 140223596893952 [Note] WSREP: applier thread exiting (code:6)
2016-02-11 17:40:07 140332076865280 [Note] WSREP: applier thread exiting (code:6)
2016-02-11 17:40:07 140330981369600 [Note] WSREP: applier thread exiting (code:6)
2016-02-11 17:40:07 140330980869888 [Note] WSREP: applier thread exiting (code:6)
2016-02-11 17:40:07 140330981869312 [Note] WSREP: applier thread exiting (code:6)
2016-02-11 17:40:07 140332075865856 [Note] WSREP: applier thread exiting (code:6)
2016-02-11 17:40:07 140332075616000 [Note] WSREP: applier thread exiting (code:6)
2016-02-11 17:40:07 140223594395392 [Note] WSREP: applier thread exiting (code:6)
2016-02-11 17:40:07 140332076115712 [Note] WSREP: applier thread exiting (code:6)
2016-02-11 17:40:07 140332076615424 [Note] WSREP: applier thread exiting (code:6)
2016-02-11 17:40:07 140223597643520 [Note] WSREP: applier thread exiting (code:6)
2016-02-11 17:40:07 140223597393664 [Note] WSREP: applier thread exiting (code:6)
2016-02-11 17:40:07 140330981119744 [Note] WSREP: applier thread exiting (code:6)
2016-02-11 17:40:07 140223596644096 [Note] WSREP: applier thread exiting (code:6)
2016-02-11 17:40:07 140223595894528 [Note] WSREP: applier thread exiting (code:6)
2016-02-11 17:40:07 140332152313600 [Note] WSREP: applier thread exiting (code:6)
2016-02-11 17:40:07 140223595644672 [Note] WSREP: applier thread exiting (code:6)
2016-02-11 17:40:07 140330982618880 [Note] WSREP: applier thread exiting (code:6)
2016-02-11 17:40:07 140223596144384 [Note] WSREP: applier thread exiting (code:6)
2016-02-11 17:40:07 140223595394816 [Note] WSREP: applier thread exiting (code:6)
2016-02-11 17:40:07 140332151064320 [Note] WSREP: applier thread exiting (code:6)
2016-02-11 17:40:07 140332076365568 [Note] WSREP: applier thread exiting (code:6)
2016-02-11 17:40:07 140330982119168 [Note] WSREP: applier thread exiting (code:6)
2016-02-11 17:40:07 140223597143808 [Note] WSREP: applier thread exiting (code:6)
2016-02-11 17:40:07 140330982369024 [Note] WSREP: applier thread exiting (code:6)
2016-02-11 17:40:07 140330981619456 [Note] WSREP: applier thread exiting (code:6)
2016-02-11 17:40:07 140223594645248 [Note] WSREP: applier thread exiting (code:6)
2016-02-11 17:40:07 140223596394240 [Note] WSREP: applier thread exiting (code:6)
2016-02-11 17:40:07 140332077115136 [Note] WSREP: applier thread exiting (code:6)
2016-02-11 17:40:07 140332152563456 [Note] WSREP: rollbacker thread exiting
2016-02-11 17:40:07 140332151813888 [Note] Event Scheduler: Killing the scheduler thread, thread id 4
2016-02-11 17:40:07 140332151813888 [Note] Event Scheduler: Waiting for the scheduler thread to reply
2016-02-11 17:40:07 140332151813888 [Note] Event Scheduler: Stopped
2016-02-11 17:40:07 140332151813888 [Note] Event Scheduler: Purging the queue. 1 events
2016-02-11 17:40:07 140332151813888 [Note] WSREP: dtor state: CLOSED
2016-02-11 17:40:07 140332151813888 [Note] WSREP: mon: entered 10966 oooe fraction 0 oool fraction 0
2016-02-11 17:40:07 140332151813888 [Note] WSREP: mon: entered 10966 oooe fraction 0.956593 oool fraction 0
2016-02-11 17:40:07 140332151813888 [Note] WSREP: mon: entered 10988 oooe fraction 0 oool fraction 9.10084e-05
2016-02-11 17:40:07 140332151813888 [Note] WSREP: cert index usage at exit 0
2016-02-11 17:40:07 140332151813888 [Note] WSREP: cert trx map usage at exit 10985
2016-02-11 17:40:07 140332151813888 [Note] WSREP: deps set usage at exit 0
2016-02-11 17:40:07 140332151813888 [Note] WSREP: avg deps dist 2036.09
2016-02-11 17:40:07 140332151813888 [Note] WSREP: avg cert interval 1.2071
2016-02-11 17:40:07 140332151813888 [Note] WSREP: cert index size 5341
2016-02-11 17:40:07 140223589644032 [Note] WSREP: Service thread queue flushed.
2016-02-11 17:40:07 140332151813888 [Note] WSREP: wsdb trx map usage 0 conn query map usage 0
2016-02-11 17:40:07 140332151813888 [Note] WSREP: MemPool(LocalTrxHandle): hit ratio: 0, misses: 0, in use: 0, in pool: 0
2016-02-11 17:40:07 140332151813888 [Note] WSREP: MemPool(SlaveTrxHandle): hit ratio: 0, misses: 10985, in use: 0, in pool: 6516
2016-02-11 17:40:07 140332151813888 [Note] WSREP: Shifting CLOSED -> DESTROYED (TO: 108007812)
2016-02-11 17:40:07 140332151813888 [Note] WSREP: Flushing memory map to disk...
2016-02-11 17:40:07 139469093041920 [Note] InnoDB: FTS optimize thread exiting.
2016-02-11 17:40:07 140332151813888 [Note] InnoDB: Starting shutdown...
2016-02-11 17:40:29 140332151813888 [Note] InnoDB: Shutdown completed; log sequence number 2415280906695
2016-02-11 17:40:29 140332151813888 [Note] /opt/mariadb/bin/mysqld: Shutdown complete

When the node was brought back up, it was able to successfully SST when it chose node3 as donor:

2016-02-11 18:18:56 140284274350272 [Note] WSREP: Read nil XID from storage engines, skipping position init
2016-02-11 18:18:56 140284274350272 [Note] WSREP: wsrep_load(): loading provider library '/opt/galera/libgalera_smm.so'
2016-02-11 18:18:56 140284274350272 [Note] WSREP: wsrep_load(): Galera 3.14(r0) by Codership Oy <info@codership.com> loaded successfully.
2016-02-11 18:18:56 140284274350272 [Note] WSREP: CRC-32C: using hardware acceleration.
2016-02-11 18:18:56 140284274350272 [Note] WSREP: Found saved state: b36c4dd2-7dbc-11e5-857f-c70794853074:108007812
2016-02-11 18:18:56 140284274350272 [Note] WSREP: Passing config to GCS: base_dir = /data/mariadb/; base_host = 10.191.33.152; base_port = 4567; cert.log_conflicts = no; debug = no; evs.auto_evict = 0; evs.delay_margin = PT15S; evs.delayed_keep_period = PT90S; evs.inactive_check_period = PT3S; evs.inactive_timeout = PT30S; evs.install_timeout = PT30S; evs.join_retrans_period = PT3S; evs.keepalive_period = PT3S; evs.max_install_timeouts = 3; evs.send_window = 1024; evs.stats_report_period = PT2M; evs.suspect_timeout = PT15S; evs.user_send_window = 512; evs.view_forget_timeout = PT24H; gcache.dir = /data/mariadb/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /data/mariadb//galera.cache; gcache.page_size = 128M; gcache.size = 100G; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0; pc.announce_timeout = PT3S; pc.chec
2016-02-11 18:18:56 140175711663872 [Note] WSREP: Service thread queue flushed.
2016-02-11 18:18:56 140284274350272 [Note] WSREP: Assign initial position for certification: 108007812, protocol version: -1
2016-02-11 18:18:56 140284274350272 [Note] WSREP: wsrep_sst_grab()
2016-02-11 18:18:56 140284274350272 [Note] WSREP: Start replication
2016-02-11 18:18:56 140284274350272 [Note] WSREP: Setting initial position to b36c4dd2-7dbc-11e5-857f-c70794853074:108007812
2016-02-11 18:18:56 140284274350272 [Note] WSREP: protonet asio version 0
2016-02-11 18:18:56 140284274350272 [Note] WSREP: Using CRC-32C for message checksums.
2016-02-11 18:18:56 140284274350272 [Note] WSREP: backend: asio
2016-02-11 18:18:56 140284274350272 [Warning] WSREP: access file(/data/mariadb//gvwstate.dat) failed(No such file or directory)
2016-02-11 18:18:56 140284274350272 [Note] WSREP: restore pc from disk failed
2016-02-11 18:18:56 140284274350272 [Note] WSREP: GMCast version 0
2016-02-11 18:18:56 140284274350272 [Note] WSREP: (d2e36c1e, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
2016-02-11 18:18:56 140284274350272 [Note] WSREP: (d2e36c1e, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
2016-02-11 18:18:56 140284274350272 [Note] WSREP: EVS version 0
2016-02-11 18:18:56 140284274350272 [Note] WSREP: gcomm: connecting to group 'doghouse', peer '10.191.47.205:,10.191.47.207:,10.191.33.152:,10.191.44.142:'
2016-02-11 18:18:56 140284274350272 [Warning] WSREP: (d2e36c1e, 'tcp://0.0.0.0:4567') address 'tcp://10.191.33.152:4567' points to own listening address, blacklisting
2016-02-11 18:18:56 140284274350272 [Note] WSREP: (d2e36c1e, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://10.191.33.150:4567 
2016-02-11 18:18:57 140284274350272 [Note] WSREP: declaring 00ec119f at tcp://10.191.33.150:4567 stable
2016-02-11 18:18:57 140284274350272 [Note] WSREP: declaring 56787688 at tcp://10.191.47.207:4567 stable
2016-02-11 18:18:57 140284274350272 [Note] WSREP: declaring 868cb19d at tcp://10.191.47.205:4567 stable
2016-02-11 18:18:57 140284274350272 [Note] WSREP: declaring a6816777 at tcp://10.191.44.142:4567 stable
2016-02-11 18:18:57 140284274350272 [Note] WSREP: Node 00ec119f state prim
2016-02-11 18:18:57 140284274350272 [Note] WSREP: view(view_id(PRIM,00ec119f,1689) memb {
	00ec119f,0
	56787688,0
	868cb19d,0
	a6816777,0
	d2e36c1e,0
} joined {
} left {
} partitioned {
})
2016-02-11 18:18:57 140284274350272 [Note] WSREP: save pc into disk
2016-02-11 18:18:57 140284274350272 [Note] WSREP: gcomm: connected
2016-02-11 18:18:57 140284274350272 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636
2016-02-11 18:18:57 140284274350272 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
2016-02-11 18:18:57 140284274350272 [Note] WSREP: Opened channel 'doghouse'
2016-02-11 18:18:57 140175648745216 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 4, memb_num = 5
2016-02-11 18:18:57 140175648745216 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
2016-02-11 18:18:57 140175648745216 [Note] WSREP: STATE EXCHANGE: sent state msg: d31dae25-d115-11e5-8df3-669a3b27b7ac
2016-02-11 18:18:57 140175648745216 [Note] WSREP: STATE EXCHANGE: got state msg: d31dae25-d115-11e5-8df3-669a3b27b7ac from 0 (node3)
2016-02-11 18:18:57 140175648745216 [Note] WSREP: STATE EXCHANGE: got state msg: d31dae25-d115-11e5-8df3-669a3b27b7ac from 1 (shepherd)
2016-02-11 18:18:57 140175648745216 [Note] WSREP: STATE EXCHANGE: got state msg: d31dae25-d115-11e5-8df3-669a3b27b7ac from 2 (node1)
2016-02-11 18:18:57 140175648745216 [Note] WSREP: STATE EXCHANGE: got state msg: d31dae25-d115-11e5-8df3-669a3b27b7ac from 3 (bulldog)
2016-02-11 18:18:57 140284274350272 [Note] WSREP: Waiting for SST to complete.
2016-02-11 18:18:57 140175648745216 [Note] WSREP: STATE EXCHANGE: got state msg: d31dae25-d115-11e5-8df3-669a3b27b7ac from 4 (node2)
2016-02-11 18:18:57 140175648745216 [Note] WSREP: Quorum results:
	version    = 3,
	component  = PRIMARY,
	conf_id    = 142,
	members    = 4/5 (joined/total),
	act_id     = 108088703,
	last_appl. = -1,
	protocols  = 0/7/3 (gcs/repl/appl),
	group UUID = b36c4dd2-7dbc-11e5-857f-c70794853074
2016-02-11 18:18:57 140175648745216 [Note] WSREP: Flow-control interval: [36, 36]
2016-02-11 18:18:57 140175648745216 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 108088703)
2016-02-11 18:18:57 140284274091776 [Note] WSREP: State transfer required: 
	Group state: b36c4dd2-7dbc-11e5-857f-c70794853074:108088703
	Local state: b36c4dd2-7dbc-11e5-857f-c70794853074:108007812
2016-02-11 18:18:57 140284274091776 [Note] WSREP: New cluster view: global state: b36c4dd2-7dbc-11e5-857f-c70794853074:108088703, view# 143: Primary, number of nodes: 5, my index: 4, protocol version 3
2016-02-11 18:18:57 140284274091776 [Warning] WSREP: Gap in state sequence. Need state transfer.
2016-02-11 18:18:57 140175619389184 [Note] WSREP: Running: 'wsrep_sst_rsync --role 'joiner' --address '10.191.33.152' --datadir '/data/mariadb/'   --parent '64419'  '' '
2016-02-11 18:18:57 140284274091776 [Note] WSREP: Prepared SST request: rsync|10.191.33.152:4444/rsync_sst
2016-02-11 18:18:57 140284274091776 [Note] WSREP: REPL Protocols: 7 (3, 2)
2016-02-11 18:18:57 140175711663872 [Note] WSREP: Service thread queue flushed.
2016-02-11 18:18:57 140284274091776 [Note] WSREP: Assign initial position for certification: 108088703, protocol version: 3
2016-02-11 18:18:57 140175711663872 [Note] WSREP: Service thread queue flushed.
2016-02-11 18:18:57 140284274091776 [Note] WSREP: Prepared IST receiver, listening at: tcp://10.191.33.152:4568
2016-02-11 18:18:57 140175648745216 [Note] WSREP: Member 4.0 (node2) requested state transfer from 'node1'. Selected 0.0 (node3)(SYNCED) as donor.
2016-02-11 18:18:57 140175648745216 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 108088707)
2016-02-11 18:18:57 140284274091776 [Note] WSREP: Requesting state transfer: success, donor: 0
2016-02-11 18:18:57 140175648745216 [Note] WSREP: 0.0 (node3): State transfer to 4.0 (node2) complete.
2016-02-11 18:18:57 140175648745216 [Note] WSREP: Member 0.0 (node3) synced with group.
WSREP_SST: [INFO] Joiner cleanup. rsync PID: 64459 (20160211 18:18:58.370)
WSREP_SST: [INFO] Joiner cleanup done. (20160211 18:18:58.875)
2016-02-11 18:18:58 140284274350272 [Note] WSREP: SST complete, seqno: 108007812
2016-02-11 18:18:58 140284274350272 [Note] Using unique option prefix 'myisam_recover' is error-prone and can break in the future. Please use the full name 'myisam-recover-options' instead.
2016-02-11 18:18:58 7f967a5718c0 InnoDB: Warning: Using innodb_additional_mem_pool_size is DEPRECATED. This option may be removed in future releases, together with the option innodb_use_sys_malloc and with the InnoDB's internal memory allocator.
2016-02-11 18:18:58 140284274350272 [Note] InnoDB: Using mutexes to ref count buffer pool pages
2016-02-11 18:18:58 140284274350272 [Note] InnoDB: The InnoDB memory heap is disabled
2016-02-11 18:18:58 140284274350272 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2016-02-11 18:18:58 140284274350272 [Note] InnoDB: Memory barrier is not used
2016-02-11 18:18:58 140284274350272 [Note] InnoDB: Compressed tables use zlib 1.2.7
2016-02-11 18:18:58 140284274350272 [Note] InnoDB: Using SSE crc32 instructions
2016-02-11 18:18:58 140284274350272 [Note] InnoDB: Initializing buffer pool, size = 640.0G
2016-02-11 18:19:00 140175657137920 [Note] WSREP: (d2e36c1e, 'tcp://0.0.0.0:4567') turning message relay requesting off
2016-02-11 18:19:21 140284274350272 [Note] InnoDB: Completed initialization of buffer pool
2016-02-11 18:19:24 140284274350272 [Note] InnoDB: Highest supported file format is Barracuda.
2016-02-11 18:19:31 140284274350272 [Note] InnoDB: 128 rollback segment(s) are active.
2016-02-11 18:19:31 140284274350272 [Note] InnoDB: Waiting for purge to start
2016-02-11 18:19:31 140284274350272 [Note] InnoDB:  Percona XtraDB (http://www.percona.com) 5.6.26-76.0 started; log sequence number 2415280906695
2016-02-11 18:19:31 139421483501312 [Note] InnoDB: Dumping buffer pool(s) not yet started
2016-02-11 18:19:31 140284274350272 [Note] Server socket created on IP: '::'.
2016-02-11 18:19:31 140284274350272 [Note] WSREP: Signalling provider to continue.
2016-02-11 18:19:31 140284274350272 [Note] WSREP: SST received: b36c4dd2-7dbc-11e5-857f-c70794853074:108007812
2016-02-11 18:19:31 140284274091776 [Note] WSREP: Receiving IST: 80891 writesets, seqnos 108007812-108088703
2016-02-11 18:19:31 140284273592064 [Note] Event Scheduler: scheduler thread started with id 4
2016-02-11 18:19:31 140284274350272 [Note] /opt/mariadb/bin/mysqld: ready for connections.
Version: '10.1.11-MariaDB-log'  socket: '/tmp/mysql.sock'  port: 3306  Source distribution
2016-02-11 18:19:31 140175716165376 [Warning] IP address '172.17.0.3' could not be resolved: Name or service not known
2016-02-11 18:19:50 140284274091776 [Note] WSREP: IST received: b36c4dd2-7dbc-11e5-857f-c70794853074:108088703
2016-02-11 18:19:51 140175648745216 [Note] WSREP: 4.0 (node2): State transfer from 0.0 (node3) complete.
2016-02-11 18:19:51 140175648745216 [Note] WSREP: Shifting JOINER -> JOINED (TO: 108090463)
2016-02-11 18:19:51 140175648745216 [Note] WSREP: Member 4.0 (node2) synced with group.
2016-02-11 18:19:51 140175648745216 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 108090482)
2016-02-11 18:19:51 140175718414080 [Note] WSREP: Synchronized with group, ready for connections



 Comments   
Comment by Geoff Montee (Inactive) [ 2016-02-19 ]

This may have been caused by a combination of MDEV-9598 and MDEV-9599.

Comment by Matyashov Andrey [ 2016-03-17 ]

I have a similar situation. I have 2 nodes with mariadb and 1 node with garbd. If one node with mariadb is shutdown, starting is as follows: on first run starting replication from second node, after replication, mysqld shutting down (see log), and on the second starting is normal.

2016-03-17 17:02:16 140719166449600 [Note] WSREP: Read nil XID from storage engines, skipping position init
2016-03-17 17:02:16 140719166449600 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib/libgalera_smm.so'
2016-03-17 17:02:16 140719166449600 [Note] WSREP: wsrep_load(): Galera 25.3.14(r3560) by Codership Oy <info@codership.com> loaded successfully.
2016-03-17 17:02:16 140719166449600 [Note] WSREP: CRC-32C: using hardware acceleration.
2016-03-17 17:02:16 140719166449600 [Note] WSREP: Found saved state: 00000000-0000-0000-0000-000000000000:-1
2016-03-17 17:02:16 140719166449600 [Note] WSREP: Passing config to GCS: base_dir = /var/lib/mysql/; base_host = 10.100.22.51; base_port = 4567; cert.log_conflicts = no; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.size = 5G; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 2; gmcast.version = 0; pc.announce_timeout = PT3S; pc.checksum = false; pc.ignore_quorum = false; pc.ignore_sb = false; p
2016-03-17 17:02:16 140713437546240 [Note] WSREP: Service thread queue flushed.
2016-03-17 17:02:16 140719166449600 [Note] WSREP: Assign initial position for certification: -1, protocol version: -1
2016-03-17 17:02:16 140719166449600 [Note] WSREP: wsrep_sst_grab()
2016-03-17 17:02:16 140719166449600 [Note] WSREP: Start replication
2016-03-17 17:02:16 140719166449600 [Note] WSREP: Setting initial position to 00000000-0000-0000-0000-000000000000:-1
2016-03-17 17:02:16 140719166449600 [Note] WSREP: protonet asio version 0
2016-03-17 17:02:16 140719166449600 [Note] WSREP: Using CRC-32C for message checksums.
2016-03-17 17:02:16 140719166449600 [Note] WSREP: backend: asio
2016-03-17 17:02:16 140719166449600 [Warning] WSREP: access file(/var/lib/mysql//gvwstate.dat) failed(No such file or directory)
2016-03-17 17:02:16 140719166449600 [Note] WSREP: restore pc from disk failed
2016-03-17 17:02:16 140719166449600 [Note] WSREP: GMCast version 0
2016-03-17 17:02:16 140719166449600 [Note] WSREP: (db23b973, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
2016-03-17 17:02:16 140719166449600 [Note] WSREP: (db23b973, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
2016-03-17 17:02:16 140719166449600 [Note] WSREP: EVS version 0
2016-03-17 17:02:16 140719166449600 [Note] WSREP: gcomm: connecting to group 'mariadb-cluster', peer '10.100.22.54:,10.100.22.53:,10.100.22.52:'
2016-03-17 17:02:16 140719166449600 [Note] WSREP: (db23b973, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: 
2016-03-17 17:02:16 140719166449600 [Note] WSREP: declaring 445b1cb2 at tcp://10.100.22.52:4567 stable
2016-03-17 17:02:16 140719166449600 [Note] WSREP: declaring cc7d6438 at tcp://10.100.22.54:4567 stable
2016-03-17 17:02:16 140719166449600 [Note] WSREP: Node 445b1cb2 state prim
2016-03-17 17:02:16 140719166449600 [Note] WSREP: view(view_id(PRIM,445b1cb2,97) memb {
        445b1cb2,2
        cc7d6438,0
        db23b973,2
} joined {
} left {
} partitioned {
})
2016-03-17 17:02:16 140719166449600 [Note] WSREP: save pc into disk
2016-03-17 17:02:16 140719166449600 [Note] WSREP: discarding pending addr without UUID: tcp://10.100.22.53:4567
2016-03-17 17:02:16 140719166449600 [Note] WSREP: discarding pending addr proto entry 0x7ffbb852c140
2016-03-17 17:02:17 140719166449600 [Note] WSREP: gcomm: connected
2016-03-17 17:02:17 140719166449600 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636
2016-03-17 17:02:17 140719166449600 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
2016-03-17 17:02:17 140719166449600 [Note] WSREP: Opened channel 'mariadb-cluster'
2016-03-17 17:02:17 140713375295232 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 2, memb_num = 3
2016-03-17 17:02:17 140713375295232 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
2016-03-17 17:02:17 140713375295232 [Note] WSREP: STATE EXCHANGE: sent state msg: db7502b7-ec48-11e5-8b71-2f99a1596934
2016-03-17 17:02:17 140713375295232 [Note] WSREP: STATE EXCHANGE: got state msg: db7502b7-ec48-11e5-8b71-2f99a1596934 from 0 (mariadb-02)
2016-03-17 17:02:17 140713375295232 [Note] WSREP: STATE EXCHANGE: got state msg: db7502b7-ec48-11e5-8b71-2f99a1596934 from 1 (garb)
2016-03-17 17:02:17 140719166449600 [Note] WSREP: Waiting for SST to complete.
2016-03-17 17:02:17 140713375295232 [Note] WSREP: STATE EXCHANGE: got state msg: db7502b7-ec48-11e5-8b71-2f99a1596934 from 2 (mariadb-01)
2016-03-17 17:02:17 140713375295232 [Note] WSREP: Quorum results:
        version    = 3,
        component  = PRIMARY,
        conf_id    = 94,
        members    = 2/3 (joined/total),
        act_id     = 399957,
        last_appl. = -1,
        protocols  = 0/7/3 (gcs/repl/appl),
        group UUID = 1c272eb4-ec24-11e5-90ba-d71418314ba9
2016-03-17 17:02:17 140713375295232 [Note] WSREP: Flow-control interval: [28, 28]
2016-03-17 17:02:17 140713375295232 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 399957)
2016-03-17 17:02:17 140719166135040 [Note] WSREP: State transfer required: 
        Group state: 1c272eb4-ec24-11e5-90ba-d71418314ba9:399957
        Local state: 00000000-0000-0000-0000-000000000000:-1
2016-03-17 17:02:17 140719166135040 [Note] WSREP: New cluster view: global state: 1c272eb4-ec24-11e5-90ba-d71418314ba9:399957, view# 95: Primary, number of nodes: 3, my index: 2, protocol version 3
2016-03-17 17:02:17 140719166135040 [Warning] WSREP: Gap in state sequence. Need state transfer.
2016-03-17 17:02:17 140713345939200 [Note] WSREP: Running: 'wsrep_sst_xtrabackup-v2 --role 'joiner' --address '10.100.22.51' --datadir '/var/lib/mysql/'   --parent '23206' --binlog '/var/log/mysql/mariadb-bin' '
WSREP_SST: [INFO] Logging all stderr of SST/Innobackupex to syslog (20160317 17:02:17.858)
2016-03-17 17:02:18 140719166135040 [Note] WSREP: Prepared SST request: xtrabackup-v2|10.100.22.51:4444/xtrabackup_sst//1
2016-03-17 17:02:18 140719166135040 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2016-03-17 17:02:18 140719166135040 [Note] WSREP: REPL Protocols: 7 (3, 2)
2016-03-17 17:02:18 140713437546240 [Note] WSREP: Service thread queue flushed.
2016-03-17 17:02:18 140719166135040 [Note] WSREP: Assign initial position for certification: 399957, protocol version: 3
2016-03-17 17:02:18 140713437546240 [Note] WSREP: Service thread queue flushed.
2016-03-17 17:02:18 140719166135040 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (1c272eb4-ec24-11e5-90ba-d71418314ba9): 1 (Operation not permitted)
         at galera/src/replicator_str.cpp:prepare_for_IST():482. IST will be unavailable.
2016-03-17 17:02:18 140713375295232 [Note] WSREP: Member 2.2 (mariadb-01) requested state transfer from '*any*'. Selected 0.2 (mariadb-02)(SYNCED) as donor.
2016-03-17 17:02:18 140713375295232 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 399964)
2016-03-17 17:02:18 140719166135040 [Note] WSREP: Requesting state transfer: success, donor: 0
2016-03-17 17:02:19 140713383687936 [Note] WSREP: (db23b973, 'tcp://0.0.0.0:4567') turning message relay requesting off
2016-03-17 17:03:34 140713375295232 [Note] WSREP: 0.2 (mariadb-02): State transfer to 2.2 (mariadb-01) complete.
2016-03-17 17:03:34 140713375295232 [Note] WSREP: Member 0.2 (mariadb-02) synced with group.
2016-03-17 17:03:59 140719166449600 [Note] WSREP: SST complete, seqno: 402110
2016-03-17 17:03:59 140719166449600 [Note] InnoDB: Using mutexes to ref count buffer pool pages
2016-03-17 17:03:59 140719166449600 [Note] InnoDB: The InnoDB memory heap is disabled
2016-03-17 17:03:59 140719166449600 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2016-03-17 17:03:59 140719166449600 [Note] InnoDB: Memory barrier is not used
2016-03-17 17:03:59 140719166449600 [Note] InnoDB: Compressed tables use zlib 1.2.8
2016-03-17 17:03:59 140719166449600 [Note] InnoDB: Using Linux native AIO
2016-03-17 17:03:59 140719166449600 [Note] InnoDB: Using SSE crc32 instructions
2016-03-17 17:03:59 140719166449600 [Note] InnoDB: Initializing buffer pool, size = 512.0M
2016-03-17 17:03:59 140719166449600 [Note] InnoDB: Completed initialization of buffer pool
2016-03-17 17:03:59 140719166449600 [Note] InnoDB: Highest supported file format is Barracuda.
2016-03-17 17:04:00 140719166449600 [Note] InnoDB: 128 rollback segment(s) are active.
2016-03-17 17:04:00 140719166449600 [Note] InnoDB: Waiting for purge to start
2016-03-17 17:04:00 140719166449600 [Note] InnoDB:  Percona XtraDB (http://www.percona.com) 5.6.28-76.1 started; log sequence number 67010522855
2016-03-17 17:04:00 140712418969344 [Note] InnoDB: Dumping buffer pool(s) not yet started
2016-03-17 17:04:01 140719166449600 [Note] Plugin 'FEEDBACK' is disabled.
2016-03-17 17:04:01 140719166449600 [Note] Recovering after a crash using /var/log/mysql/mariadb-bin
2016-03-17 17:04:01 140719166449600 [Note] Starting crash recovery...
2016-03-17 17:04:01 140719166449600 [Note] Crash recovery finished.
2016-03-17 17:04:01 140719166449600 [Note] Server socket created on IP: '0.0.0.0'.
2016-03-17 17:04:01 140719165016832 [Note] /usr/sbin/mysqld: Normal shutdown
 
2016-03-17 17:04:01 140719165016832 [Note] WSREP: Stop replication
2016-03-17 17:04:01 140719165016832 [Note] WSREP: Closing send monitor...
2016-03-17 17:04:01 140719165016832 [Note] WSREP: Closed send monitor.
2016-03-17 17:04:01 140719165016832 [Note] WSREP: gcomm: terminating thread
2016-03-17 17:04:01 140719165016832 [Note] WSREP: gcomm: joining thread
2016-03-17 17:04:01 140719165016832 [Note] WSREP: gcomm: closing backend
2016-03-17 17:04:01 140719166449600 [Warning] 'user' entry 'root@mariadb-01' ignored in --skip-name-resolve mode.
2016-03-17 17:04:01 140719166449600 [Warning] 'proxies_priv' entry '@% root@mariadb-01' ignored in --skip-name-resolve mode.
2016-03-17 17:04:01 140719165016832 [Note] WSREP: view(view_id(NON_PRIM,445b1cb2,97) memb {
        db23b973,2
} joined {
} left {
} partitioned {
        445b1cb2,2
        cc7d6438,0
})
2016-03-17 17:04:01 140719165016832 [Note] WSREP: view((empty))
2016-03-17 17:04:01 140713375295232 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
2016-03-17 17:04:01 140719165016832 [Note] WSREP: gcomm: closed
2016-03-17 17:04:01 140713375295232 [Note] WSREP: Flow-control interval: [16, 16]
2016-03-17 17:04:01 140713375295232 [Note] WSREP: Received NON-PRIMARY.
2016-03-17 17:04:01 140713375295232 [Note] WSREP: Shifting JOINER -> OPEN (TO: 403020)
2016-03-17 17:04:01 140713375295232 [Note] WSREP: Received self-leave message.
2016-03-17 17:04:01 140713375295232 [Note] WSREP: Flow-control interval: [0, 0]
2016-03-17 17:04:01 140713375295232 [Note] WSREP: Received SELF-LEAVE. Closing connection.
2016-03-17 17:04:01 140713375295232 [Note] WSREP: Shifting OPEN -> CLOSED (TO: 403020)
2016-03-17 17:04:01 140713375295232 [Note] WSREP: RECV thread exiting 0: Success
2016-03-17 17:04:01 140719166449600 [Note] WSREP: Signalling provider to continue.
2016-03-17 17:04:01 140719166449600 [Note] WSREP: SST received: 1c272eb4-ec24-11e5-90ba-d71418314ba9:402110
2016-03-17 17:04:01 140719098231552 [Note] Event Scheduler: scheduler thread started with id 4
2016-03-17 17:04:01 140719166135040 [ERROR] WSREP: gcs/src/gcs.cpp:_join():810: Sending JOIN failed: -103 (Software caused connection abort).
2016-03-17 17:04:01 140719166135040 [ERROR] WSREP: Failed to JOIN the cluster after SST
2016-03-17 17:04:01 140719165016832 [Note] WSREP: recv_thread() joined.
2016-03-17 17:04:01 140719165016832 [Note] WSREP: Closing replication queue.
2016-03-17 17:04:01 140719165016832 [Note] WSREP: Closing slave action queue.
2016-03-17 17:04:01 140713437546240 [Warning] WSREP: Failed to report last committed 400083, -77 (File descriptor in bad state)
2016-03-17 17:04:01 140719166449600 [Note] Reading of all Master_info entries succeded
2016-03-17 17:04:01 140719166449600 [Note] Added new Master_info '' to hash table
2016-03-17 17:04:01 140719166449600 [Note] /usr/sbin/mysqld: ready for connections.
Version: '10.1.12-MariaDB-1~jessie'  socket: '/var/run/mysqld/mysqld.sock'  port: 3306  mariadb.org binary distribution
2016-03-17 17:04:01 140713437546240 [Warning] WSREP: Failed to report last committed 400211, -77 (File descriptor in bad state)
2016-03-17 17:04:01 140713437546240 [Warning] WSREP: Failed to report last committed 400337, -77 (File descriptor in bad state)
2016-03-17 17:04:01 140713437546240 [Warning] WSREP: Failed to report last committed 400466, -77 (File descriptor in bad state)
2016-03-17 17:04:01 140713437546240 [Warning] WSREP: Failed to report last committed 400593, -77 (File descriptor in bad state)
2016-03-17 17:04:01 140713437546240 [Warning] WSREP: Failed to report last committed 400721, -77 (File descriptor in bad state)
2016-03-17 17:04:01 140713437546240 [Warning] WSREP: Failed to report last committed 400851, -77 (File descriptor in bad state)
2016-03-17 17:04:01 140713437546240 [Warning] WSREP: Failed to report last committed 400976, -77 (File descriptor in bad state)
2016-03-17 17:04:01 140713437546240 [Warning] WSREP: Failed to report last committed 401108, -77 (File descriptor in bad state)
2016-03-17 17:04:01 140713437546240 [Warning] WSREP: Failed to report last committed 401234, -77 (File descriptor in bad state)
2016-03-17 17:04:01 140713437546240 [Warning] WSREP: Failed to report last committed 401363, -77 (File descriptor in bad state)
2016-03-17 17:04:01 140713437546240 [Warning] WSREP: Failed to report last committed 401488, -77 (File descriptor in bad state)
2016-03-17 17:04:01 140713437546240 [Warning] WSREP: Failed to report last committed 401617, -77 (File descriptor in bad state)
2016-03-17 17:04:01 140713437546240 [Warning] WSREP: Failed to report last committed 401748, -77 (File descriptor in bad state)
2016-03-17 17:04:01 140713437546240 [Warning] WSREP: Failed to report last committed 401872, -77 (File descriptor in bad state)
2016-03-17 17:04:01 140713437546240 [Warning] WSREP: Failed to report last committed 402002, -77 (File descriptor in bad state)
2016-03-17 17:04:01 140713437546240 [Warning] WSREP: Failed to report last committed 402122, -77 (File descriptor in bad state)
2016-03-17 17:04:02 140713437546240 [Warning] WSREP: Failed to report last committed 402252, -77 (File descriptor in bad state)
2016-03-17 17:04:04 140713437546240 [Warning] WSREP: Failed to report last committed 402379, -77 (File descriptor in bad state)
2016-03-17 17:04:05 140713437546240 [Warning] WSREP: Failed to report last committed 402508, -77 (File descriptor in bad state)
2016-03-17 17:04:06 140713437546240 [Warning] WSREP: Failed to report last committed 402634, -77 (File descriptor in bad state)
2016-03-17 17:04:07 140713437546240 [Warning] WSREP: Failed to report last committed 402756, -77 (File descriptor in bad state)
2016-03-17 17:04:09 140713437546240 [Warning] WSREP: Failed to report last committed 402891, -77 (File descriptor in bad state)
2016-03-17 17:04:10 140719096716032 [Note] WSREP: New cluster view: global state: 1c272eb4-ec24-11e5-90ba-d71418314ba9:403020, view# -1: non-Primary, number of nodes: 1, my index: 0, protocol version 3
2016-03-17 17:04:10 140719096716032 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2016-03-17 17:04:10 140719096716032 [Note] WSREP: New cluster view: global state: 1c272eb4-ec24-11e5-90ba-d71418314ba9:403020, view# -1: non-Primary, number of nodes: 0, my index: -1, protocol version 3
2016-03-17 17:04:10 140719096716032 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2016-03-17 17:04:10 140719096716032 [Note] WSREP: applier thread exiting (code:0)
2016-03-17 17:04:10 140718819195648 [Note] WSREP: applier thread exiting (code:6)
2016-03-17 17:04:10 140719097625344 [Note] WSREP: applier thread exiting (code:6)
2016-03-17 17:04:10 140719097322240 [Note] WSREP: applier thread exiting (code:6)
2016-03-17 17:04:10 140719166135040 [Note] WSREP: applier thread exiting (code:6)
2016-03-17 17:04:10 140719097928448 [Note] WSREP: applier thread exiting (code:6)
2016-03-17 17:04:10 140719097019136 [Note] WSREP: applier thread exiting (code:6)
2016-03-17 17:04:10 140718819498752 [Note] WSREP: applier thread exiting (code:6)
2016-03-17 17:04:10 140719166438144 [Note] WSREP: rollbacker thread exiting
2016-03-17 17:04:10 140719165016832 [Note] Event Scheduler: Killing the scheduler thread, thread id 4
2016-03-17 17:04:10 140719165016832 [Note] Event Scheduler: Waiting for the scheduler thread to reply
2016-03-17 17:04:10 140719165016832 [Note] Event Scheduler: Stopped
2016-03-17 17:04:10 140719165016832 [Note] Event Scheduler: Purging the queue. 0 events
2016-03-17 17:04:10 140719165016832 [Note] WSREP: dtor state: CLOSED
2016-03-17 17:04:10 140719165016832 [Note] WSREP: mon: entered 910 oooe fraction 0.971429 oool fraction 0.127473
2016-03-17 17:04:10 140719165016832 [Note] WSREP: mon: entered 910 oooe fraction 0.972527 oool fraction 0.127473
2016-03-17 17:04:10 140719165016832 [Note] WSREP: mon: entered 3080 oooe fraction 0 oool fraction 0.000324675
2016-03-17 17:04:10 140719165016832 [Note] WSREP: cert index usage at exit 0
2016-03-17 17:04:10 140719165016832 [Note] WSREP: cert trx map usage at exit 76
2016-03-17 17:04:10 140719165016832 [Note] WSREP: deps set usage at exit 0
2016-03-17 17:04:10 140719165016832 [Note] WSREP: avg deps dist 791.866
2016-03-17 17:04:10 140719165016832 [Note] WSREP: avg cert interval 2.07966
2016-03-17 17:04:10 140719165016832 [Note] WSREP: cert index size 145
2016-03-17 17:04:10 140713437546240 [Note] WSREP: Service thread queue flushed.
2016-03-17 17:04:10 140719165016832 [Note] WSREP: wsdb trx map usage 0 conn query map usage 0
2016-03-17 17:04:10 140719165016832 [Note] WSREP: MemPool(LocalTrxHandle): hit ratio: 0, misses: 0, in use: 0, in pool: 0
2016-03-17 17:04:10 140719165016832 [Note] WSREP: MemPool(SlaveTrxHandle): hit ratio: 0.275547, misses: 2219, in use: 0, in pool: 2068
2016-03-17 17:04:10 140719165016832 [Note] WSREP: Shifting CLOSED -> DESTROYED (TO: 403020)
2016-03-17 17:04:10 140719165016832 [Note] WSREP: Flushing memory map to disk...
2016-03-17 17:04:10 140712402183936 [Note] InnoDB: FTS optimize thread exiting.
2016-03-17 17:04:10 140719165016832 [Note] InnoDB: Starting shutdown...
2016-03-17 17:04:12 140719165016832 [Note] InnoDB: Shutdown completed; log sequence number 67011269373
2016-03-17 17:04:12 140719165016832 [Note] /usr/sbin/mysqld: Shutdown complete

my.cnf:

[client]
port                    = 3306
socket                  = /var/run/mysqld/mysqld.sock
 
[mysqld_safe]
socket                  = /var/run/mysqld/mysqld.sock
nice                    = 0
 
[mysqld]
user                    = mysql
pid-file                = /var/run/mysqld/mysqld.pid
socket                  = /var/run/mysqld/mysqld.sock
port                    = 3306
basedir                 = /usr
datadir                 = /var/lib/mysql
tmpdir                  = /tmp
lc_messages_dir         = /usr/share/mysql
lc_messages             = en_US
skip-external-locking   = 1
collation-server        = utf8_general_ci
init-connect            = 'SET NAMES utf8'
character-set-server    = utf8
bind-address            = 0.0.0.0
skip-name-resolve       = yes
max_connections         = 1024
connect_timeout         = 5
wait_timeout            = 600
max_allowed_packet      = 128M
thread_cache_size       = 128
sort_buffer_size        = 32M
bulk_insert_buffer_size = 64M
tmp_table_size          = 128M
max_heap_table_size     = 128M
myisam_recover_options  = BACKUP
key_buffer_size         = 64M
table_open_cache        = 400
myisam_sort_buffer_size = 512M
concurrent_insert       = 2
read_buffer_size        = 32M
read_rnd_buffer_size    = 16M
query_cache_limit       = 32M
query_cache_size        = 256M
general_log_file        = /var/log/mysql/mysql.log
general_log             = 0
log-error               = /var/log/mysql/mysql_err.log
log_warnings            = 2
slow_query_log      = 0
slow_query_log_file     = /var/log/mysql/mariadb-slow.log
long_query_time         = 10
log_slow_verbosity      = query_plan
server-id               = 1
report_host             = mariadb-01
log_bin                 = /var/log/mysql/mariadb-bin
log_bin_index           = /var/log/mysql/mariadb-bin.index
event_scheduler         = on
sync_binlog             = 0
expire_logs_days        = 32
max_binlog_size         = 1024M
log_slave_updates       = 1
slave_type_conversions  = ALL_NON_LOSSY
default_storage_engine  = InnoDB
innodb_log_file_size    = 512M
innodb_buffer_pool_size = 512M
innodb_log_buffer_size  = 64M
innodb_file_per_table   = 1
innodb_doublewrite      = 1
innodb_open_files       = 2400
innodb_io_capacity      = 2400
innodb_flush_method     = O_DIRECT
innodb_file_format      = barracuda
innodb-data-file-path   = ibdata1:10M:autoextend
 
[galera]
wsrep_on                = ON
wsrep_provider          = /usr/lib/libgalera_smm.so
wsrep_node_address      = 10.100.22.51
wsrep_cluster_address   = gcomm://10.100.22.54,10.100.22.53,10.100.22.52
wsrep_sst_auth          = root:MySecret
wsrep_cluster_name      = mariadb-cluster
wsrep_node_name         = mariadb-01
binlog_format           = row
default_storage_engine  = InnoDB
innodb_autoinc_lock_mode= 2
wsrep_sst_method        = xtrabackup-v2
wsrep_retry_autocommit  = 3
wsrep_debug             = Off
wsrep_auto_increment_control = Off
wsrep_provider_options  = "gcache.size=5G;repl.commit_order=1;gmcast.segment=2;pc.recovery=TRUE;debug=no;gcs.fc_debug=0;pc.wait_prim=true"
wsrep_replicate_myisam  = On
bind-address            = 0.0.0.0
wsrep_slave_threads     = 8
innodb_flush_log_at_trx_commit = 1
 
[mysqldump]
quick
quote-names
max_allowed_packet      = 128M
 
[mysql]
max_allowed_packet      = 128M
 
[isamchk]
key_buffer              = 128M
!includedir /etc/mysql/conf.d/

Thanks!

Comment by Matyashov Andrey [ 2016-03-21 ]

I add new node in this cluster, and mysqld on new node is crashed.

2016-03-21 17:52:18 139742394914752 [Note] WSREP: Read nil XID from storage engines, skipping position init
2016-03-21 17:52:18 139742394914752 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib/galera/libgalera_smm.so'
2016-03-21 17:52:18 139742394914752 [Note] WSREP: wsrep_load(): Galera 25.3.14(r3560) by Codership Oy <info@codership.com> loaded successfully.
2016-03-21 17:52:18 139742394914752 [Note] WSREP: CRC-32C: using "slicing-by-8" algorithm.
2016-03-21 17:52:18 139742394914752 [Warning] WSREP: Could not open state file for reading: '/var/lib/mysql//grastate.dat'
2016-03-21 17:52:18 139742394914752 [Note] WSREP: Found saved state: 00000000-0000-0000-0000-000000000000:-1
2016-03-21 17:52:18 139742394914752 [Note] WSREP: Passing config to GCS: base_dir = /var/lib/mysql/; base_host = 10.100.22.53; base_port = 4567; cert.log_conflicts = no; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.size = 5G; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 2; gmcast.version = 0; pc.announce_timeout = PT3S; pc.checksum = false; pc.ignore_quorum = false; pc.ignore_sb = false; p
2016-03-21 17:52:18 139736663836416 [Note] WSREP: Service thread queue flushed.
2016-03-21 17:52:18 139742394914752 [Note] WSREP: Assign initial position for certification: -1, protocol version: -1
2016-03-21 17:52:18 139742394914752 [Note] WSREP: wsrep_sst_grab()
2016-03-21 17:52:18 139742394914752 [Note] WSREP: Start replication
2016-03-21 17:52:18 139742394914752 [Note] WSREP: Setting initial position to 00000000-0000-0000-0000-000000000000:-1
2016-03-21 17:52:18 139742394914752 [Note] WSREP: protonet asio version 0
2016-03-21 17:52:18 139742394914752 [Note] WSREP: Using CRC-32C for message checksums.
2016-03-21 17:52:18 139742394914752 [Note] WSREP: backend: asio
2016-03-21 17:52:18 139742394914752 [Warning] WSREP: access file(/var/lib/mysql//gvwstate.dat) failed(No such file or directory)
2016-03-21 17:52:18 139742394914752 [Note] WSREP: restore pc from disk failed
2016-03-21 17:52:18 139742394914752 [Note] WSREP: GMCast version 0
2016-03-21 17:52:18 139742394914752 [Note] WSREP: (82767ed6, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
2016-03-21 17:52:18 139742394914752 [Note] WSREP: (82767ed6, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
2016-03-21 17:52:18 139742394914752 [Note] WSREP: EVS version 0
2016-03-21 17:52:18 139742394914752 [Note] WSREP: gcomm: connecting to group 'mariadb-cluster', peer '10.100.22.54:,10.100.22.51:,10.100.22.52:'
2016-03-21 17:52:18 139742394914752 [Note] WSREP: (82767ed6, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: 
2016-03-21 17:52:18 139742394914752 [Note] WSREP: declaring 57a4a1b6 at tcp://10.100.22.54:4567 stable
2016-03-21 17:52:18 139742394914752 [Note] WSREP: declaring 5b227cc8 at tcp://10.100.22.51:4567 stable
2016-03-21 17:52:18 139742394914752 [Note] WSREP: declaring 983d3540 at tcp://10.100.22.52:4567 stable
2016-03-21 17:52:18 139742394914752 [Note] WSREP: Node 57a4a1b6 state prim
2016-03-21 17:52:18 139742394914752 [Note] WSREP: view(view_id(PRIM,57a4a1b6,13) memb {
        57a4a1b6,0
        5b227cc8,2
        82767ed6,2
        983d3540,2
} joined {
} left {
} partitioned {
})
2016-03-21 17:52:18 139742394914752 [Note] WSREP: save pc into disk
2016-03-21 17:52:19 139742394914752 [Note] WSREP: gcomm: connected
2016-03-21 17:52:19 139742394914752 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636
2016-03-21 17:52:19 139742394914752 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
2016-03-21 17:52:19 139742394914752 [Note] WSREP: Opened channel 'mariadb-cluster'
2016-03-21 17:52:19 139736601585408 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 2, memb_num = 4
2016-03-21 17:52:19 139736601585408 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
2016-03-21 17:52:19 139736601585408 [Note] WSREP: STATE EXCHANGE: sent state msg: 8c673d83-ef74-11e5-a78d-8e0f4b1d7dc5
2016-03-21 17:52:19 139736601585408 [Note] WSREP: STATE EXCHANGE: got state msg: 8c673d83-ef74-11e5-a78d-8e0f4b1d7dc5 from 0 (garb)
2016-03-21 17:52:19 139736601585408 [Note] WSREP: STATE EXCHANGE: got state msg: 8c673d83-ef74-11e5-a78d-8e0f4b1d7dc5 from 1 (mariadb-01)
2016-03-21 17:52:19 139742394914752 [Note] WSREP: Waiting for SST to complete.
2016-03-21 17:52:19 139736601585408 [Warning] WSREP: Action message in non-primary configuration from member 3
2016-03-21 17:52:19 139736601585408 [Note] WSREP: STATE EXCHANGE: got state msg: 8c673d83-ef74-11e5-a78d-8e0f4b1d7dc5 from 3 (mariadb-02)
2016-03-21 17:52:19 139736601585408 [Note] WSREP: STATE EXCHANGE: got state msg: 8c673d83-ef74-11e5-a78d-8e0f4b1d7dc5 from 2 (mariadb-03)
2016-03-21 17:52:19 139736601585408 [Note] WSREP: Quorum results:
        version    = 3,
        component  = PRIMARY,
        conf_id    = 11,
        members    = 3/4 (joined/total),
        act_id     = 2804469,
        last_appl. = -1,
        protocols  = 0/7/3 (gcs/repl/appl),
        group UUID = 4301ab2d-ee77-11e5-9d08-4e7e366b5ba2
2016-03-21 17:52:19 139736601585408 [Note] WSREP: Flow-control interval: [32, 32]
2016-03-21 17:52:19 139736601585408 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 2804469)
2016-03-21 17:52:19 139742394600192 [Note] WSREP: State transfer required: 
        Group state: 4301ab2d-ee77-11e5-9d08-4e7e366b5ba2:2804469
        Local state: 00000000-0000-0000-0000-000000000000:-1
2016-03-21 17:52:19 139742394600192 [Note] WSREP: New cluster view: global state: 4301ab2d-ee77-11e5-9d08-4e7e366b5ba2:2804469, view# 12: Primary, number of nodes: 4, my index: 2, protocol version 3
2016-03-21 17:52:19 139742394600192 [Warning] WSREP: Gap in state sequence. Need state transfer.
2016-03-21 17:52:19 139736572229376 [Note] WSREP: Running: 'wsrep_sst_xtrabackup-v2 --role 'joiner' --address '10.100.22.53' --datadir '/var/lib/mysql/'   --parent '27291' --binlog '/var/log/mysql/mariadb-bin' '
WSREP_SST: [INFO] Logging all stderr of SST/Innobackupex to syslog (20160321 17:52:20.413)
2016-03-21 17:52:21 139742394600192 [Note] WSREP: Prepared SST request: xtrabackup-v2|10.100.22.53:4444/xtrabackup_sst//1
2016-03-21 17:52:21 139742394600192 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2016-03-21 17:52:21 139742394600192 [Note] WSREP: REPL Protocols: 7 (3, 2)
2016-03-21 17:52:21 139736663836416 [Note] WSREP: Service thread queue flushed.
2016-03-21 17:52:21 139742394600192 [Note] WSREP: Assign initial position for certification: 2804469, protocol version: 3
2016-03-21 17:52:21 139736663836416 [Note] WSREP: Service thread queue flushed.
2016-03-21 17:52:21 139742394600192 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (4301ab2d-ee77-11e5-9d08-4e7e366b5ba2): 1 (Operation not permitted)
         at galera/src/replicator_str.cpp:prepare_for_IST():482. IST will be unavailable.
2016-03-21 17:52:21 139736601585408 [Note] WSREP: Member 2.2 (mariadb-03) requested state transfer from '*any*'. Selected 1.2 (mariadb-01)(SYNCED) as donor.
2016-03-21 17:52:21 139736601585408 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 2804512)
2016-03-21 17:52:21 139742394600192 [Note] WSREP: Requesting state transfer: success, donor: 1
2016-03-21 17:52:22 139736609978112 [Note] WSREP: (82767ed6, 'tcp://0.0.0.0:4567') turning message relay requesting off
2016-03-21 17:53:49 139736601585408 [Note] WSREP: 1.2 (mariadb-01): State transfer to 2.2 (mariadb-03) complete.
2016-03-21 17:53:51 139736601585408 [Note] WSREP: Member 1.2 (mariadb-01) synced with group.
2016-03-21 17:54:49 139742394914752 [Note] WSREP: SST complete, seqno: 2807371
2016-03-21 17:54:49 139742394914752 [Note] InnoDB: Using mutexes to ref count buffer pool pages
2016-03-21 17:54:49 139742394914752 [Note] InnoDB: The InnoDB memory heap is disabled
2016-03-21 17:54:49 139742394914752 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2016-03-21 17:54:49 139742394914752 [Note] InnoDB: Memory barrier is not used
2016-03-21 17:54:49 139742394914752 [Note] InnoDB: Compressed tables use zlib 1.2.8
2016-03-21 17:54:49 139742394914752 [Note] InnoDB: Using Linux native AIO
2016-03-21 17:54:49 139742394914752 [Note] InnoDB: Using generic crc32 instructions
2016-03-21 17:54:49 139742394914752 [Note] InnoDB: Initializing buffer pool, size = 512.0M
2016-03-21 17:54:49 139742394914752 [Note] InnoDB: Completed initialization of buffer pool
2016-03-21 17:54:49 139742394914752 [Note] InnoDB: Highest supported file format is Barracuda.
2016-03-21 17:54:51 139742394914752 [Note] InnoDB: 128 rollback segment(s) are active.
2016-03-21 17:54:51 139742394914752 [Note] InnoDB: Waiting for purge to start
2016-03-21 17:54:51 139742394914752 [Note] InnoDB:  Percona XtraDB (http://www.percona.com) 5.6.28-76.1 started; log sequence number 74504173248
2016-03-21 17:54:52 139735645259520 [Note] InnoDB: Dumping buffer pool(s) not yet started
2016-03-21 17:54:52 139742394914752 [Note] Plugin 'FEEDBACK' is disabled.
2016-03-21 17:54:52 139742394914752 [Note] Recovering after a crash using /var/log/mysql/mariadb-bin
2016-03-21 17:54:52 139742394914752 [Note] Starting crash recovery...
2016-03-21 17:54:52 139742394914752 [Note] Crash recovery finished.
2016-03-21 17:54:53 139742394914752 [Note] Server socket created on IP: '0.0.0.0'.
2016-03-21 17:54:53 139742393486080 [Note] /usr/sbin/mysqld: Normal shutdown
 
2016-03-21 17:54:53 139742393486080 [Note] WSREP: Stop replication
2016-03-21 17:54:53 139742393486080 [Note] WSREP: Closing send monitor...
2016-03-21 17:54:53 139742393486080 [Note] WSREP: Closed send monitor.
2016-03-21 17:54:53 139742393486080 [Note] WSREP: gcomm: terminating thread
2016-03-21 17:54:53 139742393486080 [Note] WSREP: gcomm: joining thread
2016-03-21 17:54:53 139742393486080 [Note] WSREP: gcomm: closing backend
2016-03-21 17:54:53 139742394914752 [Warning] 'user' entry 'root@mariadb-01' ignored in --skip-name-resolve mode.
2016-03-21 17:54:53 139742394914752 [Warning] 'proxies_priv' entry '@% root@mariadb-01' ignored in --skip-name-resolve mode.
2016-03-21 17:54:53 139742393486080 [Note] WSREP: view(view_id(NON_PRIM,57a4a1b6,13) memb {
        82767ed6,2
} joined {
} left {
} partitioned {
        57a4a1b6,0
        5b227cc8,2
        983d3540,2
})
2016-03-21 17:54:53 139742393486080 [Note] WSREP: view((empty))
2016-03-21 17:54:53 139736601585408 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
2016-03-21 17:54:53 139742394914752 [Note] WSREP: Signalling provider to continue.
2016-03-21 17:54:53 139742343891712 [Note] Event Scheduler: scheduler thread started with id 4
2016-03-21 17:54:53 139742393486080 [Note] WSREP: gcomm: closed
2016-03-21 17:54:53 139736601585408 [Note] WSREP: Flow-control interval: [16, 16]
2016-03-21 17:54:53 139736601585408 [Note] WSREP: Received NON-PRIMARY.
2016-03-21 17:54:53 139736601585408 [Note] WSREP: Shifting JOINER -> OPEN (TO: 2808993)
2016-03-21 17:54:53 139736601585408 [Note] WSREP: Received self-leave message.
2016-03-21 17:54:53 139736601585408 [Note] WSREP: Flow-control interval: [0, 0]
2016-03-21 17:54:53 139736601585408 [Note] WSREP: Received SELF-LEAVE. Closing connection.
2016-03-21 17:54:53 139736601585408 [Note] WSREP: Shifting OPEN -> CLOSED (TO: 2808993)
2016-03-21 17:54:53 139736601585408 [Note] WSREP: RECV thread exiting 0: Success
2016-03-21 17:54:53 139742393486080 [Note] WSREP: recv_thread() joined.
2016-03-21 17:54:53 139742393486080 [Note] WSREP: Closing replication queue.
2016-03-21 17:54:53 139742393486080 [Note] WSREP: Closing slave action queue.
2016-03-21 17:54:53 139742394914752 [Note] WSREP: SST received: 4301ab2d-ee77-11e5-9d08-4e7e366b5ba2:2807371
2016-03-21 17:54:53 139742394914752 [ERROR] WSREP: Trying to launch slave threads before creating connection at 'gcomm://10.100.22.54,10.100.22.51,10.100.22.52'
mysqld: /home/buildbot/buildbot/build/mariadb-10.1.12/sql/wsrep_thd.cc:401: void wsrep_create_appliers(long int): Assertion `0' failed.
2016-03-21 17:54:53 139742394600192 [ERROR] WSREP: gcs/src/gcs.cpp:_join():810: Sending JOIN failed: -103 (Software caused connection abort).
160321 17:54:53 [ERROR] mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
 
To report this bug, see http://kb.askmonty.org/en/reporting-bugs
 
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed, 
something is definitely wrong and this may fail.
 
Server version: 10.1.12-MariaDB-1~jessie
key_buffer_size=67108864
read_buffer_size=33554432
max_used_connections=0
max_threads=1026
thread_count=3
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 67326468 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
 
Thread pointer: 0x0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
2016-03-21 17:54:53 139742394600192 [ERROR] WSREP: Failed to JOIN the cluster after SST
stack_bottom = 0x0 thread_stack 0x48400
2016-03-21 17:54:53 139736663836416 [Warning] WSREP: Failed to report last committed 2805234, -77 (File descriptor in bad state)
2016-03-21 17:54:53 139736663836416 [Warning] WSREP: Failed to report last committed 2805362, -77 (File descriptor in bad state)
2016-03-21 17:54:53 139736663836416 [Warning] WSREP: Failed to report last committed 2805490, -77 (File descriptor in bad state)
2016-03-21 17:54:53 139736663836416 [Warning] WSREP: Failed to report last committed 2805618, -77 (File descriptor in bad state)
2016-03-21 17:54:53 139736663836416 [Warning] WSREP: Failed to report last committed 2805747, -77 (File descriptor in bad state)
2016-03-21 17:54:53 139736663836416 [Warning] WSREP: Failed to report last committed 2805875, -77 (File descriptor in bad state)
2016-03-21 17:54:53 139736663836416 [Warning] WSREP: Failed to report last committed 2806002, -77 (File descriptor in bad state)
2016-03-21 17:54:53 139736663836416 [Warning] WSREP: Failed to report last committed 2806128, -77 (File descriptor in bad state)
2016-03-21 17:54:53 139736663836416 [Warning] WSREP: Failed to report last committed 2806257, -77 (File descriptor in bad state)
2016-03-21 17:54:53 139736663836416 [Warning] WSREP: Failed to report last committed 2806386, -77 (File descriptor in bad state)
2016-03-21 17:54:53 139736663836416 [Warning] WSREP: Failed to report last committed 2806510, -77 (File descriptor in bad state)
/usr/sbin/mysqld(my_print_stacktrace+0x2e)[0x559d572c6a3e]
2016-03-21 17:54:53 139736663836416 [Warning] WSREP: Failed to report last committed 2806645, -77 (File descriptor in bad state)
2016-03-21 17:54:53 139736663836416 [Warning] WSREP: Failed to report last committed 2806770, -77 (File descriptor in bad state)
2016-03-21 17:54:53 139736663836416 [Warning] WSREP: Failed to report last committed 2806899, -77 (File descriptor in bad state)
2016-03-21 17:54:53 139736663836416 [Warning] WSREP: Failed to report last committed 2807027, -77 (File descriptor in bad state)
2016-03-21 17:54:53 139736663836416 [Warning] WSREP: Failed to report last committed 2807155, -77 (File descriptor in bad state)
2016-03-21 17:54:53 139736663836416 [Warning] WSREP: Failed to report last committed 2807282, -77 (File descriptor in bad state)
/usr/sbin/mysqld(handle_fatal_signal+0x34d)[0x559d56e08cbd]
/lib/x86_64-linux-gnu/libpthread.so.0(+0xf8d0)[0x7f184f8da8d0]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x37)[0x7f184d9cb067]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x148)[0x7f184d9cc448]
/lib/x86_64-linux-gnu/libc.so.6(+0x2e266)[0x7f184d9c4266]
/lib/x86_64-linux-gnu/libc.so.6(+0x2e312)[0x7f184d9c4312]
/usr/sbin/mysqld(+0x552592)[0x559d56db9592]
/usr/sbin/mysqld(_Z11mysqld_mainiPPc+0x1484)[0x559d56bf4044]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7f184d9b7b45]
/usr/sbin/mysqld(+0x38007d)[0x559d56be707d]
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.

Comment by Andrii Nikitin (Inactive) [ 2017-07-18 ]

Those who see donor stuck in DONOR/DESYNCED state: as workaround I suggest to evaluate adding option --timeout=300 to every call "rsync --owner --group ..." in wsrep_sst_rsync script.
In my understanding actual solution of the bug may be using such option in the script.

Comment by Andrii Nikitin (Inactive) [ 2017-07-21 ]

I was able to reproduce the issue by pausing joiner VM during rsync. Eventually DONOR received an error and recovered. Thus exact timing may be related to network configuration (it is not related to keepalive settings on donor according to my tests).

2017-07-21 14:45:57 139982618486528 [Note] WSREP: Shifting SYNCED -> DONOR/DESYNCED (TO: 4)
2017-07-21 14:45:57 139983581760256 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2017-07-21 14:45:57 139982484260608 [Note] WSREP: Running: 'wsrep_sst_rsync --role 'donor' --address '192.168.1.216:4444/rsync_sst' --socket '/home/a/spd/m1-10.1.11/dt/my.sock' --datadir '/home/a/spd/m1-10.1.11/dt/'  --defaults-file '/home/a/spd/m1-10.1.11/my.cnf'    '' --gtid '57d7015c-6e05-11e7-a5b6-3678faaee23e:4' --gtid-domain-id '0''
...
2017-07-21 14:47:00 139982626879232 [Note] WSREP: forgetting dfdc0e94 (tcp://192.168.1.216:4567)
2017-07-21 14:47:00 139982626879232 [Note] WSREP: deleting entry tcp://192.168.1.216:4567
...

And 15 min later rsync finally shows 'no route to host' error:

rsync: writefd_unbuffered failed to write 4 bytes to socket [sender]: No route to host (113)
rsync: connection unexpectedly closed (214 bytes received so far) [sender]
rsync error: error in rsync protocol data stream (code 12) at io.c(605) [sender=3.0.9]
WSREP_SST: [ERROR] find/rsync returned code 123: (20170721 15:02:24.154)
2017-07-21 15:02:24 139982484260608 [ERROR] WSREP: Failed to read from: wsrep_sst_rsync --role 'donor' --address '192.168.1.216:4444/rsync_sst' --socket '/home/a/spd/m1-10.1.11/dt/my.sock' --datadir '/home/a/spd/m1-10.1.11/dt/'  --defaults-file '/home/a/spd/m1-10.1.11/my.cnf'    '' --gtid '57d7015c-6e05-11e7-a5b6-3678faaee23e:4' --gtid-domain-id '0'
...
2017-07-21 15:02:24 139982618486528 [Note] WSREP: Shifting DONOR/DESYNCED -> JOINED (TO: 4)

I've attached patch with IO timeout for donor's rsync commands. E.g. with 60 sec timeout following happens on donor:

2017-07-21 17:10:55 139982618486528 [Note] WSREP: Shifting SYNCED -> DONOR/DESYNCED (TO: 4)
...
2017-07-21 17:11:09 139982618486528 [Note] WSREP: Flow-control interval: [28, 28]
2017-07-21 17:11:12 139982626879232 [Note] WSREP:  cleaning up b9451b52 (tcp://192.168.1.216:4567)
[sender] io timeout after 60 seconds -- exiting
rsync error: timeout in data send/receive (code 30) at io.c(140) [sender=3.0.9]
WSREP_SST: [ERROR] rsync returned code 30: (20170721 17:12:03.693)
2017-07-21 17:12:03 139981871896320 [ERROR] WSREP: Failed to read from: wsrep_sst_rsync --role 'donor' --address '192.168.1.216:4444/rsync_sst' --socket '/home/a/spd/m1-10.1.11/dt/my.sock' --datadir '/home/a/spd/m1-10.1.11/dt/'  --defaults-file '/home/a/spd/m1-10.1.11/my.cnf'    '' --gtid '57d7015c-6e05-11e7-a5b6-3678faaee23e:4' --gtid-domain-id '0'
...
2017-07-21 17:12:03 139982618486528 [Note] WSREP: Shifting DONOR/DESYNCED -> JOINED (TO: 4)

Comment by Jan Lindström (Inactive) [ 2019-09-19 ]

Please try with more recent version of MariaDB server.

Comment by Doug Whitfield [ 2021-08-06 ]

In what version was this fixed? Is 10.1.12 enough?

Generated at Thu Feb 08 07:35:32 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.