Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-9554

Rsync SST donor stuck in DONOR/DESYNCED state when joiner crashes

    XMLWordPrintable

Details

    Description

      A Galera Cluster node crashed while receiving an rsync SST. After that, the donor node became stuck in the DONOR/DESYNCED status, even when the joiner returned to the cluster.

      First, node2 joined the cluster at 17:32 and requested an SST from node1.

      2016-02-11 17:32:10 140284524488448 [Note] WSREP: Quorum results:
      	version    = 3,
      	component  = PRIMARY,
      	conf_id    = 140,
      	members    = 4/5 (joined/total),
      	act_id     = 107996827,
      	last_appl. = 107996697,
      	protocols  = 0/7/3 (gcs/repl/appl),
      	group UUID = b36c4dd2-7dbc-11e5-857f-c70794853074
      2016-02-11 17:32:10 140284524488448 [Note] WSREP: Flow-control interval: [36, 36]
      2016-02-11 17:32:10 140391984752384 [Note] WSREP: New cluster view: global state: b36c4dd2-7dbc-11e5-857f-c70794853074:107996827, view# 141: Primary, number of nodes: 5, my index: 3, protocol version 3
      2016-02-11 17:32:10 140284524488448 [Note] WSREP: Member 1.0 (node2) requested state transfer from 'node1'. Selected 3.0 (node1)(SYNCED) as donor.

      Then node1 agreed to be donor for the SST:

      2016-02-11 17:32:10 140284524488448 [Note] WSREP: Shifting SYNCED -> DONOR/DESYNCED (TO: 107996829)
      2016-02-11 17:32:10 140391984752384 [Note] WSREP: REPL Protocols: 7 (3, 2)
      2016-02-11 17:32:10 140284591601408 [Note] WSREP: Service thread queue flushed.
      2016-02-11 17:32:10 140391984752384 [Note] WSREP: Assign initial position for certification: 107996827, protocol version: 3
      2016-02-11 17:32:10 140284591601408 [Note] WSREP: Service thread queue flushed.
      2016-02-11 17:32:11 139530053068544 [Note] WSREP: Running: 'wsrep_sst_rsync --role 'donor' --address '10.191.33.152:4444/rsync_sst' --socket '/tmp/mysql.sock' --datadir '/data/mariadb/'     '' --gtid 'b36c4dd2-7dbc-11e5-857f-c70794853074:107996829' --gtid-domain-id '0''
      2016-02-11 17:32:11 140391983253248 [Note] WSREP: sst_donor_thread signaled with 0
      2016-02-11 17:32:11 139530053068544 [Note] WSREP: Flushing tables for SST...
      2016-02-11 17:32:11 139530053068544 [Note] WSREP: Provider paused at b36c4dd2-7dbc-11e5-857f-c70794853074:107996846 (137640)
      2016-02-11 17:32:11 139530053068544 [Note] WSREP: Tables flushed.

      But before the SST could complete, node2 died:

      2016-02-11 17:32:13 140284537075456 [Note] WSREP: (868cb19d, 'tcp://0.0.0.0:4567') turning message relay requesting off
      2016-02-11 17:40:03 140284537075456 [Note] WSREP: (868cb19d, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://10.191.33.152:4567 
      2016-02-11 17:40:03 140284537075456 [Note] WSREP: declaring 00ec119f at tcp://10.191.33.150:4567 stable
      2016-02-11 17:40:03 140284537075456 [Note] WSREP: declaring 56787688 at tcp://10.191.47.207:4567 stable
      2016-02-11 17:40:03 140284537075456 [Note] WSREP: declaring a6816777 at tcp://10.191.44.142:4567 stable
      2016-02-11 17:40:03 140284537075456 [Note] WSREP: forgetting 49fb877e (tcp://10.191.33.152:4567)
      2016-02-11 17:40:03 140284537075456 [Note] WSREP: (868cb19d, 'tcp://0.0.0.0:4567') turning message relay requesting off
      2016-02-11 17:40:04 140284537075456 [Note] WSREP: Node 00ec119f state prim
      2016-02-11 17:40:04 140284537075456 [Note] WSREP: view(view_id(PRIM,00ec119f,1688) memb {
      	00ec119f,0
      	56787688,0
      	868cb19d,0
      	a6816777,0
      } joined {
      } left {
      } partitioned {
      	49fb877e,0
      })
      2016-02-11 17:40:04 140284537075456 [Note] WSREP: save pc into disk
      2016-02-11 17:40:04 140284537075456 [Note] WSREP: forgetting 49fb877e (tcp://10.191.33.152:4567)

      But then node2 came back a little while later:

      2016-02-11 18:18:57 140284524488448 [Note] WSREP: Quorum results:
      	version    = 3,
      	component  = PRIMARY,
      	conf_id    = 142,
      	members    = 4/5 (joined/total),
      	act_id     = 108088703,
      	last_appl. = 107996828,
      	protocols  = 0/7/3 (gcs/repl/appl),
      	group UUID = b36c4dd2-7dbc-11e5-857f-c70794853074
      2016-02-11 18:18:57 140284524488448 [Note] WSREP: Flow-control interval: [36, 36]
      2016-02-11 18:18:57 140284524488448 [Note] WSREP: Member 4.0 (node2) requested state transfer from 'node1'. Selected 0.0 (node3)(SYNCED) as donor.

      As you can see from the above, node2 asked node1 to be donor again. However, node1 already thought it was a donor, so node2 instead chose node3 to be donor.

      Very soon after, the SST was complete:

      2016-02-11 18:18:57 140284524488448 [Note] WSREP: 0.0 (node3): State transfer to 4.0 (node2) complete.
      2016-02-11 18:18:57 140284524488448 [Note] WSREP: Member 0.0 (node3) synced with group.
      2016-02-11 18:19:00 140284537075456 [Note] WSREP: (868cb19d, 'tcp://0.0.0.0:4567') turning message relay requesting off
      2016-02-11 18:19:51 140284524488448 [Note] WSREP: 4.0 (node2): State transfer from 0.0 (node3) complete.
      2016-02-11 18:19:51 140284524488448 [Note] WSREP: Member 4.0 (node2) synced with group.

      On node1, there is also a dead wsrep_sst_rsync process that is still running:

      34399 ? S 5:42 /bin/bash -ue /opt/mariadb-10.1.11//bin/wsrep_sst_rsync --role donor --address 10.191.33.152:4444/rsync_sst --socket /tmp/mysql.sock --datadir /data/mariadb/ --gtid b36c4dd2-7dbc-11e5-857f-c70794853074:107996829 --gtid-domain-id 0

      The joiner actually shows the SST completing at 17:39:

      2016-02-11 17:32:10 140223497369344 [Note] WSREP: Running: 'wsrep_sst_rsync --role 'joiner' --address '10.191.33.152' --datadir '/data/mariadb/'   --parent '60031'  '' '
      2016-02-11 17:32:10 140332152313600 [Note] WSREP: Prepared SST request: rsync|10.191.33.152:4444/rsync_sst
      2016-02-11 17:32:10 140332152313600 [Note] WSREP: REPL Protocols: 7 (3, 2)
      2016-02-11 17:32:10 140223589644032 [Note] WSREP: Service thread queue flushed.
      2016-02-11 17:32:10 140332152313600 [Note] WSREP: Assign initial position for certification: 107996827, protocol version: 3
      2016-02-11 17:32:10 140223589644032 [Note] WSREP: Service thread queue flushed.
      2016-02-11 17:32:10 140332152313600 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (b36c4dd2-7dbc-11e5-857f-c70794853074): 1 (Operation not permitted)
      	 at galera/src/replicator_str.cpp:prepare_for_IST():482. IST will be unavailable.
      2016-02-11 17:32:10 140223526725376 [Note] WSREP: Member 1.0 (node2) requested state transfer from 'node1'. Selected 3.0 (node1)(SYNCED) as donor.
      2016-02-11 17:32:10 140223526725376 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 107996829)
      2016-02-11 17:32:10 140332152313600 [Note] WSREP: Requesting state transfer: success, donor: 3
      2016-02-11 17:32:13 140223535118080 [Note] WSREP: (49fb877e, 'tcp://0.0.0.0:4567') turning message relay requesting off
      WSREP_SST: [INFO] Joiner cleanup. rsync PID: 60077 (20160211 17:39:20.293)
      WSREP_SST: [INFO] Joiner cleanup done. (20160211 17:39:20.797)
      2016-02-11 17:39:20 140332152572096 [Note] WSREP: SST complete, seqno: 107996846

      Then the joiner restarted, but failed to join the cluster:

      2016-02-11 17:39:20 140332152572096 [Note] Using unique option prefix 'myisam_recover' is error-prone and can break in the future. Please use the full name 'myisam-recover-options' instead.
      2016-02-11 17:39:20 7fa1a01ac8c0 InnoDB: Warning: Using innodb_additional_mem_pool_size is DEPRECATED. This option may be removed in future releases, together with the option innodb_use_sys_malloc and with the InnoDB's internal memory allocator.
      2016-02-11 17:39:20 140332152572096 [Note] InnoDB: Using mutexes to ref count buffer pool pages
      2016-02-11 17:39:20 140332152572096 [Note] InnoDB: The InnoDB memory heap is disabled
      2016-02-11 17:39:20 140332152572096 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
      2016-02-11 17:39:20 140332152572096 [Note] InnoDB: Memory barrier is not used
      2016-02-11 17:39:20 140332152572096 [Note] InnoDB: Compressed tables use zlib 1.2.7
      2016-02-11 17:39:20 140332152572096 [Note] InnoDB: Using SSE crc32 instructions
      2016-02-11 17:39:20 140332152572096 [Note] InnoDB: Initializing buffer pool, size = 640.0G
      2016-02-11 17:39:43 140332152572096 [Note] InnoDB: Completed initialization of buffer pool
      2016-02-11 17:39:46 140332152572096 [Note] InnoDB: Highest supported file format is Barracuda.
      2016-02-11 17:39:46 140332152572096 [Note] InnoDB: Log scan progressed past the checkpoint lsn 2415243635981
      2016-02-11 17:39:46 140332152572096 [Note] InnoDB: Database was not shutdown normally!
      2016-02-11 17:39:46 140332152572096 [Note] InnoDB: Starting crash recovery.
      2016-02-11 17:39:46 140332152572096 [Note] InnoDB: Reading tablespace information from the .ibd files...
      2016-02-11 17:39:46 140332152572096 [Note] InnoDB: Restoring possible half-written data pages 
      2016-02-11 17:39:46 140332152572096 [Note] InnoDB: from the doublewrite buffer...
      InnoDB: Doing recovery: scanned up to log sequence number 2415248878592
      InnoDB: Doing recovery: scanned up to log sequence number 2415254121472
      InnoDB: Doing recovery: scanned up to log sequence number 2415259364352
      InnoDB: Doing recovery: scanned up to log sequence number 2415264607232
      InnoDB: Doing recovery: scanned up to log sequence number 2415269850112
      InnoDB: Doing recovery: scanned up to log sequence number 2415271879980
      2016-02-11 17:39:49 140332152572096 [Note] InnoDB: Starting an apply batch of log records to the database...
      InnoDB: Progress in percent: 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 
      InnoDB: Apply batch completed
      2016-02-11 17:40:03 140332152572096 [Note] InnoDB: 128 rollback segment(s) are active.
      2016-02-11 17:40:03 140332152572096 [Note] InnoDB: Waiting for purge to start
      2016-02-11 17:40:03 140332152572096 [Note] InnoDB:  Percona XtraDB (http://www.percona.com) 5.6.26-76.0 started; log sequence number 2415271879980
      2016-02-11 17:40:03 139469109827328 [Note] InnoDB: Dumping buffer pool(s) not yet started
      2016-02-11 17:40:03 140332152572096 [Note] Server socket created on IP: '::'.
      2016-02-11 17:40:03 140332151813888 [Note] /opt/mariadb/bin/mysqld: Normal shutdown
       
      2016-02-11 17:40:03 140332151813888 [Note] WSREP: Stop replication
      2016-02-11 17:40:03 140332151813888 [Note] WSREP: Closing send monitor...
      2016-02-11 17:40:03 140332151813888 [Note] WSREP: Closed send monitor.
      2016-02-11 17:40:03 140332151813888 [Note] WSREP: gcomm: terminating thread
      2016-02-11 17:40:03 140332151813888 [Note] WSREP: gcomm: joining thread
      2016-02-11 17:40:03 140332151813888 [Note] WSREP: gcomm: closing backend
      2016-02-11 17:40:03 140332152572096 [Note] WSREP: Signalling provider to continue.
      2016-02-11 17:40:03 140332151564032 [Note] Event Scheduler: scheduler thread started with id 4
      2016-02-11 17:40:03 140332152572096 [Note] WSREP: SST received: b36c4dd2-7dbc-11e5-857f-c70794853074:107996846
      2016-02-11 17:40:03 140332152572096 [Note] /opt/mariadb/bin/mysqld: ready for connections.
      Version: '10.1.11-MariaDB-log'  socket: '/tmp/mysql.sock'  port: 3306  Source distribution
      2016-02-11 17:40:03 140332151813888 [Note] WSREP: view(view_id(NON_PRIM,00ec119f,1687) memb {
      	49fb877e,0
      } joined {
      } left {
      } partitioned {
      	00ec119f,0
      	56787688,0
      	868cb19d,0
      	a6816777,0
      })
      2016-02-11 17:40:03 140332151813888 [Note] WSREP: view((empty))
      2016-02-11 17:40:03 140223526725376 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
      2016-02-11 17:40:03 140332151813888 [Note] WSREP: gcomm: closed
      2016-02-11 17:40:03 140332152313600 [Warning] WSREP: 0x7fa19b056040 down context(s) not set
      2016-02-11 17:40:03 140332152313600 [Warning] WSREP: Sending JOIN failed: -107 (Transport endpoint is not connected). Will retry in new primary component.

      Then the node threw a bunch of warnings and shut down:

      2016-02-11 17:40:03 140223526725376 [Note] WSREP: Flow-control interval: [16, 16]
      2016-02-11 17:40:03 140223526725376 [Note] WSREP: Received NON-PRIMARY.
      2016-02-11 17:40:03 140223526725376 [Note] WSREP: Shifting JOINER -> OPEN (TO: 108007812)
      2016-02-11 17:40:03 140223526725376 [Note] WSREP: Received self-leave message.
      2016-02-11 17:40:03 140223526725376 [Note] WSREP: Flow-control interval: [0, 0]
      2016-02-11 17:40:03 140223526725376 [Note] WSREP: Received SELF-LEAVE. Closing connection.
      2016-02-11 17:40:03 140223526725376 [Note] WSREP: Shifting OPEN -> CLOSED (TO: 108007812)
      2016-02-11 17:40:03 140223526725376 [Note] WSREP: RECV thread exiting 0: Success
      2016-02-11 17:40:03 140332151813888 [Note] WSREP: recv_thread() joined.
      2016-02-11 17:40:03 140332151813888 [Note] WSREP: Closing replication queue.
      2016-02-11 17:40:03 140332151813888 [Note] WSREP: Closing slave action queue.
      2016-02-11 17:40:03 140223589644032 [Warning] WSREP: Failed to report last committed 107996922, -77 (File descriptor in bad state)
      2016-02-11 17:40:03 140223589644032 [Warning] WSREP: Failed to report last committed 107997050, -77 (File descriptor in bad state)
      2016-02-11 17:40:04 140223589644032 [Warning] WSREP: Failed to report last committed 107997176, -77 (File descriptor in bad state)
      2016-02-11 17:40:04 140223589644032 [Warning] WSREP: Failed to report last committed 107997306, -77 (File descriptor in bad state)
      2016-02-11 17:40:04 140223589644032 [Warning] WSREP: Failed to report last committed 107997435, -77 (File descriptor in bad state)
      2016-02-11 17:40:04 140223589644032 [Warning] WSREP: Failed to report last committed 107997560, -77 (File descriptor in bad state)
      2016-02-11 17:40:04 140223589644032 [Warning] WSREP: Failed to report last committed 107997689, -77 (File descriptor in bad state)
      2016-02-11 17:40:04 140223589644032 [Warning] WSREP: Failed to report last committed 107997817, -77 (File descriptor in bad state)
      2016-02-11 17:40:04 140223589644032 [Warning] WSREP: Failed to report last committed 107997947, -77 (File descriptor in bad state)
      2016-02-11 17:40:04 140223589644032 [Warning] WSREP: Failed to report last committed 107998075, -77 (File descriptor in bad state)
      2016-02-11 17:40:04 140223589644032 [Warning] WSREP: Failed to report last committed 107998203, -77 (File descriptor in bad state)
      2016-02-11 17:40:04 140223589644032 [Warning] WSREP: Failed to report last committed 107998331, -77 (File descriptor in bad state)
      2016-02-11 17:40:04 140223589644032 [Warning] WSREP: Failed to report last committed 107998459, -77 (File descriptor in bad state)
      2016-02-11 17:40:04 140223589644032 [Warning] WSREP: Failed to report last committed 107998585, -77 (File descriptor in bad state)
      2016-02-11 17:40:04 140223589644032 [Warning] WSREP: Failed to report last committed 107998715, -77 (File descriptor in bad state)
      2016-02-11 17:40:04 140223589644032 [Warning] WSREP: Failed to report last committed 107998842, -77 (File descriptor in bad state)
      2016-02-11 17:40:04 140223589644032 [Warning] WSREP: Failed to report last committed 107998970, -77 (File descriptor in bad state)
      2016-02-11 17:40:04 140223589644032 [Warning] WSREP: Failed to report last committed 107999099, -77 (File descriptor in bad state)
      2016-02-11 17:40:04 140223589644032 [Warning] WSREP: Failed to report last committed 107999226, -77 (File descriptor in bad state)
      2016-02-11 17:40:04 140223589644032 [Warning] WSREP: Failed to report last committed 107999355, -77 (File descriptor in bad state)
      2016-02-11 17:40:04 140223589644032 [Warning] WSREP: Failed to report last committed 107999483, -77 (File descriptor in bad state)
      2016-02-11 17:40:04 140223589644032 [Warning] WSREP: Failed to report last committed 107999605, -77 (File descriptor in bad state)
      2016-02-11 17:40:04 140223589644032 [Warning] WSREP: Failed to report last committed 107999739, -77 (File descriptor in bad state)
      2016-02-11 17:40:04 140223589644032 [Warning] WSREP: Failed to report last committed 107999864, -77 (File descriptor in bad state)
      2016-02-11 17:40:04 140223589644032 [Warning] WSREP: Failed to report last committed 107999995, -77 (File descriptor in bad state)
      2016-02-11 17:40:04 140223589644032 [Warning] WSREP: Failed to report last committed 108000122, -77 (File descriptor in bad state)
      2016-02-11 17:40:04 140223589644032 [Warning] WSREP: Failed to report last committed 108000250, -77 (File descriptor in bad state)
      2016-02-11 17:40:04 140223589644032 [Warning] WSREP: Failed to report last committed 108000377, -77 (File descriptor in bad state)
      2016-02-11 17:40:04 140223589644032 [Warning] WSREP: Failed to report last committed 108000505, -77 (File descriptor in bad state)
      2016-02-11 17:40:04 140223589644032 [Warning] WSREP: Failed to report last committed 108000635, -77 (File descriptor in bad state)
      2016-02-11 17:40:04 140223589644032 [Warning] WSREP: Failed to report last committed 108000761, -77 (File descriptor in bad state)
      2016-02-11 17:40:05 140223589644032 [Warning] WSREP: Failed to report last committed 108000891, -77 (File descriptor in bad state)
      2016-02-11 17:40:05 140223589644032 [Warning] WSREP: Failed to report last committed 108001019, -77 (File descriptor in bad state)
      2016-02-11 17:40:05 140223589644032 [Warning] WSREP: Failed to report last committed 108001147, -77 (File descriptor in bad state)
      2016-02-11 17:40:05 140223589644032 [Warning] WSREP: Failed to report last committed 108001272, -77 (File descriptor in bad state)
      2016-02-11 17:40:05 140223589644032 [Warning] WSREP: Failed to report last committed 108001402, -77 (File descriptor in bad state)
      2016-02-11 17:40:05 140223589644032 [Warning] WSREP: Failed to report last committed 108001529, -77 (File descriptor in bad state)
      2016-02-11 17:40:05 140223589644032 [Warning] WSREP: Failed to report last committed 108001657, -77 (File descriptor in bad state)
      2016-02-11 17:40:05 140223589644032 [Warning] WSREP: Failed to report last committed 108001780, -77 (File descriptor in bad state)
      2016-02-11 17:40:05 140223589644032 [Warning] WSREP: Failed to report last committed 108001915, -77 (File descriptor in bad state)
      2016-02-11 17:40:05 140223589644032 [Warning] WSREP: Failed to report last committed 108002041, -77 (File descriptor in bad state)
      2016-02-11 17:40:05 140223589644032 [Warning] WSREP: Failed to report last committed 108002170, -77 (File descriptor in bad state)
      2016-02-11 17:40:05 140223589644032 [Warning] WSREP: Failed to report last committed 108002299, -77 (File descriptor in bad state)
      2016-02-11 17:40:05 140223589644032 [Warning] WSREP: Failed to report last committed 108002427, -77 (File descriptor in bad state)
      2016-02-11 17:40:05 140223589644032 [Warning] WSREP: Failed to report last committed 108002554, -77 (File descriptor in bad state)
      2016-02-11 17:40:05 140223589644032 [Warning] WSREP: Failed to report last committed 108002682, -77 (File descriptor in bad state)
      2016-02-11 17:40:05 140223589644032 [Warning] WSREP: Failed to report last committed 108002809, -77 (File descriptor in bad state)
      2016-02-11 17:40:05 140223589644032 [Warning] WSREP: Failed to report last committed 108002933, -77 (File descriptor in bad state)
      2016-02-11 17:40:05 140223589644032 [Warning] WSREP: Failed to report last committed 108003064, -77 (File descriptor in bad state)
      2016-02-11 17:40:05 140223589644032 [Warning] WSREP: Failed to report last committed 108003195, -77 (File descriptor in bad state)
      2016-02-11 17:40:05 140223589644032 [Warning] WSREP: Failed to report last committed 108003323, -77 (File descriptor in bad state)
      2016-02-11 17:40:05 140223589644032 [Warning] WSREP: Failed to report last committed 108003448, -77 (File descriptor in bad state)
      2016-02-11 17:40:05 140223589644032 [Warning] WSREP: Failed to report last committed 108003578, -77 (File descriptor in bad state)
      2016-02-11 17:40:05 140223589644032 [Warning] WSREP: Failed to report last committed 108003707, -77 (File descriptor in bad state)
      2016-02-11 17:40:05 140223589644032 [Warning] WSREP: Failed to report last committed 108003835, -77 (File descriptor in bad state)
      2016-02-11 17:40:05 140223589644032 [Warning] WSREP: Failed to report last committed 108003962, -77 (File descriptor in bad state)
      2016-02-11 17:40:05 140223589644032 [Warning] WSREP: Failed to report last committed 108004091, -77 (File descriptor in bad state)
      2016-02-11 17:40:05 140223589644032 [Warning] WSREP: Failed to report last committed 108004219, -77 (File descriptor in bad state)
      2016-02-11 17:40:05 140223589644032 [Warning] WSREP: Failed to report last committed 108004347, -77 (File descriptor in bad state)
      2016-02-11 17:40:05 140223589644032 [Warning] WSREP: Failed to report last committed 108004473, -77 (File descriptor in bad state)
      2016-02-11 17:40:05 140223589644032 [Warning] WSREP: Failed to report last committed 108004597, -77 (File descriptor in bad state)
      2016-02-11 17:40:06 140223589644032 [Warning] WSREP: Failed to report last committed 108004729, -77 (File descriptor in bad state)
      2016-02-11 17:40:06 140223589644032 [Warning] WSREP: Failed to report last committed 108004859, -77 (File descriptor in bad state)
      2016-02-11 17:40:06 140223589644032 [Warning] WSREP: Failed to report last committed 108004981, -77 (File descriptor in bad state)
      2016-02-11 17:40:06 140223589644032 [Warning] WSREP: Failed to report last committed 108005115, -77 (File descriptor in bad state)
      2016-02-11 17:40:06 140223589644032 [Warning] WSREP: Failed to report last committed 108005238, -77 (File descriptor in bad state)
      2016-02-11 17:40:06 140223589644032 [Warning] WSREP: Failed to report last committed 108005364, -77 (File descriptor in bad state)
      2016-02-11 17:40:06 140223589644032 [Warning] WSREP: Failed to report last committed 108005498, -77 (File descriptor in bad state)
      2016-02-11 17:40:06 140223589644032 [Warning] WSREP: Failed to report last committed 108005627, -77 (File descriptor in bad state)
      2016-02-11 17:40:06 140223589644032 [Warning] WSREP: Failed to report last committed 108005755, -77 (File descriptor in bad state)
      2016-02-11 17:40:06 140223589644032 [Warning] WSREP: Failed to report last committed 108005881, -77 (File descriptor in bad state)
      2016-02-11 17:40:06 140223589644032 [Warning] WSREP: Failed to report last committed 108006010, -77 (File descriptor in bad state)
      2016-02-11 17:40:06 140223589644032 [Warning] WSREP: Failed to report last committed 108006138, -77 (File descriptor in bad state)
      2016-02-11 17:40:06 140223589644032 [Warning] WSREP: Failed to report last committed 108006267, -77 (File descriptor in bad state)
      2016-02-11 17:40:06 140223589644032 [Warning] WSREP: Failed to report last committed 108006394, -77 (File descriptor in bad state)
      2016-02-11 17:40:06 140223589644032 [Warning] WSREP: Failed to report last committed 108006522, -77 (File descriptor in bad state)
      2016-02-11 17:40:06 140223589644032 [Warning] WSREP: Failed to report last committed 108006651, -77 (File descriptor in bad state)
      2016-02-11 17:40:06 140223589644032 [Warning] WSREP: Failed to report last committed 108006779, -77 (File descriptor in bad state)
      2016-02-11 17:40:06 140223589644032 [Warning] WSREP: Failed to report last committed 108006903, -77 (File descriptor in bad state)
      2016-02-11 17:40:06 140223589644032 [Warning] WSREP: Failed to report last committed 108007034, -77 (File descriptor in bad state)
      2016-02-11 17:40:06 140223589644032 [Warning] WSREP: Failed to report last committed 108007161, -77 (File descriptor in bad state)
      2016-02-11 17:40:06 140223589644032 [Warning] WSREP: Failed to report last committed 108007291, -77 (File descriptor in bad state)
      2016-02-11 17:40:06 140223589644032 [Warning] WSREP: Failed to report last committed 108007418, -77 (File descriptor in bad state)
      2016-02-11 17:40:06 140223589644032 [Warning] WSREP: Failed to report last committed 108007542, -77 (File descriptor in bad state)
      2016-02-11 17:40:06 140223589644032 [Warning] WSREP: Failed to report last committed 108007675, -77 (File descriptor in bad state)
      2016-02-11 17:40:06 140223595144960 [Note] WSREP: New cluster view: global state: b36c4dd2-7dbc-11e5-857f-c70794853074:108007812, view# -1: non-Primary, number of nodes: 1, my index: 0, protocol version 3
      2016-02-11 17:40:06 140223595144960 [Note] WSREP: New cluster view: global state: b36c4dd2-7dbc-11e5-857f-c70794853074:108007812, view# -1: non-Primary, number of nodes: 0, my index: -1, protocol version 3
      2016-02-11 17:40:07 140223595144960 [Note] WSREP: applier thread exiting (code:0)
      2016-02-11 17:40:07 140332151314176 [Note] WSREP: applier thread exiting (code:6)
      2016-02-11 17:40:07 140223594895104 [Note] WSREP: applier thread exiting (code:6)
      2016-02-11 17:40:07 140223596893952 [Note] WSREP: applier thread exiting (code:6)
      2016-02-11 17:40:07 140332076865280 [Note] WSREP: applier thread exiting (code:6)
      2016-02-11 17:40:07 140330981369600 [Note] WSREP: applier thread exiting (code:6)
      2016-02-11 17:40:07 140330980869888 [Note] WSREP: applier thread exiting (code:6)
      2016-02-11 17:40:07 140330981869312 [Note] WSREP: applier thread exiting (code:6)
      2016-02-11 17:40:07 140332075865856 [Note] WSREP: applier thread exiting (code:6)
      2016-02-11 17:40:07 140332075616000 [Note] WSREP: applier thread exiting (code:6)
      2016-02-11 17:40:07 140223594395392 [Note] WSREP: applier thread exiting (code:6)
      2016-02-11 17:40:07 140332076115712 [Note] WSREP: applier thread exiting (code:6)
      2016-02-11 17:40:07 140332076615424 [Note] WSREP: applier thread exiting (code:6)
      2016-02-11 17:40:07 140223597643520 [Note] WSREP: applier thread exiting (code:6)
      2016-02-11 17:40:07 140223597393664 [Note] WSREP: applier thread exiting (code:6)
      2016-02-11 17:40:07 140330981119744 [Note] WSREP: applier thread exiting (code:6)
      2016-02-11 17:40:07 140223596644096 [Note] WSREP: applier thread exiting (code:6)
      2016-02-11 17:40:07 140223595894528 [Note] WSREP: applier thread exiting (code:6)
      2016-02-11 17:40:07 140332152313600 [Note] WSREP: applier thread exiting (code:6)
      2016-02-11 17:40:07 140223595644672 [Note] WSREP: applier thread exiting (code:6)
      2016-02-11 17:40:07 140330982618880 [Note] WSREP: applier thread exiting (code:6)
      2016-02-11 17:40:07 140223596144384 [Note] WSREP: applier thread exiting (code:6)
      2016-02-11 17:40:07 140223595394816 [Note] WSREP: applier thread exiting (code:6)
      2016-02-11 17:40:07 140332151064320 [Note] WSREP: applier thread exiting (code:6)
      2016-02-11 17:40:07 140332076365568 [Note] WSREP: applier thread exiting (code:6)
      2016-02-11 17:40:07 140330982119168 [Note] WSREP: applier thread exiting (code:6)
      2016-02-11 17:40:07 140223597143808 [Note] WSREP: applier thread exiting (code:6)
      2016-02-11 17:40:07 140330982369024 [Note] WSREP: applier thread exiting (code:6)
      2016-02-11 17:40:07 140330981619456 [Note] WSREP: applier thread exiting (code:6)
      2016-02-11 17:40:07 140223594645248 [Note] WSREP: applier thread exiting (code:6)
      2016-02-11 17:40:07 140223596394240 [Note] WSREP: applier thread exiting (code:6)
      2016-02-11 17:40:07 140332077115136 [Note] WSREP: applier thread exiting (code:6)
      2016-02-11 17:40:07 140332152563456 [Note] WSREP: rollbacker thread exiting
      2016-02-11 17:40:07 140332151813888 [Note] Event Scheduler: Killing the scheduler thread, thread id 4
      2016-02-11 17:40:07 140332151813888 [Note] Event Scheduler: Waiting for the scheduler thread to reply
      2016-02-11 17:40:07 140332151813888 [Note] Event Scheduler: Stopped
      2016-02-11 17:40:07 140332151813888 [Note] Event Scheduler: Purging the queue. 1 events
      2016-02-11 17:40:07 140332151813888 [Note] WSREP: dtor state: CLOSED
      2016-02-11 17:40:07 140332151813888 [Note] WSREP: mon: entered 10966 oooe fraction 0 oool fraction 0
      2016-02-11 17:40:07 140332151813888 [Note] WSREP: mon: entered 10966 oooe fraction 0.956593 oool fraction 0
      2016-02-11 17:40:07 140332151813888 [Note] WSREP: mon: entered 10988 oooe fraction 0 oool fraction 9.10084e-05
      2016-02-11 17:40:07 140332151813888 [Note] WSREP: cert index usage at exit 0
      2016-02-11 17:40:07 140332151813888 [Note] WSREP: cert trx map usage at exit 10985
      2016-02-11 17:40:07 140332151813888 [Note] WSREP: deps set usage at exit 0
      2016-02-11 17:40:07 140332151813888 [Note] WSREP: avg deps dist 2036.09
      2016-02-11 17:40:07 140332151813888 [Note] WSREP: avg cert interval 1.2071
      2016-02-11 17:40:07 140332151813888 [Note] WSREP: cert index size 5341
      2016-02-11 17:40:07 140223589644032 [Note] WSREP: Service thread queue flushed.
      2016-02-11 17:40:07 140332151813888 [Note] WSREP: wsdb trx map usage 0 conn query map usage 0
      2016-02-11 17:40:07 140332151813888 [Note] WSREP: MemPool(LocalTrxHandle): hit ratio: 0, misses: 0, in use: 0, in pool: 0
      2016-02-11 17:40:07 140332151813888 [Note] WSREP: MemPool(SlaveTrxHandle): hit ratio: 0, misses: 10985, in use: 0, in pool: 6516
      2016-02-11 17:40:07 140332151813888 [Note] WSREP: Shifting CLOSED -> DESTROYED (TO: 108007812)
      2016-02-11 17:40:07 140332151813888 [Note] WSREP: Flushing memory map to disk...
      2016-02-11 17:40:07 139469093041920 [Note] InnoDB: FTS optimize thread exiting.
      2016-02-11 17:40:07 140332151813888 [Note] InnoDB: Starting shutdown...
      2016-02-11 17:40:29 140332151813888 [Note] InnoDB: Shutdown completed; log sequence number 2415280906695
      2016-02-11 17:40:29 140332151813888 [Note] /opt/mariadb/bin/mysqld: Shutdown complete

      When the node was brought back up, it was able to successfully SST when it chose node3 as donor:

      2016-02-11 18:18:56 140284274350272 [Note] WSREP: Read nil XID from storage engines, skipping position init
      2016-02-11 18:18:56 140284274350272 [Note] WSREP: wsrep_load(): loading provider library '/opt/galera/libgalera_smm.so'
      2016-02-11 18:18:56 140284274350272 [Note] WSREP: wsrep_load(): Galera 3.14(r0) by Codership Oy <info@codership.com> loaded successfully.
      2016-02-11 18:18:56 140284274350272 [Note] WSREP: CRC-32C: using hardware acceleration.
      2016-02-11 18:18:56 140284274350272 [Note] WSREP: Found saved state: b36c4dd2-7dbc-11e5-857f-c70794853074:108007812
      2016-02-11 18:18:56 140284274350272 [Note] WSREP: Passing config to GCS: base_dir = /data/mariadb/; base_host = 10.191.33.152; base_port = 4567; cert.log_conflicts = no; debug = no; evs.auto_evict = 0; evs.delay_margin = PT15S; evs.delayed_keep_period = PT90S; evs.inactive_check_period = PT3S; evs.inactive_timeout = PT30S; evs.install_timeout = PT30S; evs.join_retrans_period = PT3S; evs.keepalive_period = PT3S; evs.max_install_timeouts = 3; evs.send_window = 1024; evs.stats_report_period = PT2M; evs.suspect_timeout = PT15S; evs.user_send_window = 512; evs.view_forget_timeout = PT24H; gcache.dir = /data/mariadb/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /data/mariadb//galera.cache; gcache.page_size = 128M; gcache.size = 100G; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0; pc.announce_timeout = PT3S; pc.chec
      2016-02-11 18:18:56 140175711663872 [Note] WSREP: Service thread queue flushed.
      2016-02-11 18:18:56 140284274350272 [Note] WSREP: Assign initial position for certification: 108007812, protocol version: -1
      2016-02-11 18:18:56 140284274350272 [Note] WSREP: wsrep_sst_grab()
      2016-02-11 18:18:56 140284274350272 [Note] WSREP: Start replication
      2016-02-11 18:18:56 140284274350272 [Note] WSREP: Setting initial position to b36c4dd2-7dbc-11e5-857f-c70794853074:108007812
      2016-02-11 18:18:56 140284274350272 [Note] WSREP: protonet asio version 0
      2016-02-11 18:18:56 140284274350272 [Note] WSREP: Using CRC-32C for message checksums.
      2016-02-11 18:18:56 140284274350272 [Note] WSREP: backend: asio
      2016-02-11 18:18:56 140284274350272 [Warning] WSREP: access file(/data/mariadb//gvwstate.dat) failed(No such file or directory)
      2016-02-11 18:18:56 140284274350272 [Note] WSREP: restore pc from disk failed
      2016-02-11 18:18:56 140284274350272 [Note] WSREP: GMCast version 0
      2016-02-11 18:18:56 140284274350272 [Note] WSREP: (d2e36c1e, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
      2016-02-11 18:18:56 140284274350272 [Note] WSREP: (d2e36c1e, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
      2016-02-11 18:18:56 140284274350272 [Note] WSREP: EVS version 0
      2016-02-11 18:18:56 140284274350272 [Note] WSREP: gcomm: connecting to group 'doghouse', peer '10.191.47.205:,10.191.47.207:,10.191.33.152:,10.191.44.142:'
      2016-02-11 18:18:56 140284274350272 [Warning] WSREP: (d2e36c1e, 'tcp://0.0.0.0:4567') address 'tcp://10.191.33.152:4567' points to own listening address, blacklisting
      2016-02-11 18:18:56 140284274350272 [Note] WSREP: (d2e36c1e, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://10.191.33.150:4567 
      2016-02-11 18:18:57 140284274350272 [Note] WSREP: declaring 00ec119f at tcp://10.191.33.150:4567 stable
      2016-02-11 18:18:57 140284274350272 [Note] WSREP: declaring 56787688 at tcp://10.191.47.207:4567 stable
      2016-02-11 18:18:57 140284274350272 [Note] WSREP: declaring 868cb19d at tcp://10.191.47.205:4567 stable
      2016-02-11 18:18:57 140284274350272 [Note] WSREP: declaring a6816777 at tcp://10.191.44.142:4567 stable
      2016-02-11 18:18:57 140284274350272 [Note] WSREP: Node 00ec119f state prim
      2016-02-11 18:18:57 140284274350272 [Note] WSREP: view(view_id(PRIM,00ec119f,1689) memb {
      	00ec119f,0
      	56787688,0
      	868cb19d,0
      	a6816777,0
      	d2e36c1e,0
      } joined {
      } left {
      } partitioned {
      })
      2016-02-11 18:18:57 140284274350272 [Note] WSREP: save pc into disk
      2016-02-11 18:18:57 140284274350272 [Note] WSREP: gcomm: connected
      2016-02-11 18:18:57 140284274350272 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636
      2016-02-11 18:18:57 140284274350272 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
      2016-02-11 18:18:57 140284274350272 [Note] WSREP: Opened channel 'doghouse'
      2016-02-11 18:18:57 140175648745216 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 4, memb_num = 5
      2016-02-11 18:18:57 140175648745216 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
      2016-02-11 18:18:57 140175648745216 [Note] WSREP: STATE EXCHANGE: sent state msg: d31dae25-d115-11e5-8df3-669a3b27b7ac
      2016-02-11 18:18:57 140175648745216 [Note] WSREP: STATE EXCHANGE: got state msg: d31dae25-d115-11e5-8df3-669a3b27b7ac from 0 (node3)
      2016-02-11 18:18:57 140175648745216 [Note] WSREP: STATE EXCHANGE: got state msg: d31dae25-d115-11e5-8df3-669a3b27b7ac from 1 (shepherd)
      2016-02-11 18:18:57 140175648745216 [Note] WSREP: STATE EXCHANGE: got state msg: d31dae25-d115-11e5-8df3-669a3b27b7ac from 2 (node1)
      2016-02-11 18:18:57 140175648745216 [Note] WSREP: STATE EXCHANGE: got state msg: d31dae25-d115-11e5-8df3-669a3b27b7ac from 3 (bulldog)
      2016-02-11 18:18:57 140284274350272 [Note] WSREP: Waiting for SST to complete.
      2016-02-11 18:18:57 140175648745216 [Note] WSREP: STATE EXCHANGE: got state msg: d31dae25-d115-11e5-8df3-669a3b27b7ac from 4 (node2)
      2016-02-11 18:18:57 140175648745216 [Note] WSREP: Quorum results:
      	version    = 3,
      	component  = PRIMARY,
      	conf_id    = 142,
      	members    = 4/5 (joined/total),
      	act_id     = 108088703,
      	last_appl. = -1,
      	protocols  = 0/7/3 (gcs/repl/appl),
      	group UUID = b36c4dd2-7dbc-11e5-857f-c70794853074
      2016-02-11 18:18:57 140175648745216 [Note] WSREP: Flow-control interval: [36, 36]
      2016-02-11 18:18:57 140175648745216 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 108088703)
      2016-02-11 18:18:57 140284274091776 [Note] WSREP: State transfer required: 
      	Group state: b36c4dd2-7dbc-11e5-857f-c70794853074:108088703
      	Local state: b36c4dd2-7dbc-11e5-857f-c70794853074:108007812
      2016-02-11 18:18:57 140284274091776 [Note] WSREP: New cluster view: global state: b36c4dd2-7dbc-11e5-857f-c70794853074:108088703, view# 143: Primary, number of nodes: 5, my index: 4, protocol version 3
      2016-02-11 18:18:57 140284274091776 [Warning] WSREP: Gap in state sequence. Need state transfer.
      2016-02-11 18:18:57 140175619389184 [Note] WSREP: Running: 'wsrep_sst_rsync --role 'joiner' --address '10.191.33.152' --datadir '/data/mariadb/'   --parent '64419'  '' '
      2016-02-11 18:18:57 140284274091776 [Note] WSREP: Prepared SST request: rsync|10.191.33.152:4444/rsync_sst
      2016-02-11 18:18:57 140284274091776 [Note] WSREP: REPL Protocols: 7 (3, 2)
      2016-02-11 18:18:57 140175711663872 [Note] WSREP: Service thread queue flushed.
      2016-02-11 18:18:57 140284274091776 [Note] WSREP: Assign initial position for certification: 108088703, protocol version: 3
      2016-02-11 18:18:57 140175711663872 [Note] WSREP: Service thread queue flushed.
      2016-02-11 18:18:57 140284274091776 [Note] WSREP: Prepared IST receiver, listening at: tcp://10.191.33.152:4568
      2016-02-11 18:18:57 140175648745216 [Note] WSREP: Member 4.0 (node2) requested state transfer from 'node1'. Selected 0.0 (node3)(SYNCED) as donor.
      2016-02-11 18:18:57 140175648745216 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 108088707)
      2016-02-11 18:18:57 140284274091776 [Note] WSREP: Requesting state transfer: success, donor: 0
      2016-02-11 18:18:57 140175648745216 [Note] WSREP: 0.0 (node3): State transfer to 4.0 (node2) complete.
      2016-02-11 18:18:57 140175648745216 [Note] WSREP: Member 0.0 (node3) synced with group.
      WSREP_SST: [INFO] Joiner cleanup. rsync PID: 64459 (20160211 18:18:58.370)
      WSREP_SST: [INFO] Joiner cleanup done. (20160211 18:18:58.875)
      2016-02-11 18:18:58 140284274350272 [Note] WSREP: SST complete, seqno: 108007812
      2016-02-11 18:18:58 140284274350272 [Note] Using unique option prefix 'myisam_recover' is error-prone and can break in the future. Please use the full name 'myisam-recover-options' instead.
      2016-02-11 18:18:58 7f967a5718c0 InnoDB: Warning: Using innodb_additional_mem_pool_size is DEPRECATED. This option may be removed in future releases, together with the option innodb_use_sys_malloc and with the InnoDB's internal memory allocator.
      2016-02-11 18:18:58 140284274350272 [Note] InnoDB: Using mutexes to ref count buffer pool pages
      2016-02-11 18:18:58 140284274350272 [Note] InnoDB: The InnoDB memory heap is disabled
      2016-02-11 18:18:58 140284274350272 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
      2016-02-11 18:18:58 140284274350272 [Note] InnoDB: Memory barrier is not used
      2016-02-11 18:18:58 140284274350272 [Note] InnoDB: Compressed tables use zlib 1.2.7
      2016-02-11 18:18:58 140284274350272 [Note] InnoDB: Using SSE crc32 instructions
      2016-02-11 18:18:58 140284274350272 [Note] InnoDB: Initializing buffer pool, size = 640.0G
      2016-02-11 18:19:00 140175657137920 [Note] WSREP: (d2e36c1e, 'tcp://0.0.0.0:4567') turning message relay requesting off
      2016-02-11 18:19:21 140284274350272 [Note] InnoDB: Completed initialization of buffer pool
      2016-02-11 18:19:24 140284274350272 [Note] InnoDB: Highest supported file format is Barracuda.
      2016-02-11 18:19:31 140284274350272 [Note] InnoDB: 128 rollback segment(s) are active.
      2016-02-11 18:19:31 140284274350272 [Note] InnoDB: Waiting for purge to start
      2016-02-11 18:19:31 140284274350272 [Note] InnoDB:  Percona XtraDB (http://www.percona.com) 5.6.26-76.0 started; log sequence number 2415280906695
      2016-02-11 18:19:31 139421483501312 [Note] InnoDB: Dumping buffer pool(s) not yet started
      2016-02-11 18:19:31 140284274350272 [Note] Server socket created on IP: '::'.
      2016-02-11 18:19:31 140284274350272 [Note] WSREP: Signalling provider to continue.
      2016-02-11 18:19:31 140284274350272 [Note] WSREP: SST received: b36c4dd2-7dbc-11e5-857f-c70794853074:108007812
      2016-02-11 18:19:31 140284274091776 [Note] WSREP: Receiving IST: 80891 writesets, seqnos 108007812-108088703
      2016-02-11 18:19:31 140284273592064 [Note] Event Scheduler: scheduler thread started with id 4
      2016-02-11 18:19:31 140284274350272 [Note] /opt/mariadb/bin/mysqld: ready for connections.
      Version: '10.1.11-MariaDB-log'  socket: '/tmp/mysql.sock'  port: 3306  Source distribution
      2016-02-11 18:19:31 140175716165376 [Warning] IP address '172.17.0.3' could not be resolved: Name or service not known
      2016-02-11 18:19:50 140284274091776 [Note] WSREP: IST received: b36c4dd2-7dbc-11e5-857f-c70794853074:108088703
      2016-02-11 18:19:51 140175648745216 [Note] WSREP: 4.0 (node2): State transfer from 0.0 (node3) complete.
      2016-02-11 18:19:51 140175648745216 [Note] WSREP: Shifting JOINER -> JOINED (TO: 108090463)
      2016-02-11 18:19:51 140175648745216 [Note] WSREP: Member 4.0 (node2) synced with group.
      2016-02-11 18:19:51 140175648745216 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 108090482)
      2016-02-11 18:19:51 140175718414080 [Note] WSREP: Synchronized with group, ready for connections

      Attachments

        Issue Links

          Activity

            People

              jplindst Jan Lindström (Inactive)
              GeoffMontee Geoff Montee (Inactive)
              Votes:
              2 Vote for this issue
              Watchers:
              8 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.