[MDEV-10004] Galera's pc.recovery process fails in 10.1 with systemd Created: 2016-04-27  Updated: 2020-08-25  Resolved: 2016-05-27

Status: Closed
Project: MariaDB Server
Component/s: Galera, wsrep
Affects Version/s: 10.1.13
Fix Version/s: 10.1.15

Type: Bug Priority: Major
Reporter: Geoff Montee (Inactive) Assignee: Nirbhay Choubey (Inactive)
Resolution: Fixed Votes: 1
Labels: galera, systemd, wsrep

Attachments: File node1.err     File node1_centos6_success.err     File node2.err     File node2_centos6_success.err    
Issue Links:
Duplicate
duplicates MDEV-10243 Systemd does not perform --wsrep-reco... Closed
duplicates MDEV-10415 Create galera_recovery_information sc... Closed
Relates
relates to MDEV-14707 systemd: remove PermissionsStartOnly=... Open
Sprint: 10.2.1-3, 10.2.1-4

 Description   

Galera's pc.recovery process allows a cluster to automatically recover after a crash without bootstrapping.

When I try to test this recovery process in MariaDB 10.1 on CentOS/RHEL 7, automatic recovery always fails with a vague "Operation not permitted" error.

To reproduce, assume a two-node cluster.

First, bootstrap the first node:

sudo galera_new_cluster

Then start mysqld on the second node:

sudo systemctl start mariadb

Now to simulate a crash, let's kill mysqld on both nodes:

sudo kill -9 `pidof mysqld`

Now let's verify that both grastate.dat and gvwstate.dat have meaningful information:

$ sudo cat /var/lib/mysql/gvwstate.dat
my_uuid: 4384a074-0cb7-11e6-af78-070fdb4f5393
#vwbeg
view_id: 3 4384a074-0cb7-11e6-af78-070fdb4f5393 2
bootstrap: 0
member: 4384a074-0cb7-11e6-af78-070fdb4f5393 0
member: 4856263a-0cb7-11e6-a78f-0787aa9d1a09 0
#vwend
$ sudo cat /var/lib/mysql/grastate.dat
# GALERA saved state
version: 2.1
uuid:    43850bb2-0cb7-11e6-a4f3-76531c9a59eb
seqno:   -1
cert_index:

Now, start mysqld on both nodes normally. We are not bootstrapping the first node here, because we would like automatic recovery to take place:

sudo systemctl start mariadb

When this happens, you will likely see that the saved state is initially restored:

2016-04-27 16:34:55 140359035238528 [Note] WSREP: restore pc from disk successfully
2016-04-27 16:34:55 140359035238528 [Note] WSREP: GMCast version 0
2016-04-27 16:34:55 140359035238528 [Note] WSREP: (4384a074, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
2016-04-27 16:34:55 140359035238528 [Note] WSREP: (4384a074, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
2016-04-27 16:34:55 140359035238528 [Note] WSREP: EVS version 0
2016-04-27 16:34:55 140359035238528 [Note] WSREP: gcomm: connecting to group 'my_wsrep_cluster', peer '172.31.22.174:'
2016-04-27 16:34:55 140359035238528 [Note] WSREP: (4384a074, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: 
2016-04-27 16:34:56 140359035238528 [Note] WSREP: declaring 4856263a at tcp://172.31.22.174:4567 stable
2016-04-27 16:34:56 140359035238528 [Warning] WSREP: no nodes coming from prim view, prim not possible
2016-04-27 16:34:56 140359035238528 [Note] WSREP: view(view_id(NON_PRIM,4384a074,4) memb {
	4384a074,0
	4856263a,0
} joined {
} left {
} partitioned {
})
2016-04-27 16:34:56 140359035238528 [Note] WSREP: promote to primary component
2016-04-27 16:34:56 140359035238528 [Note] WSREP: view(view_id(PRIM,4384a074,4) memb {
	4384a074,0
	4856263a,0
} joined {
} left {
} partitioned {
})
2016-04-27 16:34:56 140359035238528 [Note] WSREP: save pc into disk

But eventually, mysqld aborts when it apparently attempts an SST and fails:

2016-04-27 16:34:56 140359035238528 [Note] WSREP: Waiting for SST to complete.
2016-04-27 16:34:56 140358755280640 [Note] WSREP: STATE EXCHANGE: sent state msg: 8104c0a0-0cb7-11e6-8d89-32b3ced6c311
2016-04-27 16:34:56 140358755280640 [Note] WSREP: STATE EXCHANGE: got state msg: 8104c0a0-0cb7-11e6-8d89-32b3ced6c311 from 0 ()
2016-04-27 16:34:56 140358755280640 [Note] WSREP: STATE EXCHANGE: got state msg: 8104c0a0-0cb7-11e6-8d89-32b3ced6c311 from 1 ()
2016-04-27 16:34:56 140358755280640 [Warning] WSREP: Quorum: No node with complete state:
 
 
	Version      : 3
	Flags        : 0x5
	Protocols    : 0 / 7 / 3
	State        : NON-PRIMARY
	Prim state   : NON-PRIMARY
	Prim UUID    : 00000000-0000-0000-0000-000000000000
	Prim  seqno  : -1
	First seqno  : -1
	Last  seqno  : -1
	Prim JOINED  : 0
	State UUID   : 8104c0a0-0cb7-11e6-8d89-32b3ced6c311
	Group UUID   : 00000000-0000-0000-0000-000000000000
	Name         : ''
	Incoming addr: '172.31.19.192:3306'
 
	Version      : 3
	Flags        : 0x4
	Protocols    : 0 / 7 / 3
	State        : NON-PRIMARY
	Prim state   : NON-PRIMARY
	Prim UUID    : 00000000-0000-0000-0000-000000000000
	Prim  seqno  : -1
	First seqno  : -1
	Last  seqno  : -1
	Prim JOINED  : 0
	State UUID   : 8104c0a0-0cb7-11e6-8d89-32b3ced6c311
	Group UUID   : 00000000-0000-0000-0000-000000000000
	Name         : ''
	Incoming addr: '172.31.22.174:3306'
 
2016-04-27 16:34:56 140358755280640 [Warning] WSREP: No re-merged primary component found.
2016-04-27 16:34:56 140358755280640 [Note] WSREP: Bootstrapped primary 00000000-0000-0000-0000-000000000000 found: 2.
2016-04-27 16:34:56 140358755280640 [Note] WSREP: Quorum results:
	version    = 3,
	component  = PRIMARY,
	conf_id    = -1,
	members    = 2/2 (joined/total),
	act_id     = -1,
	last_appl. = -1,
	protocols  = 0/7/3 (gcs/repl/appl),
	group UUID = 00000000-0000-0000-0000-000000000000
2016-04-27 16:34:56 140358755280640 [Note] WSREP: Flow-control interval: [23, 23]
2016-04-27 16:34:56 140358755280640 [Note] WSREP: Restored state OPEN -> JOINED (-1)
2016-04-27 16:34:56 140359034923776 [Note] WSREP: New cluster view: global state: :-1, view# 0: Primary, number of nodes: 2, my index: 0, protocol version 3
2016-04-27 16:34:56 140359035238528 [ERROR] WSREP: SST failed: 1 (Operation not permitted)
2016-04-27 16:34:56 140359035238528 [ERROR] Aborting
 
Error in my_thread_global_end(): 1 threads didn't exit

I suspect the failure is caused by the Group UUID being 00000000-0000-0000-0000-000000000000, even though grastate.dat and gvwstate.dat both seem to have valid values. It seems as though the server is ignoring the valid Group UUID, or the UUID is not being transmitted or received properly.

When the server SSTs during a normal startup in which automatic recovery is not attempted, everything works fine. This only seems to happen during recovery.

The configuration files for these nodes look like this:

[mariadb-10.1]
log_bin=mariadb-bin
binlog_format=row
gtid_domain_id=1
server_id=1
gtid_strict_mode=ON
log_error=mysqld.err
log_slave_updates
wsrep_on=ON
wsrep_provider=/usr/lib64/galera/libgalera_smm.so
wsrep_cluster_address="gcomm://172.31.22.174"
wsrep_gtid_domain_id=3
wsrep_gtid_mode=ON
default_storage_engine=InnoDB
innodb_autoinc_lock_mode=2
query_cache_size=0
wsrep_sst_method=rsync

The galera provider being used is 25.3.15.



 Comments   
Comment by Geoff Montee (Inactive) [ 2016-04-27 ]

Note that automatic recovery seems to work fine on MariaDB Galera Cluster 10.0 on CentOS/RHEL 6, so this problem might be specific to MariaDB 10.1 or systemd.

Comment by Geoff Montee (Inactive) [ 2016-04-28 ]

Automatic recovery also seems to work fine on MariaDB 10.1 on CentOS/RHEL 6, so this problem might be specific to systemd or something else on CentOS/RHEL 7.

To see it working on CentOS 6, I did the following:

Bootstrap the first node:

sudo service mysql bootstrap

Then start the second node:

sudo service mysql start

Then to simulate a crash, kill mysqld on both nodes:

sudo kill -9 `pidof mysqld`

Then verify that both grastate.dat and gvwstate.dat have meaningful information (I'll attach the full logs as node1_centos6_success.err and node2_centos6_success.err):

$ sudo cat /var/lib/mysql/gvwstate.dat
my_uuid: 26ea6165-0d6e-11e6-8432-6eb9fb95b481
#vwbeg
view_id: 3 1fb17fbb-0d6e-11e6-b23f-9e1fe9aa2a3f 2
bootstrap: 0
member: 1fb17fbb-0d6e-11e6-b23f-9e1fe9aa2a3f 0
member: 26ea6165-0d6e-11e6-8432-6eb9fb95b481 0
#vwend
$ sudo cat /var/lib/mysql/grastate.dat
# GALERA saved state
version: 2.1
uuid:    1fb23f12-0d6e-11e6-868f-3a082830dc1b
seqno:   -1
cert_index:

Then start mysqld on both nodes normally. Again, we are not bootstrapping the first node here, because we would like automatic recovery to take place:

sudo service mysql start

Here's an example of the relevant log section:

2016-04-28 14:26:37 140589529954336 [Note] WSREP: restore pc from disk successfully
2016-04-28 14:26:37 140589529954336 [Note] WSREP: GMCast version 0
2016-04-28 14:26:37 140589529954336 [Note] WSREP: (26ea6165, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
2016-04-28 14:26:37 140589529954336 [Note] WSREP: (26ea6165, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
2016-04-28 14:26:37 140589529954336 [Note] WSREP: EVS version 0
2016-04-28 14:26:37 140589529954336 [Note] WSREP: gcomm: connecting to group 'my_wsrep_cluster', peer '172.31.17.215:,172.31.31.63:'
2016-04-28 14:26:37 140589529954336 [Warning] WSREP: (26ea6165, 'tcp://0.0.0.0:4567') address 'tcp://172.31.31.63:4567' points to own listening address, blacklisting
2016-04-28 14:26:37 140589529954336 [Note] WSREP: (26ea6165, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: 
2016-04-28 14:26:38 140589529954336 [Note] WSREP: declaring 1fb17fbb at tcp://172.31.17.215:4567 stable
2016-04-28 14:26:38 140589529954336 [Warning] WSREP: no nodes coming from prim view, prim not possible
2016-04-28 14:26:38 140589529954336 [Note] WSREP: view(view_id(NON_PRIM,1fb17fbb,4) memb {
	1fb17fbb,0
	26ea6165,0
} joined {
} left {
} partitioned {
})
2016-04-28 14:26:38 140589529954336 [Note] WSREP: promote to primary component
2016-04-28 14:26:38 140589529954336 [Note] WSREP: view(view_id(PRIM,1fb17fbb,4) memb {
	1fb17fbb,0
	26ea6165,0
} joined {
} left {
} partitioned {
})
2016-04-28 14:26:38 140589529954336 [Note] WSREP: save pc into disk
2016-04-28 14:26:38 140589529954336 [Note] WSREP: clear restored view
2016-04-28 14:26:38 140589529954336 [Warning] WSREP: non weight changing install in S_PRIM: pcmsg{ type=INSTALL, seq=0, flags= 2, node_map {	1fb17fbb,prim=0,un=0,last_seq=4294967295,last_prim=view_id(NON_PRIM,1fb17fbb,2),to_seq=-1,weight=1,segment=0
	26ea6165,prim=0,un=0,last_seq=4294967295,last_prim=view_id(NON_PRIM,1fb17fbb,2),to_seq=-1,weight=1,segment=0
}}
2016-04-28 14:26:38 140589529954336 [Note] WSREP: gcomm: connected
2016-04-28 14:26:38 140589529954336 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636
2016-04-28 14:26:38 140589529954336 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
2016-04-28 14:26:38 140589529954336 [Note] WSREP: Opened channel 'my_wsrep_cluster'
2016-04-28 14:26:38 140589251155712 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 1, memb_num = 2
2016-04-28 14:26:38 140589251155712 [Note] WSREP: Flow-control interval: [23, 23]
2016-04-28 14:26:38 140589251155712 [Note] WSREP: Received NON-PRIMARY.
2016-04-28 14:26:38 140589251155712 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = yes, my_idx = 1, memb_num = 2
2016-04-28 14:26:38 140589251155712 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
2016-04-28 14:26:38 140589529954336 [Note] WSREP: Waiting for SST to complete.
2016-04-28 14:26:38 140589240359680 [Note] WSREP: New cluster view: global state: 1fb23f12-0d6e-11e6-868f-3a082830dc1b:0, view# -1: non-Primary, number of nodes: 2, my index: 1, protocol version -1
2016-04-28 14:26:38 140589240359680 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2016-04-28 14:26:38 140589251155712 [Note] WSREP: STATE EXCHANGE: sent state msg: bed3928d-0d6e-11e6-ae4c-db5ed7c3c9f3
2016-04-28 14:26:38 140589251155712 [Note] WSREP: STATE EXCHANGE: got state msg: bed3928d-0d6e-11e6-ae4c-db5ed7c3c9f3 from 0 ()
2016-04-28 14:26:38 140589251155712 [Note] WSREP: STATE EXCHANGE: got state msg: bed3928d-0d6e-11e6-ae4c-db5ed7c3c9f3 from 1 ()
2016-04-28 14:26:38 140589251155712 [Warning] WSREP: Quorum: No node with complete state:
 
 
	Version      : 3
	Flags        : 0x5
	Protocols    : 0 / 7 / 3
	State        : NON-PRIMARY
	Prim state   : NON-PRIMARY
	Prim UUID    : 00000000-0000-0000-0000-000000000000
	Prim  seqno  : -1
	First seqno  : -1
	Last  seqno  : 0
	Prim JOINED  : 0
	State UUID   : bed3928d-0d6e-11e6-ae4c-db5ed7c3c9f3
	Group UUID   : 1fb23f12-0d6e-11e6-868f-3a082830dc1b
	Name         : ''
	Incoming addr: '172.31.17.215:3306'
 
	Version      : 3
	Flags        : 0x4
	Protocols    : 0 / 7 / 3
	State        : NON-PRIMARY
	Prim state   : NON-PRIMARY
	Prim UUID    : 00000000-0000-0000-0000-000000000000
	Prim  seqno  : -1
	First seqno  : -1
	Last  seqno  : 0
	Prim JOINED  : 0
	State UUID   : bed3928d-0d6e-11e6-ae4c-db5ed7c3c9f3
	Group UUID   : 1fb23f12-0d6e-11e6-868f-3a082830dc1b
	Name         : ''
	Incoming addr: '172.31.31.63:3306'
 
2016-04-28 14:26:38 140589251155712 [Warning] WSREP: No re-merged primary component found.
2016-04-28 14:26:38 140589251155712 [Note] WSREP: Bootstrapped primary 00000000-0000-0000-0000-000000000000 found: 2.
2016-04-28 14:26:38 140589251155712 [Note] WSREP: Quorum results:
	version    = 3,
	component  = PRIMARY,
	conf_id    = -1,
	members    = 2/2 (joined/total),
	act_id     = 0,
	last_appl. = -1,
	protocols  = 0/7/3 (gcs/repl/appl),
	group UUID = 1fb23f12-0d6e-11e6-868f-3a082830dc1b
2016-04-28 14:26:38 140589251155712 [Note] WSREP: Flow-control interval: [23, 23]
2016-04-28 14:26:38 140589251155712 [Note] WSREP: Restored state OPEN -> JOINED (0)
2016-04-28 14:26:38 140589240359680 [Note] WSREP: New cluster view: global state: 1fb23f12-0d6e-11e6-868f-3a082830dc1b:0, view# 0: Primary, number of nodes: 2, my index: 1, protocol version 3
2016-04-28 14:26:38 140589529954336 [Note] WSREP: SST complete, seqno: 0
2016-04-28 14:26:38 140589251155712 [Note] WSREP: Member 0.0 () synced with group.
2016-04-28 14:26:38 140589251155712 [Note] WSREP: Member 1.0 () synced with group.
2016-04-28 14:26:38 140589251155712 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 0)

Comment by Geoff Montee (Inactive) [ 2016-04-28 ]

Automatic recovery also seems to work fine in MariaDB Galera Cluster 10.0 on CentOS/RHEL 7. Since MariaDB Galera Cluster 10.0 doesn't use systemd and MariaDB 10.1 does, maybe systemd is somehow causing the failure.

Comment by Nirbhay Choubey (Inactive) [ 2016-05-04 ]

GeoffMontee Right, systemd is the culprit here. The init scripts use mysqld_safe to start mysqld, and mysqld_safe
uses a two-pass approach to start the server: in the first pass it tries to recover the start position (--wsrep-recover),
and then it uses the recovered position (if any) to start mysqld (--wsrep-start-position=xxxx).
OTOH, systemd starts mysqld directly (essentially obsoleting mysqld_safe).
danblack/svoj : Any suggestions?
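The two-pass startup described above can be sketched roughly as follows. This is an illustrative sketch, not the actual mysqld_safe code; the helper name, temp-file handling, and exact log format are assumptions:

```shell
#!/bin/sh
# Rough sketch of mysqld_safe's two-pass wsrep startup.
# Assumption: when run with --wsrep-recover, mysqld prints a line of the form
#   WSREP: Recovered position: <uuid>:<seqno>
# to its error log and exits.

# extract_wsrep_position LOGFILE: print the last recovered
# "<uuid>:<seqno>" found in LOGFILE.
extract_wsrep_position() {
    grep 'WSREP: Recovered position:' "$1" \
        | tail -n 1 \
        | sed 's/.*Recovered position:[[:space:]]*//'
}

# Pass 1 (commented out here, since it requires an installed server):
#   wsrep_log=$(mktemp)
#   mysqld --wsrep-recover --log-error="$wsrep_log"
# Pass 2: start the server for real with the recovered position:
#   exec mysqld --wsrep-start-position="$(extract_wsrep_position "$wsrep_log")"
```

Under systemd, nothing performs pass 1, so mysqld starts without a recovered position.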

Comment by Sergey Vojtovich [ 2016-05-05 ]

One option would be to use ExecStartPre to generate an additional config file with wsrep-start-position, and ExecStartPost to remove it.
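A minimal sketch of this idea as a systemd drop-in. The helper script and all paths here are hypothetical placeholders to illustrate the mechanism, not the eventual fix:

```ini
# Hypothetical drop-in: /etc/systemd/system/mariadb.service.d/wsrep-recover.conf
[Service]
# Before starting mysqld, run a helper (illustrative name) that performs
# "mysqld --wsrep-recover" and writes a fragment such as:
#   [mysqld]
#   wsrep_start_position=<uuid>:<seqno>
ExecStartPre=/usr/local/bin/wsrep-recover-position /etc/my.cnf.d/wsrep_start_position.cnf
# Once the server is up, remove the generated fragment again.
ExecStartPost=/bin/rm -f /etc/my.cnf.d/wsrep_start_position.cnf
```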

Comment by Daniel Black [ 2016-05-05 ]

Can the --wsrep-recover logic be moved inside the server to make the second startup with --wsrep-start-position redundant? Was there a reason for keeping these separate? If mysqld_safe adds the wsrep-recover logic whenever wsrep=on, is there any reason for this not to be part of the server logic?

systemctl set-environment .. may also be of assistance if I've missed some important logic here.

Comment by Geoff Montee (Inactive) [ 2016-05-11 ]

It does seem that automatic recovery works on MariaDB 10.1 on CentOS/RHEL 7 if I use the old SysV init scripts instead of the systemd service. To reproduce:

First, move the systemd service file and reload on both nodes:

sudo mv /usr/lib/systemd/system/mariadb.service ./
sudo systemctl daemon-reload

Then bootstrap the first node.

Note: we cannot use galera_new_cluster here, since that relies on mariadb.service (which we just moved from its expected location). We also can't pass --wsrep_new_cluster directly to the init script, since systemd ignores extra options.

sudo tee /etc/my.cnf.d/wsrep_new_cluster.cnf <<EOF
[galera]
wsrep_new_cluster
EOF
sudo service mysql start
sudo rm /etc/my.cnf.d/wsrep_new_cluster.cnf

Then start the second node:

sudo service mysql start

Then to simulate a crash, kill mysqld on both nodes:

sudo kill -9 `pidof mysqld`

After that, we have to stop the service on both nodes, even though mysqld is already dead; systemd apparently keeps extra state somewhere that needs to be cleared first.

sudo service mysql stop

Then start mysqld on both nodes normally:

sudo service mysql start

We can see that recovery is working:

2016-05-11 15:06:18 139814266562688 [Note] WSREP: restore pc from disk successfully
2016-05-11 15:06:18 139814266562688 [Note] WSREP: GMCast version 0
2016-05-11 15:06:18 139814266562688 [Note] WSREP: (2a333154, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
2016-05-11 15:06:18 139814266562688 [Note] WSREP: (2a333154, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
2016-05-11 15:06:18 139814266562688 [Note] WSREP: EVS version 0
2016-05-11 15:06:18 139814266562688 [Note] WSREP: gcomm: connecting to group 'my_wsrep_cluster', peer '172.31.22.174:'
2016-05-11 15:06:18 139814266562688 [Note] WSREP: (2a333154, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers:
2016-05-11 15:06:18 139814266562688 [Note] WSREP: declaring 52cf4544 at tcp://172.31.22.174:4567 stable
2016-05-11 15:06:18 139814266562688 [Warning] WSREP: no nodes coming from prim view, prim not possible
2016-05-11 15:06:18 139814266562688 [Note] WSREP: view(view_id(NON_PRIM,2a333154,10) memb {
        2a333154,0
        52cf4544,0
} joined {
} left {
} partitioned {
})
2016-05-11 15:06:18 139814266562688 [Note] WSREP: promote to primary component
2016-05-11 15:06:18 139814266562688 [Note] WSREP: view(view_id(PRIM,2a333154,10) memb {
        2a333154,0
        52cf4544,0
} joined {
} left {
} partitioned {
})
2016-05-11 15:06:18 139814266562688 [Note] WSREP: save pc into disk
2016-05-11 15:06:18 139814266562688 [Note] WSREP: clear restored view
2016-05-11 15:06:18 139814266562688 [Warning] WSREP: non weight changing install in S_PRIM: pcmsg{ type=INSTALL, seq=0, flags= 2, node_map {    2a333154,prim=0,un=0,last_seq=4294967295,last_prim=view_id(NON_PRIM,2a333154,8),to_seq=-1,weight=1,segment=0
        52cf4544,prim=0,un=0,last_seq=4294967295,last_prim=view_id(NON_PRIM,2a333154,8),to_seq=-1,weight=1,segment=0
}}
2016-05-11 15:06:18 139814266562688 [Note] WSREP: gcomm: connected
2016-05-11 15:06:18 139814266562688 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636
2016-05-11 15:06:18 139814266562688 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
2016-05-11 15:06:18 139814266562688 [Note] WSREP: Opened channel 'my_wsrep_cluster'
2016-05-11 15:06:18 139814266562688 [Note] WSREP: Waiting for SST to complete.
2016-05-11 15:06:18 139813986494208 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 2
2016-05-11 15:06:18 139813986494208 [Note] WSREP: Flow-control interval: [23, 23]
2016-05-11 15:06:18 139813986494208 [Note] WSREP: Received NON-PRIMARY.
2016-05-11 15:06:18 139813986494208 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = yes, my_idx = 0, memb_num = 2
2016-05-11 15:06:18 139813986494208 [Note] WSREP: STATE_EXCHANGE: sent state UUID: 7120d9e4-17ab-11e6-ad85-4772c8eebd50
2016-05-11 15:06:18 139814266247936 [Note] WSREP: New cluster view: global state: 35e6b60c-17a7-11e6-8f56-de071bb8146b:0, view# -1: non-Primary, number of nodes: 2, my index: 0, protocol version -1
2016-05-11 15:06:18 139814266247936 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2016-05-11 15:06:18 139813986494208 [Note] WSREP: STATE EXCHANGE: sent state msg: 7120d9e4-17ab-11e6-ad85-4772c8eebd50
2016-05-11 15:06:18 139813986494208 [Note] WSREP: STATE EXCHANGE: got state msg: 7120d9e4-17ab-11e6-ad85-4772c8eebd50 from 0 ()
2016-05-11 15:06:18 139813986494208 [Note] WSREP: STATE EXCHANGE: got state msg: 7120d9e4-17ab-11e6-ad85-4772c8eebd50 from 1 ()
2016-05-11 15:06:18 139813986494208 [Warning] WSREP: Quorum: No node with complete state:
 
 
        Version      : 3
        Flags        : 0x5
        Protocols    : 0 / 7 / 3
        State        : NON-PRIMARY
        Prim state   : NON-PRIMARY
        Prim UUID    : 00000000-0000-0000-0000-000000000000
        Prim  seqno  : -1
        First seqno  : -1
        Last  seqno  : 0
        Prim JOINED  : 0
        State UUID   : 7120d9e4-17ab-11e6-ad85-4772c8eebd50
        Group UUID   : 35e6b60c-17a7-11e6-8f56-de071bb8146b
        Name         : ''
        Incoming addr: '172.31.19.192:3306'
 
        Version      : 3
        Flags        : 0x4
        Protocols    : 0 / 7 / 3
        State        : NON-PRIMARY
        Prim state   : NON-PRIMARY
        Prim UUID    : 00000000-0000-0000-0000-000000000000
        Prim  seqno  : -1
        First seqno  : -1
        Last  seqno  : 0
        Prim JOINED  : 0
        State UUID   : 7120d9e4-17ab-11e6-ad85-4772c8eebd50
        Group UUID   : 35e6b60c-17a7-11e6-8f56-de071bb8146b
        Name         : ''
        Incoming addr: '172.31.22.174:3306'
 
2016-05-11 15:06:18 139813986494208 [Warning] WSREP: No re-merged primary component found.
2016-05-11 15:06:18 139813986494208 [Note] WSREP: Bootstrapped primary 00000000-0000-0000-0000-000000000000 found: 2.
2016-05-11 15:06:18 139813986494208 [Note] WSREP: Quorum results:
        version    = 3,
        component  = PRIMARY,
        conf_id    = -1,
        members    = 2/2 (joined/total),
        act_id     = 0,
        last_appl. = -1,
        protocols  = 0/7/3 (gcs/repl/appl),
        group UUID = 35e6b60c-17a7-11e6-8f56-de071bb8146b
2016-05-11 15:06:18 139813986494208 [Note] WSREP: Flow-control interval: [23, 23]
2016-05-11 15:06:18 139813986494208 [Note] WSREP: Restored state OPEN -> JOINED (0)
2016-05-11 15:06:18 139814266247936 [Note] WSREP: New cluster view: global state: 35e6b60c-17a7-11e6-8f56-de071bb8146b:0, view# 0: Primary, number of nodes: 2, my index: 0, protocol version 3
2016-05-11 15:06:18 139814266562688 [Note] WSREP: SST complete, seqno: 0
2016-05-11 15:06:18 139813986494208 [Note] WSREP: Member 1.0 () synced with group.
2016-05-11 15:06:18 139813986494208 [Note] WSREP: Member 0.0 () synced with group.
2016-05-11 15:06:18 139813986494208 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 0)

Comment by Nirbhay Choubey (Inactive) [ 2016-05-21 ]

svoj danblack Would you be interested in reviewing the patch?
http://lists.askmonty.org/pipermail/commits/2016-May/009368.html

Comment by Nirbhay Choubey (Inactive) [ 2016-05-21 ]

Can the --wsrep-recover logic be moved inside the server to make the second startup with --wsrep-start-position redundant? Was there a reason for keeping these separate? If mysqld_safe adds the wsrep-recover logic whenever wsrep=on, is there any reason for this not to be part of the server logic?

The rsync/xtrabackup-based SST requires the file transfer to happen before storage engine initialization.

Comment by Nirbhay Choubey (Inactive) [ 2016-05-27 ]

http://lists.askmonty.org/pipermail/commits/2016-May/009384.html

Generated at Thu Feb 08 07:38:56 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.