Cannot resume Node SYNCED state when wsrep_desync is done after FTWRL
1. Issue ' flush tables with read lock;' on Galera Node
MariaDB [(none)]> flush tables with read lock;
Query OK, 0 rows affected (0.000 sec)
2. Set wsrep_desync – Command hangs and wsrep_desync=1 is not applied ,
but afterwards cannot resume SYNCED state
MariaDB [(none)]> set global wsrep_desync=1;
^CCtrl-C – query killed. Continuing normally.
^CCtrl-C – query killed. Continuing normally.
ERROR 2013 (HY000): Lost connection to MySQL server during query
MariaDB [(none)]> show processlist ;
ERROR 2006 (HY000): MySQL server has gone away
No connection. Trying to reconnect...
Connection id: 81
Current database: *** NONE ***
------------------------------------------------------------------------------------------------
Id |
User |
Host |
db |
Command |
Time |
State |
Info |
Progress |
------------------------------------------------------------------------------------------------
1 |
system user |
|
NULL |
Sleep |
3331 |
WSREP aborter idle |
NULL |
0.000 |
2 |
system user |
|
NULL |
Sleep |
3331 |
NULL |
NULL |
0.000 |
4 |
system user |
|
NULL |
Daemon |
NULL |
InnoDB purge coordinator |
NULL |
0.000 |
5 |
system user |
|
NULL |
Daemon |
NULL |
InnoDB purge worker |
NULL |
0.000 |
3 |
system user |
|
NULL |
Daemon |
NULL |
InnoDB purge worker |
NULL |
0.000 |
6 |
system user |
|
NULL |
Daemon |
NULL |
InnoDB purge worker |
NULL |
0.000 |
7 |
system user |
|
NULL |
Daemon |
NULL |
InnoDB shutdown handler |
NULL |
0.000 |
9 |
system user |
|
NULL |
Sleep |
3331 |
NULL |
NULL |
0.000 |
8 |
system user |
|
NULL |
Sleep |
3331 |
NULL |
NULL |
0.000 |
10 |
system user |
|
NULL |
Sleep |
3331 |
NULL |
NULL |
0.000 |
26 |
system user |
|
NULL |
Sleep |
3331 |
NULL |
NULL |
0.000 |
17 |
system user |
|
NULL |
Sleep |
3331 |
NULL |
NULL |
0.000 |
16 |
system user |
|
NULL |
Sleep |
3331 |
NULL |
NULL |
0.000 |
19 |
system user |
|
NULL |
Sleep |
3331 |
NULL |
NULL |
0.000 |
20 |
system user |
|
NULL |
Sleep |
3331 |
NULL |
NULL |
0.000 |
22 |
system user |
|
NULL |
Sleep |
3331 |
NULL |
NULL |
0.000 |
23 |
system user |
|
NULL |
Sleep |
3331 |
NULL |
NULL |
0.000 |
18 |
system user |
|
NULL |
Sleep |
3331 |
NULL |
NULL |
0.000 |
24 |
system user |
|
NULL |
Sleep |
3331 |
NULL |
NULL |
0.000 |
21 |
system user |
|
NULL |
Sleep |
3331 |
NULL |
NULL |
0.000 |
14 |
system user |
|
NULL |
Sleep |
3331 |
NULL |
NULL |
0.000 |
25 |
system user |
|
NULL |
Sleep |
3331 |
NULL |
NULL |
0.000 |
35 |
system user |
|
NULL |
Sleep |
3331 |
NULL |
NULL |
0.000 |
27 |
system user |
|
NULL |
Sleep |
3331 |
NULL |
NULL |
0.000 |
13 |
system user |
|
NULL |
Sleep |
3331 |
NULL |
NULL |
0.000 |
34 |
system user |
|
NULL |
Sleep |
3331 |
NULL |
NULL |
0.000 |
11 |
system user |
|
NULL |
Sleep |
3331 |
NULL |
NULL |
0.000 |
30 |
system user |
|
NULL |
Sleep |
3331 |
NULL |
NULL |
0.000 |
37 |
system user |
|
NULL |
Sleep |
3331 |
NULL |
NULL |
0.000 |
15 |
system user |
|
NULL |
Sleep |
3331 |
NULL |
NULL |
0.000 |
33 |
system user |
|
NULL |
Sleep |
3331 |
NULL |
NULL |
0.000 |
32 |
system user |
|
NULL |
Sleep |
3331 |
NULL |
NULL |
0.000 |
31 |
system user |
|
NULL |
Sleep |
3331 |
NULL |
NULL |
0.000 |
12 |
system user |
|
NULL |
Sleep |
3331 |
NULL |
NULL |
0.000 |
55 |
system user |
|
NULL |
Sleep |
3331 |
NULL |
NULL |
0.000 |
38 |
system user |
|
NULL |
Sleep |
3331 |
NULL |
NULL |
0.000 |
51 |
system user |
|
NULL |
Sleep |
3331 |
NULL |
NULL |
0.000 |
42 |
system user |
|
NULL |
Sleep |
3331 |
NULL |
NULL |
0.000 |
43 |
system user |
|
NULL |
Sleep |
3331 |
NULL |
NULL |
0.000 |
36 |
system user |
|
NULL |
Sleep |
3331 |
NULL |
NULL |
0.000 |
41 |
system user |
|
NULL |
Sleep |
3331 |
NULL |
NULL |
0.000 |
45 |
system user |
|
NULL |
Sleep |
3331 |
NULL |
NULL |
0.000 |
44 |
system user |
|
NULL |
Sleep |
3331 |
NULL |
NULL |
0.000 |
39 |
system user |
|
NULL |
Sleep |
3331 |
NULL |
NULL |
0.000 |
48 |
system user |
|
NULL |
Sleep |
3331 |
NULL |
NULL |
0.000 |
46 |
system user |
|
NULL |
Sleep |
3331 |
NULL |
NULL |
0.000 |
47 |
system user |
|
NULL |
Sleep |
3331 |
NULL |
NULL |
0.000 |
50 |
system user |
|
NULL |
Sleep |
3331 |
NULL |
NULL |
0.000 |
52 |
system user |
|
NULL |
Sleep |
3331 |
NULL |
NULL |
0.000 |
53 |
system user |
|
NULL |
Sleep |
3331 |
NULL |
NULL |
0.000 |
40 |
system user |
|
NULL |
Sleep |
3331 |
NULL |
NULL |
0.000 |
28 |
system user |
|
NULL |
Sleep |
3331 |
NULL |
NULL |
0.000 |
49 |
system user |
|
NULL |
Sleep |
3331 |
NULL |
NULL |
0.000 |
29 |
system user |
|
NULL |
Sleep |
3331 |
NULL |
NULL |
0.000 |
58 |
system user |
|
NULL |
Sleep |
3331 |
NULL |
NULL |
0.000 |
59 |
system user |
|
NULL |
Sleep |
3331 |
NULL |
NULL |
0.000 |
61 |
system user |
|
NULL |
Sleep |
3331 |
NULL |
NULL |
0.000 |
56 |
system user |
|
NULL |
Sleep |
3331 |
NULL |
NULL |
0.000 |
60 |
system user |
|
NULL |
Sleep |
3331 |
NULL |
NULL |
0.000 |
62 |
system user |
|
NULL |
Sleep |
3331 |
NULL |
NULL |
0.000 |
57 |
system user |
|
NULL |
Sleep |
3331 |
NULL |
NULL |
0.000 |
63 |
system user |
|
NULL |
Sleep |
3331 |
NULL |
NULL |
0.000 |
64 |
system user |
|
NULL |
Sleep |
3331 |
NULL |
NULL |
0.000 |
54 |
system user |
|
NULL |
Sleep |
3331 |
NULL |
NULL |
0.000 |
65 |
system user |
|
NULL |
Sleep |
3331 |
NULL |
NULL |
0.000 |
69 |
system user |
|
NULL |
Sleep |
3331 |
NULL |
NULL |
0.000 |
66 |
system user |
|
NULL |
Sleep |
3331 |
NULL |
NULL |
0.000 |
67 |
system user |
|
NULL |
Sleep |
3331 |
NULL |
NULL |
0.000 |
70 |
system user |
|
NULL |
Sleep |
3331 |
NULL |
NULL |
0.000 |
68 |
system user |
|
NULL |
Sleep |
3331 |
NULL |
NULL |
0.000 |
78 |
root |
localhost |
NULL |
Killed |
30 |
NULL |
set global wsrep_desync=1 |
0.000 |
81 |
root |
localhost |
NULL |
Query |
0 |
Init |
show processlist |
0.000 |
------------------------------------------------------------------------------------------------
72 rows in set (0.001 sec)
Node moved in DESYNCED state and cannot be recovered
2018-04-19 17:34:17 0 [Note] WSREP: Member 0.0 (t4w6) desyncs itself from group
2018-04-19 17:34:17 0 [Note] WSREP: Shifting SYNCED -> DONOR/DESYNCED (TO: 2)
2018-04-19 17:34:17 78 [Note] WSREP: Provider paused at 9f23373a-3e6d-11e8-b625-0646494b946c:2 (31)
MariaDB [(none)]> unlock tables ;
Query OK, 0 rows affected (0.000 sec)
MariaDB [(none)]> show variables like '%wsrep_desync%' ;
--------------------+
--------------------+
--------------------+
1 row in set (0.002 sec)
MariaDB [(none)]> show status like 'wsrep_local_state_comment' ;
-----------------------------------------+
-----------------------------------------+
wsrep_local_state_comment |
Donor/Desynced |
-----------------------------------------+
1 row in set (0.001 sec)
In addition fail restart mariadb.service
# systemctl restart mariadb.service
Job for mariadb.service failed because a timeout was exceeded. See "systemctl status mariadb.service" and "journalctl -xe" for details.
[root@t4w6 ~]# mysql -u root -p1
^C
[root@t4w6 ~]# mysql -u root -p1
^C
[root@t4w6 ~]# systemctl status mariadb.service
● mariadb.service - MariaDB 10.3.5 database server
Loaded: loaded (/usr/lib/systemd/system/mariadb.service; enabled; vendor preset: disabled)
Drop-In: /etc/systemd/system/mariadb.service.d
└─migrated-from-my.cnf-settings.conf
Active: failed (Result: timeout) since Thu 2018-04-19 18:30:51 EEST; 48s ago
Docs: man:mysqld(8)
https://mariadb.com/kb/en/library/systemd/
Process: 1142 ExecStartPost=/bin/sh -c systemctl unset-environment _WSREP_START_POSITION (code=exited, status=0/SUCCESS)
Process: 1389 ExecStartPre=/bin/sh -c [ ! -e /usr/bin/galera_recovery ] && VAR= || VAR=`/usr/bin/galera_recovery`; [ $? -eq 0 ] && systemctl set-environment _WSREP_START_POSITION=$VAR || exit 1 (code=killed, signal=TERM)
Process: 1387 ExecStartPre=/bin/sh -c systemctl unset-environment _WSREP_START_POSITION (code=exited, status=0/SUCCESS)
Main PID: 1043
CGroup: /system.slice/mariadb.service
└─1043 /usr/sbin/mysqld --wsrep_start_position=9f23373a-3e6d-11e8-b625-0646494b946c:2
Apr 19 18:29:21 t4w6.xentio.lan systemd[1]: Starting MariaDB 10.3.5 database server...
Apr 19 18:30:51 t4w6.xentio.lan systemd[1]: mariadb.service start-pre operation timed out. Terminating.
Apr 19 18:30:51 t4w6.xentio.lan systemd[1]: Failed to start MariaDB 10.3.5 database server.
Apr 19 18:30:51 t4w6.xentio.lan systemd[1]: Unit mariadb.service entered failed state.
Apr 19 18:30:51 t4w6.xentio.lan systemd[1]: mariadb.service failed.
2018-04-19 18:26:20 0 [Note] /usr/sbin/mysqld (unknown): Normal shutdown
2018-04-19 18:26:20 0 [Note] WSREP: Stop replication
2018-04-19 18:26:20 0 [Note] WSREP: Closing send monitor...
2018-04-19 18:26:20 0 [Note] WSREP: Closed send monitor.
2018-04-19 18:26:20 0 [Note] WSREP: gcomm: terminating thread
2018-04-19 18:26:20 0 [Note] WSREP: gcomm: joining thread
2018-04-19 18:26:20 0 [Note] WSREP: gcomm: closing backend
2018-04-19 18:26:24 0 [Note] WSREP: (3c3e4c49, 'tcp://0.0.0.0:4567') connection to peer 6a5bf34e with addr tcp://192.168.104.195:4567 timed out, no messages seen in PT3S
2018-04-19 18:26:24 0 [Note] WSREP: (3c3e4c49, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://192.168.104.195:4567
2018-04-19 18:26:25 0 [Note] WSREP: (3c3e4c49, 'tcp://0.0.0.0:4567') reconnecting to 6a5bf34e (tcp://192.168.104.195:4567), attempt 0
2018-04-19 18:26:26 0 [Note] WSREP: evs::proto(3c3e4c49, LEAVING, view_id(REG,3c3e4c49,154)) suspecting node: 6a5bf34e
2018-04-19 18:26:26 0 [Note] WSREP: evs::proto(3c3e4c49, LEAVING, view_id(REG,3c3e4c49,154)) suspected node without join message, declaring inactive
2018-04-19 18:26:26 0 [Note] WSREP: view(view_id(NON_PRIM,3c3e4c49,154) memb {
3c3e4c49,0
} joined {
} left {
} partitioned {
6a5bf34e,0
})
2018-04-19 18:26:26 0 [Note] WSREP: view((empty))
2018-04-19 18:26:26 0 [Note] WSREP: gcomm: closed
2018-04-19 18:26:26 0 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
2018-04-19 18:26:26 0 [Note] WSREP: Flow-control interval: [16, 16]
2018-04-19 18:26:26 0 [Note] WSREP: Trying to continue unpaused monitor
2018-04-19 18:26:26 0 [Note] WSREP: Received NON-PRIMARY.
2018-04-19 18:26:26 0 [Note] WSREP: Shifting DONOR/DESYNCED -> OPEN (TO: 2)
2018-04-19 18:26:26 0 [Note] WSREP: Received self-leave message.
2018-04-19 18:26:26 0 [Note] WSREP: Flow-control interval: [0, 0]
2018-04-19 18:26:26 0 [Note] WSREP: Trying to continue unpaused monitor
2018-04-19 18:26:26 0 [Note] WSREP: Received SELF-LEAVE. Closing connection.
2018-04-19 18:26:26 0 [Note] WSREP: Shifting OPEN -> CLOSED (TO: 2)
2018-04-19 18:26:26 0 [Note] WSREP: RECV thread exiting 0: Success
2018-04-19 18:26:26 0 [Note] WSREP: recv_thread() joined.
2018-04-19 18:26:26 0 [Note] WSREP: Closing replication queue.
2018-04-19 18:26:26 0 [Note] WSREP: Closing slave action queue.
2018-04-19 18:26:28 0 [Note] WSREP: killing local connection: 78