Details
-
Bug
-
Status: Closed (View Workflow)
-
Major
-
Resolution: Fixed
-
10.3.6, 10.1(EOL), 10.2(EOL)
-
None
-
CentOS 7.4; Galera 2 Nodes
Description
Cannot resume Node SYNCED state when wsrep_desync is done after FTWRL
1. Issue ' flush tables with read lock;' on Galera Node
MariaDB [(none)]> flush tables with read lock;
Query OK, 0 rows affected (0.000 sec)
2. Set wsrep_desync – Command hangs and wsrep_desync=1 is not applied ,
but afterwards cannot resume SYNCED state
MariaDB [(none)]> set global wsrep_desync=1;
^CCtrl-C – query killed. Continuing normally.
^CCtrl-C – query killed. Continuing normally.
ERROR 2013 (HY000): Lost connection to MySQL server during query
MariaDB [(none)]> show processlist ;
ERROR 2006 (HY000): MySQL server has gone away
No connection. Trying to reconnect...
Connection id: 81
Current database: *** NONE ***
------------------------------------------------------------------------------------------------
Id | User | Host | db | Command | Time | State | Info | Progress |
------------------------------------------------------------------------------------------------
1 | system user | NULL | Sleep | 3331 | WSREP aborter idle | NULL | 0.000 | |
2 | system user | NULL | Sleep | 3331 | NULL | NULL | 0.000 | |
4 | system user | NULL | Daemon | NULL | InnoDB purge coordinator | NULL | 0.000 | |
5 | system user | NULL | Daemon | NULL | InnoDB purge worker | NULL | 0.000 | |
3 | system user | NULL | Daemon | NULL | InnoDB purge worker | NULL | 0.000 | |
6 | system user | NULL | Daemon | NULL | InnoDB purge worker | NULL | 0.000 | |
7 | system user | NULL | Daemon | NULL | InnoDB shutdown handler | NULL | 0.000 | |
9 | system user | NULL | Sleep | 3331 | NULL | NULL | 0.000 | |
8 | system user | NULL | Sleep | 3331 | NULL | NULL | 0.000 | |
10 | system user | NULL | Sleep | 3331 | NULL | NULL | 0.000 | |
26 | system user | NULL | Sleep | 3331 | NULL | NULL | 0.000 | |
17 | system user | NULL | Sleep | 3331 | NULL | NULL | 0.000 | |
16 | system user | NULL | Sleep | 3331 | NULL | NULL | 0.000 | |
19 | system user | NULL | Sleep | 3331 | NULL | NULL | 0.000 | |
20 | system user | NULL | Sleep | 3331 | NULL | NULL | 0.000 | |
22 | system user | NULL | Sleep | 3331 | NULL | NULL | 0.000 | |
23 | system user | NULL | Sleep | 3331 | NULL | NULL | 0.000 | |
18 | system user | NULL | Sleep | 3331 | NULL | NULL | 0.000 | |
24 | system user | NULL | Sleep | 3331 | NULL | NULL | 0.000 | |
21 | system user | NULL | Sleep | 3331 | NULL | NULL | 0.000 | |
14 | system user | NULL | Sleep | 3331 | NULL | NULL | 0.000 | |
25 | system user | NULL | Sleep | 3331 | NULL | NULL | 0.000 | |
35 | system user | NULL | Sleep | 3331 | NULL | NULL | 0.000 | |
27 | system user | NULL | Sleep | 3331 | NULL | NULL | 0.000 | |
13 | system user | NULL | Sleep | 3331 | NULL | NULL | 0.000 | |
34 | system user | NULL | Sleep | 3331 | NULL | NULL | 0.000 | |
11 | system user | NULL | Sleep | 3331 | NULL | NULL | 0.000 | |
30 | system user | NULL | Sleep | 3331 | NULL | NULL | 0.000 | |
37 | system user | NULL | Sleep | 3331 | NULL | NULL | 0.000 | |
15 | system user | NULL | Sleep | 3331 | NULL | NULL | 0.000 | |
33 | system user | NULL | Sleep | 3331 | NULL | NULL | 0.000 | |
32 | system user | NULL | Sleep | 3331 | NULL | NULL | 0.000 | |
31 | system user | NULL | Sleep | 3331 | NULL | NULL | 0.000 | |
12 | system user | NULL | Sleep | 3331 | NULL | NULL | 0.000 | |
55 | system user | NULL | Sleep | 3331 | NULL | NULL | 0.000 | |
38 | system user | NULL | Sleep | 3331 | NULL | NULL | 0.000 | |
51 | system user | NULL | Sleep | 3331 | NULL | NULL | 0.000 | |
42 | system user | NULL | Sleep | 3331 | NULL | NULL | 0.000 | |
43 | system user | NULL | Sleep | 3331 | NULL | NULL | 0.000 | |
36 | system user | NULL | Sleep | 3331 | NULL | NULL | 0.000 | |
41 | system user | NULL | Sleep | 3331 | NULL | NULL | 0.000 | |
45 | system user | NULL | Sleep | 3331 | NULL | NULL | 0.000 | |
44 | system user | NULL | Sleep | 3331 | NULL | NULL | 0.000 | |
39 | system user | NULL | Sleep | 3331 | NULL | NULL | 0.000 | |
48 | system user | NULL | Sleep | 3331 | NULL | NULL | 0.000 | |
46 | system user | NULL | Sleep | 3331 | NULL | NULL | 0.000 | |
47 | system user | NULL | Sleep | 3331 | NULL | NULL | 0.000 | |
50 | system user | NULL | Sleep | 3331 | NULL | NULL | 0.000 | |
52 | system user | NULL | Sleep | 3331 | NULL | NULL | 0.000 | |
53 | system user | NULL | Sleep | 3331 | NULL | NULL | 0.000 | |
40 | system user | NULL | Sleep | 3331 | NULL | NULL | 0.000 | |
28 | system user | NULL | Sleep | 3331 | NULL | NULL | 0.000 | |
49 | system user | NULL | Sleep | 3331 | NULL | NULL | 0.000 | |
29 | system user | NULL | Sleep | 3331 | NULL | NULL | 0.000 | |
58 | system user | NULL | Sleep | 3331 | NULL | NULL | 0.000 | |
59 | system user | NULL | Sleep | 3331 | NULL | NULL | 0.000 | |
61 | system user | NULL | Sleep | 3331 | NULL | NULL | 0.000 | |
56 | system user | NULL | Sleep | 3331 | NULL | NULL | 0.000 | |
60 | system user | NULL | Sleep | 3331 | NULL | NULL | 0.000 | |
62 | system user | NULL | Sleep | 3331 | NULL | NULL | 0.000 | |
57 | system user | NULL | Sleep | 3331 | NULL | NULL | 0.000 | |
63 | system user | NULL | Sleep | 3331 | NULL | NULL | 0.000 | |
64 | system user | NULL | Sleep | 3331 | NULL | NULL | 0.000 | |
54 | system user | NULL | Sleep | 3331 | NULL | NULL | 0.000 | |
65 | system user | NULL | Sleep | 3331 | NULL | NULL | 0.000 | |
69 | system user | NULL | Sleep | 3331 | NULL | NULL | 0.000 | |
66 | system user | NULL | Sleep | 3331 | NULL | NULL | 0.000 | |
67 | system user | NULL | Sleep | 3331 | NULL | NULL | 0.000 | |
70 | system user | NULL | Sleep | 3331 | NULL | NULL | 0.000 | |
68 | system user | NULL | Sleep | 3331 | NULL | NULL | 0.000 | |
78 | root | localhost | NULL | Killed | 30 | NULL | set global wsrep_desync=1 | 0.000 |
81 | root | localhost | NULL | Query | 0 | Init | show processlist | 0.000 |
------------------------------------------------------------------------------------------------
72 rows in set (0.001 sec)
Node moved in DESYNCED state and cannot be recovered
2018-04-19 17:34:17 0 [Note] WSREP: Member 0.0 (t4w6) desyncs itself from group
2018-04-19 17:34:17 0 [Note] WSREP: Shifting SYNCED -> DONOR/DESYNCED (TO: 2)
2018-04-19 17:34:17 78 [Note] WSREP: Provider paused at 9f23373a-3e6d-11e8-b625-0646494b946c:2 (31)
MariaDB [(none)]> unlock tables ;
Query OK, 0 rows affected (0.000 sec)
MariaDB [(none)]> show variables like '%wsrep_desync%' ;
--------------------+
Variable_name | Value |
--------------------+
wsrep_desync | OFF |
--------------------+
1 row in set (0.002 sec)
MariaDB [(none)]> show status like 'wsrep_local_state_comment' ;
-----------------------------------------+
Variable_name | Value |
-----------------------------------------+
wsrep_local_state_comment | Donor/Desynced |
-----------------------------------------+
1 row in set (0.001 sec)
Attachments
Issue Links
- relates to
-
MDEV-15067 Node fails to rejoin mdb cluster after recovering with mariabackup with option --galera-info
-
- Closed
-
In addition fail restart mariadb.service
# systemctl restart mariadb.service
Job for mariadb.service failed because a timeout was exceeded. See "systemctl status mariadb.service" and "journalctl -xe" for details.
[root@t4w6 ~]# mysql -u root -p1
^C
[root@t4w6 ~]# mysql -u root -p1
^C
[root@t4w6 ~]# systemctl status mariadb.service
● mariadb.service - MariaDB 10.3.5 database server
Loaded: loaded (/usr/lib/systemd/system/mariadb.service; enabled; vendor preset: disabled)
Drop-In: /etc/systemd/system/mariadb.service.d
└─migrated-from-my.cnf-settings.conf
Active: failed (Result: timeout) since Thu 2018-04-19 18:30:51 EEST; 48s ago
Docs: man:mysqld(8)
https://mariadb.com/kb/en/library/systemd/
Process: 1142 ExecStartPost=/bin/sh -c systemctl unset-environment _WSREP_START_POSITION (code=exited, status=0/SUCCESS)
Process: 1389 ExecStartPre=/bin/sh -c [ ! -e /usr/bin/galera_recovery ] && VAR= || VAR=`/usr/bin/galera_recovery`; [ $? -eq 0 ] && systemctl set-environment _WSREP_START_POSITION=$VAR || exit 1 (code=killed, signal=TERM)
Process: 1387 ExecStartPre=/bin/sh -c systemctl unset-environment _WSREP_START_POSITION (code=exited, status=0/SUCCESS)
Main PID: 1043
CGroup: /system.slice/mariadb.service
└─1043 /usr/sbin/mysqld --wsrep_start_position=9f23373a-3e6d-11e8-b625-0646494b946c:2
Apr 19 18:29:21 t4w6.xentio.lan systemd[1]: Starting MariaDB 10.3.5 database server...
Apr 19 18:30:51 t4w6.xentio.lan systemd[1]: mariadb.service start-pre operation timed out. Terminating.
Apr 19 18:30:51 t4w6.xentio.lan systemd[1]: Failed to start MariaDB 10.3.5 database server.
Apr 19 18:30:51 t4w6.xentio.lan systemd[1]: Unit mariadb.service entered failed state.
Apr 19 18:30:51 t4w6.xentio.lan systemd[1]: mariadb.service failed.
2018-04-19 18:26:20 0 [Note] /usr/sbin/mysqld (unknown): Normal shutdown
2018-04-19 18:26:20 0 [Note] WSREP: Stop replication
2018-04-19 18:26:20 0 [Note] WSREP: Closing send monitor...
2018-04-19 18:26:20 0 [Note] WSREP: Closed send monitor.
2018-04-19 18:26:20 0 [Note] WSREP: gcomm: terminating thread
2018-04-19 18:26:20 0 [Note] WSREP: gcomm: joining thread
2018-04-19 18:26:20 0 [Note] WSREP: gcomm: closing backend
2018-04-19 18:26:24 0 [Note] WSREP: (3c3e4c49, 'tcp://0.0.0.0:4567') connection to peer 6a5bf34e with addr tcp://192.168.104.195:4567 timed out, no messages seen in PT3S
2018-04-19 18:26:24 0 [Note] WSREP: (3c3e4c49, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://192.168.104.195:4567
2018-04-19 18:26:25 0 [Note] WSREP: (3c3e4c49, 'tcp://0.0.0.0:4567') reconnecting to 6a5bf34e (tcp://192.168.104.195:4567), attempt 0
2018-04-19 18:26:26 0 [Note] WSREP: evs::proto(3c3e4c49, LEAVING, view_id(REG,3c3e4c49,154)) suspecting node: 6a5bf34e
2018-04-19 18:26:26 0 [Note] WSREP: evs::proto(3c3e4c49, LEAVING, view_id(REG,3c3e4c49,154)) suspected node without join message, declaring inactive
2018-04-19 18:26:26 0 [Note] WSREP: view(view_id(NON_PRIM,3c3e4c49,154) memb {
3c3e4c49,0
} joined {
} left {
} partitioned {
6a5bf34e,0
})
2018-04-19 18:26:26 0 [Note] WSREP: view((empty))
2018-04-19 18:26:26 0 [Note] WSREP: gcomm: closed
2018-04-19 18:26:26 0 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
2018-04-19 18:26:26 0 [Note] WSREP: Flow-control interval: [16, 16]
2018-04-19 18:26:26 0 [Note] WSREP: Trying to continue unpaused monitor
2018-04-19 18:26:26 0 [Note] WSREP: Received NON-PRIMARY.
2018-04-19 18:26:26 0 [Note] WSREP: Shifting DONOR/DESYNCED -> OPEN (TO: 2)
2018-04-19 18:26:26 0 [Note] WSREP: Received self-leave message.
2018-04-19 18:26:26 0 [Note] WSREP: Flow-control interval: [0, 0]
2018-04-19 18:26:26 0 [Note] WSREP: Trying to continue unpaused monitor
2018-04-19 18:26:26 0 [Note] WSREP: Received SELF-LEAVE. Closing connection.
2018-04-19 18:26:26 0 [Note] WSREP: Shifting OPEN -> CLOSED (TO: 2)
2018-04-19 18:26:26 0 [Note] WSREP: RECV thread exiting 0: Success
2018-04-19 18:26:26 0 [Note] WSREP: recv_thread() joined.
2018-04-19 18:26:26 0 [Note] WSREP: Closing replication queue.
2018-04-19 18:26:26 0 [Note] WSREP: Closing slave action queue.
2018-04-19 18:26:28 0 [Note] WSREP: killing local connection: 78