Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Not a Bug
Affects Version/s: 10.3.5
Environment: CentOS 7.4
Description
mariadb.service entered the failed state after a power down/up event.
The Galera cluster was powered down and back up after a power supply problem.
The mysqld process is running, but it is not possible to log in to the mysql shell, and mariadb.service turns out to have entered the failed state.
Note: the cluster was in the Synced state before the power-down.
# ps aux | grep -v grep | grep mysql
mysql 965 0.0 2.2 629204 46616 ? Ssl 02:08 0:40 /usr/sbin/mysqld --wsrep_start_position=dff6e041-1005-11e8-85c9-965f304f37bc:131668

# systemctl status mariadb.service
● mariadb.service - MariaDB 10.3.5 database server
   Loaded: loaded (/usr/lib/systemd/system/mariadb.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/mariadb.service.d
           └─migrated-from-my.cnf-settings.conf
   Active: failed (Result: timeout) since Tue 2018-03-27 02:11:48 EEST; 15h ago
     Docs: man:mysqld(8)
           https://mariadb.com/kb/en/library/systemd/
  Process: 867 ExecStartPre=/bin/sh -c [ ! -e /usr/bin/galera_recovery ] && VAR= || VAR=`/usr/bin/galera_recovery`; [ $? -eq 0 ] && systemctl set-environment _WSREP_START_POSITION=$VAR || exit 1 (code=exited, status=0/SUCCESS)
  Process: 860 ExecStartPre=/bin/sh -c systemctl unset-environment _WSREP_START_POSITION (code=exited, status=0/SUCCESS)
 Main PID: 965
   CGroup: /system.slice/mariadb.service
           └─965 /usr/sbin/mysqld --wsrep_start_position=dff6e041-1005-11e8-85c9-965f304f37bc:131668

Mar 27 02:08:44 localhost.localdomain systemd[1]: Starting MariaDB 10.3.5 database server...
Mar 27 02:08:48 localhost.localdomain sh[867]: WSREP: Recovered position dff6e041-1005-11e8-85c9-965f304f37bc:131668
Mar 27 02:08:48 localhost.localdomain mysqld[965]: 2018-03-27 2:08:48 0 [Note] /usr/sbin/mysqld (mysqld 10.3.5-MariaDB) starting as process 965 ...
Mar 27 02:10:18 t4w5.xentio.lan systemd[1]: mariadb.service start operation timed out. Terminating.
Mar 27 02:11:48 t4w5.xentio.lan systemd[1]: mariadb.service stop-final-sigterm timed out. Skipping SIGKILL. Entering failed mode.
Mar 27 02:11:48 t4w5.xentio.lan systemd[1]: Failed to start MariaDB 10.3.5 database server.
Mar 27 02:11:48 t4w5.xentio.lan systemd[1]: Unit mariadb.service entered failed state.
Mar 27 02:11:48 t4w5.xentio.lan systemd[1]: mariadb.service failed.

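The "Result: timeout" in the status output can be checked against the journal timestamps with simple arithmetic. The following is a minimal sketch; `seconds_between` is a hypothetical helper written for this illustration, not part of MariaDB or systemd, and it assumes both timestamps fall on the same day.

```shell
#!/bin/sh
# Hypothetical helper: seconds between two HH:MM:SS timestamps
# on the same day (no handling of midnight wrap-around).
seconds_between() {
    echo "$1 $2" | awk '{
        split($1, a, ":"); split($2, b, ":")
        print (b[1]*3600 + b[2]*60 + b[3]) - (a[1]*3600 + a[2]*60 + a[3])
    }'
}

# mysqld start message -> "start operation timed out"
seconds_between 02:08:48 02:10:18
# "Terminating" (SIGTERM sent) -> "stop-final-sigterm timed out"
seconds_between 02:10:18 02:11:48
```

Both intervals come out at 90 seconds, which matches systemd's stock default of 90s for the start and stop timeouts. Whether this unit actually runs with the defaults is an assumption; `systemctl show mariadb.service -p TimeoutStartUSec -p TimeoutStopUSec` should report the values in effect on the node.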
The cluster consists of 4 nodes: 1 - 192.168.104.191, 2 - 192.168.104.193, 3 - 192.168.104.195, 4 - 192.168.104.196.
After the non-graceful restart, node 3 (192.168.104.195) comes back up with its gvwstate.dat preserved. At this point it tries to connect to the other nodes, and it succeeds only in connecting to node 4 (192.168.104.196).
Since these two nodes are not in a position to create a new PRIMARY component on their own, they wait for the other nodes to join.
Either all members of the previous primary component must come back online, or the wait times out.
This is expected behavior.
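To see which members of the previous primary component a restarted node is still waiting for, the saved view in gvwstate.dat can be compared with the nodes that are currently reachable. The sketch below is only an illustration: it assumes the file records each member on a line of the form `member: <uuid> <segment>`, and the function names `list_pc_members` and `missing_pc_members` are hypothetical, not Galera tools.

```shell
#!/bin/sh
# Print the member UUIDs recorded in a saved primary-component view.
# Assumed layout: lines of the form "member: <uuid> <segment>".
list_pc_members() {
    # $1: path to gvwstate.dat
    awk '$1 == "member:" { print $2 }' "$1"
}

# Print the recorded members that are NOT among the given reachable UUIDs.
missing_pc_members() {
    # $1: path to gvwstate.dat; remaining arguments: reachable node UUIDs
    file=$1
    shift
    for uuid in $(list_pc_members "$file"); do
        found=no
        for seen in "$@"; do
            [ "$seen" = "$uuid" ] && found=yes
        done
        [ "$found" = yes ] || echo "$uuid"
    done
}
```

A call such as `missing_pc_members /var/lib/mysql/gvwstate.dat <uuid-of-node-3> <uuid-of-node-4>` would then list the members that have not yet rejoined; the path assumes the default data directory, and the UUIDs have to be taken from the nodes themselves. An empty result would mean every member of the saved view has been seen again.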