[MDEV-15691] mariadb.service entered failed mode after power down/up event
Created: 2018-03-27  Updated: 2019-12-12  Resolved: 2019-12-12

Status: Closed
Project: MariaDB Server
Component/s: Galera, Server
Affects Version/s: 10.3.5
Fix Version/s: N/A

Type: Bug
Priority: Major
Reporter: Zdravelina Sokolovska (Inactive)
Assignee: Jan Lindström (Inactive)
Resolution: Not a Bug
Votes: 0
Labels: systemd
Environment: CentOS 7.4

Attachments: Text File galera_logs.txt    

 Description   

mariadb.service entered failed mode after power down/up event

The Galera cluster was powered down and back up after a power supply problem.
The mysqld process is running, but it is not possible to log in with the mysql client, and mariadb.service has entered the failed state.

Note: the cluster was in the Synced state before the power outage.

#  ps aux | grep -v grep | grep mysql
mysql      965  0.0  2.2 629204 46616 ?        Ssl  02:08   0:40 /usr/sbin/mysqld --wsrep_start_position=dff6e041-1005-11e8-85c9-965f304f37bc:131668

# systemctl status  mariadb.service
● mariadb.service - MariaDB 10.3.5 database server
   Loaded: loaded (/usr/lib/systemd/system/mariadb.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/mariadb.service.d
           └─migrated-from-my.cnf-settings.conf
   Active: failed (Result: timeout) since Tue 2018-03-27 02:11:48 EEST; 15h ago
     Docs: man:mysqld(8)
           https://mariadb.com/kb/en/library/systemd/
  Process: 867 ExecStartPre=/bin/sh -c [ ! -e /usr/bin/galera_recovery ] && VAR= ||   VAR=`/usr/bin/galera_recovery`; [ $? -eq 0 ]   && systemctl set-environment _WSREP_START_POSITION=$VAR || exit 1 (code=exited, status=0/SUCCESS)
  Process: 860 ExecStartPre=/bin/sh -c systemctl unset-environment _WSREP_START_POSITION (code=exited, status=0/SUCCESS)
 Main PID: 965
   CGroup: /system.slice/mariadb.service
           └─965 /usr/sbin/mysqld --wsrep_start_position=dff6e041-1005-11e8-85c9-965f304f37bc:131668
 
Mar 27 02:08:44 localhost.localdomain systemd[1]: Starting MariaDB 10.3.5 database server...
Mar 27 02:08:48 localhost.localdomain sh[867]: WSREP: Recovered position dff6e041-1005-11e8-85c9-965f304f37bc:131668
Mar 27 02:08:48 localhost.localdomain mysqld[965]: 2018-03-27  2:08:48 0 [Note] /usr/sbin/mysqld (mysqld 10.3.5-MariaDB) starting as process 965 ...
Mar 27 02:10:18 t4w5.xentio.lan systemd[1]: mariadb.service start operation timed out. Terminating.
Mar 27 02:11:48 t4w5.xentio.lan systemd[1]: mariadb.service stop-final-sigterm timed out. Skipping SIGKILL. Entering failed mode.
Mar 27 02:11:48 t4w5.xentio.lan systemd[1]: Failed to start MariaDB 10.3.5 database server.
Mar 27 02:11:48 t4w5.xentio.lan systemd[1]: Unit mariadb.service entered failed state.
Mar 27 02:11:48 t4w5.xentio.lan systemd[1]: mariadb.service failed.
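A minimal POSIX shell sketch for spotting the state reported here, where systemd has marked the unit failed while mysqld keeps running. It assumes systemd and procps `pgrep` are available; the helper names `classify` and `check_unit` are hypothetical and not part of MariaDB or systemd.

```shell
#!/bin/sh
# Hypothetical diagnostic helpers (not shipped with MariaDB).
# Assumes systemd (systemctl) and procps (pgrep) are installed.

# classify UNIT_STATE PROC_ALIVE -> prints a one-word verdict
classify() {
  if [ "$1" = "failed" ] && [ "$2" = "yes" ]; then
    echo "orphaned"   # systemd gave up on the unit, but mysqld was never killed
  elif [ "$1" = "active" ]; then
    echo "ok"
  else
    echo "other"
  fi
}

# check_unit [UNIT] -> compares the unit state with the presence of a mysqld process
check_unit() {
  unit=${1:-mariadb.service}
  # is-active prints the state (e.g. active, failed) even when it exits non-zero
  state=$(systemctl is-active "$unit" 2>/dev/null)
  if pgrep -x mysqld >/dev/null 2>&1; then alive=yes; else alive=no; fi
  classify "$state" "$alive"
}
```

On a node in the condition described above, `check_unit mariadb.service` would be expected to print `orphaned`. To see why the process survived, `systemctl show mariadb.service -p TimeoutStartUSec -p SendSIGKILL` lists the start timeout and whether systemd is allowed to send SIGKILL; the "Skipping SIGKILL" line in the log indicates that SIGKILL was disabled for this unit.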



 Comments   
Comment by Mario Karuza (Inactive) [ 2018-06-21 ]

The cluster consists of four nodes: node 1 (192.168.104.191), node 2 (192.168.104.193), node 3 (192.168.104.195) and node 4 (192.168.104.196).

After the non-graceful restart, node 3 (192.168.104.195) comes back up and preserves gvwstate.dat. It then tries to connect to the other nodes, but only succeeds in connecting to node 4 (192.168.104.196).

Since these two nodes cannot form a new Primary Component on their own, they wait for the other nodes to join.

Either all members of the previous Primary Component must come back online, or the wait times out.

This is expected behavior.
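The rule described in this comment can be modelled roughly as follows. This is a simplified sketch, not Galera's implementation: it ignores the timeout, and the function name `can_reform` and its whitespace-separated address lists are hypothetical inputs chosen for illustration.

```shell
#!/bin/sh
# Simplified model (not Galera code): after a crash, nodes that restored the
# previous Primary Component from gvwstate.dat only re-form it automatically
# once every member of that component is reachable again.

# can_reform PREVIOUS_MEMBERS REACHABLE_MEMBERS -> prints "yes" or "wait"
can_reform() {
  for node in $1; do
    case " $2 " in
      *" $node "*) ;;            # this previous member is back online
      *) echo "wait"; return ;;  # at least one previous member is still missing
    esac
  done
  echo "yes"
}
```

Treating all four addresses as previous members purely for illustration, `can_reform "192.168.104.191 192.168.104.193 192.168.104.195 192.168.104.196" "192.168.104.195 192.168.104.196"` prints `wait`, which matches the situation where only nodes 3 and 4 can see each other.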

Comment by Zdravelina Sokolovska (Inactive) [ 2018-10-05 ]

The problem is actually that all nodes were powered down and back up. Node 1 was also used for the load balancer and had been excluded from wsrep before that.

Generated at Thu Feb 08 08:23:18 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.