[MDEV-15691] mariadb.service entered failed mode after power down/up event - Jira

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Not a Bug
Affects Version/s: 10.3.5
Fix Version/s: N/A
Component/s: Galera, Server
Labels:
- systemd
Environment:
CentOS 7.4.

Description

mariadb.service entered failed mode after a power down/up event.

The Galera cluster was powered down and back up after a power supply problem. The mysqld process is running, but it is not possible to log in to the mysql shell, and mariadb.service was found to have entered failed mode.

Note: the cluster was in a synced state before the power down.

#  ps aux | grep -v grep | grep mysql

mysql      965  0.0  2.2 629204 46616 ?        Ssl  02:08   0:40 /usr/sbin/mysqld --wsrep_start_position=dff6e041-1005-11e8-85c9-965f304f37bc:131668

# systemctl status  mariadb.service

● mariadb.service - MariaDB 10.3.5 database server
   Loaded: loaded (/usr/lib/systemd/system/mariadb.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/mariadb.service.d
           └─migrated-from-my.cnf-settings.conf
   Active: failed (Result: timeout) since Tue 2018-03-27 02:11:48 EEST; 15h ago
     Docs: man:mysqld(8)
           https://mariadb.com/kb/en/library/systemd/
  Process: 867 ExecStartPre=/bin/sh -c [ ! -e /usr/bin/galera_recovery ] && VAR= ||   VAR=`/usr/bin/galera_recovery`; [ $? -eq 0 ]   && systemctl set-environment _WSREP_START_POSITION=$VAR || exit 1 (code=exited, status=0/SUCCESS)
  Process: 860 ExecStartPre=/bin/sh -c systemctl unset-environment _WSREP_START_POSITION (code=exited, status=0/SUCCESS)
 Main PID: 965
   CGroup: /system.slice/mariadb.service
           └─965 /usr/sbin/mysqld --wsrep_start_position=dff6e041-1005-11e8-85c9-965f304f37bc:131668

Mar 27 02:08:44 localhost.localdomain systemd[1]: Starting MariaDB 10.3.5 database server...
Mar 27 02:08:48 localhost.localdomain sh[867]: WSREP: Recovered position dff6e041-1005-11e8-85c9-965f304f37bc:131668
Mar 27 02:08:48 localhost.localdomain mysqld[965]: 2018-03-27  2:08:48 0 [Note] /usr/sbin/mysqld (mysqld 10.3.5-MariaDB) starting as process 965 ...
Mar 27 02:10:18 t4w5.xentio.lan systemd[1]: mariadb.service start operation timed out. Terminating.
Mar 27 02:11:48 t4w5.xentio.lan systemd[1]: mariadb.service stop-final-sigterm timed out. Skipping SIGKILL. Entering failed mode.
Mar 27 02:11:48 t4w5.xentio.lan systemd[1]: Failed to start MariaDB 10.3.5 database server.
Mar 27 02:11:48 t4w5.xentio.lan systemd[1]: Unit mariadb.service entered failed state.
Mar 27 02:11:48 t4w5.xentio.lan systemd[1]: mariadb.service failed.
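
For reference, the timeline in the journal output above can be worked out from its timestamps. The following is only an illustrative sketch: the event labels are paraphrased from the log lines, and the helper name `intervals` is made up here. It shows that systemd gave up on the start roughly a minute and a half after it began, then waited a similar time for the stop before marking the unit failed. That is consistent with start and stop timeouts on the order of 90 seconds, although the unit's actual timeout settings are not shown in this report.

```python
from datetime import datetime

# Timestamps copied from the journal lines in the report (same day, same clock).
EVENTS = [
    ("start requested", "02:08:44"),
    ("start operation timed out", "02:10:18"),
    ("stop-final-sigterm timed out", "02:11:48"),
]


def intervals(events):
    """Return the gaps, in whole seconds, between consecutive events."""
    times = [datetime.strptime(stamp, "%H:%M:%S") for _, stamp in events]
    return [int((later - earlier).total_seconds())
            for earlier, later in zip(times, times[1:])]


if __name__ == "__main__":
    for (label, _), gap in zip(EVENTS[1:], intervals(EVENTS)):
        print(f"{label}: {gap} s after the previous event")
```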

Attachments

galera_logs.txt (6 kB, 2018-03-27 15:38)

Activity

Mario Karuza (Inactive) added a comment - 2018-06-21 11:52 - edited

The cluster consists of 4 nodes: 1 - 192.168.104.191, 2 - 192.168.104.193, 3 - 192.168.104.195, 4 - 192.168.104.196.

Node 3 (192.168.104.195) comes back up after a non-graceful restart and preserves gvwstate.dat. At this point it tries to connect to the other nodes, but it only succeeds in connecting to node 4 (192.168.104.196).

Since neither of these 2 nodes is responsible for creating a new PRIMARY component, they wait for the other nodes to join.

Either all of the members of the previous primary component must come online, or the wait times out.

This is expected behavior.
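
The rule described above can be sketched as a small check. This is a minimal illustration and not Galera's actual implementation: the function names, the sample file body and the node identifiers are all hypothetical, and the `member:` line layout is an assumption about how gvwstate.dat records the previous view.

```python
# Hypothetical gvwstate.dat body; real files hold UUIDs, these ids are placeholders.
SAMPLE_GVWSTATE = """\
my_uuid: uuid-node-3
#vwbeg
view_id: 3 uuid-node-1 12
bootstrap: 0
member: uuid-node-1 0
member: uuid-node-2 0
member: uuid-node-3 0
member: uuid-node-4 0
#vwend
"""


def previous_members(gvwstate_text):
    """Collect member ids from lines assumed to look like 'member: <id> <segment>'."""
    members = set()
    for line in gvwstate_text.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "member:":
            members.add(parts[1])
    return members


def can_restore_primary(gvwstate_text, reachable_ids):
    """True only when every member of the saved view is currently reachable."""
    saved = previous_members(gvwstate_text)
    return bool(saved) and saved <= set(reachable_ids)


if __name__ == "__main__":
    # Only nodes 3 and 4 see each other, as in the comment above.
    print(can_restore_primary(SAMPLE_GVWSTATE, {"uuid-node-3", "uuid-node-4"}))
```

With only two of the four saved members reachable the check fails, which corresponds to the nodes waiting until the remaining members appear or the wait times out.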

Zdravelina Sokolovska (Inactive) added a comment - 2018-10-05 17:56

The actual problem is that the power down and power up occurred on all nodes. Node 1 was also used as the load balancer and had been excluded from wsrep before that.

People

Assignee: Jan Lindström (Inactive)

Reporter: Zdravelina Sokolovska (Inactive)

Votes: 0

Watchers: 4

Dates

Created: 2018-03-27 15:46

Updated: 2019-12-12 11:35

Resolved: 2019-12-12 11:35
