MariaDB Server / MDEV-15691

mariadb.service entered failed mode after power down/up event

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Not a Bug
    • Affects Version/s: 10.3.5
    • Fix Version/s: N/A
    • Component/s: Galera, Server
    • Environment: CentOS 7.4

    Description

      mariadb.service entered failed mode after power down/up event

      The Galera cluster was powered down and back up after a power supply problem.
      The mysqld process is running, but it is not possible to log in to the mysql shell,
      and mariadb.service has entered the failed state.

      Note: the cluster was in Synced state before the power-down.

      #  ps aux | grep -v grep | grep mysql
      mysql      965  0.0  2.2 629204 46616 ?        Ssl  02:08   0:40 /usr/sbin/mysqld --wsrep_start_position=dff6e041-1005-11e8-85c9-965f304f37bc:131668
      
      

      # systemctl status  mariadb.service
      ● mariadb.service - MariaDB 10.3.5 database server
         Loaded: loaded (/usr/lib/systemd/system/mariadb.service; enabled; vendor preset: disabled)
        Drop-In: /etc/systemd/system/mariadb.service.d
                 └─migrated-from-my.cnf-settings.conf
         Active: failed (Result: timeout) since Tue 2018-03-27 02:11:48 EEST; 15h ago
           Docs: man:mysqld(8)
                 https://mariadb.com/kb/en/library/systemd/
        Process: 867 ExecStartPre=/bin/sh -c [ ! -e /usr/bin/galera_recovery ] && VAR= ||   VAR=`/usr/bin/galera_recovery`; [ $? -eq 0 ]   && systemctl set-environment _WSREP_START_POSITION=$VAR || exit 1 (code=exited, status=0/SUCCESS)
        Process: 860 ExecStartPre=/bin/sh -c systemctl unset-environment _WSREP_START_POSITION (code=exited, status=0/SUCCESS)
       Main PID: 965
         CGroup: /system.slice/mariadb.service
                 └─965 /usr/sbin/mysqld --wsrep_start_position=dff6e041-1005-11e8-85c9-965f304f37bc:131668
       
      Mar 27 02:08:44 localhost.localdomain systemd[1]: Starting MariaDB 10.3.5 database server...
      Mar 27 02:08:48 localhost.localdomain sh[867]: WSREP: Recovered position dff6e041-1005-11e8-85c9-965f304f37bc:131668
      Mar 27 02:08:48 localhost.localdomain mysqld[965]: 2018-03-27  2:08:48 0 [Note] /usr/sbin/mysqld (mysqld 10.3.5-MariaDB) starting as process 965 ...
      Mar 27 02:10:18 t4w5.xentio.lan systemd[1]: mariadb.service start operation timed out. Terminating.
      Mar 27 02:11:48 t4w5.xentio.lan systemd[1]: mariadb.service stop-final-sigterm timed out. Skipping SIGKILL. Entering failed mode.
      Mar 27 02:11:48 t4w5.xentio.lan systemd[1]: Failed to start MariaDB 10.3.5 database server.
      Mar 27 02:11:48 t4w5.xentio.lan systemd[1]: Unit mariadb.service entered failed state.
      Mar 27 02:11:48 t4w5.xentio.lan systemd[1]: mariadb.service failed.
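
The timestamps in the journal excerpt above show how long systemd waited at each stage. The following is a minimal sketch that computes the two gaps. The `elapsed_seconds` helper is hypothetical, and the year is an assumption taken from the `Active:` line. Both intervals come out at 90 seconds, which equals systemd's usual default start and stop timeout; the report does not say which timeout setting was actually in effect on this host.

```python
from datetime import datetime


def elapsed_seconds(start: str, end: str, year: int = 2018) -> int:
    """Seconds between two journal timestamps such as 'Mar 27 02:08:48'.

    The journal lines carry no year, so one is supplied (assumed 2018).
    """
    fmt = "%Y %b %d %H:%M:%S"
    t0 = datetime.strptime(f"{year} {start}", fmt)
    t1 = datetime.strptime(f"{year} {end}", fmt)
    return int((t1 - t0).total_seconds())


# mysqld starting -> "start operation timed out. Terminating."
start_wait = elapsed_seconds("Mar 27 02:08:48", "Mar 27 02:10:18")
# termination requested -> "stop-final-sigterm timed out."
stop_wait = elapsed_seconds("Mar 27 02:10:18", "Mar 27 02:11:48")

print(start_wait, stop_wait)  # 90 90
```

In other words, mysqld never reported readiness within the start window and did not exit within the stop window either, which is consistent with the process still being listed by `ps`.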
      
      

        Activity

          mkaruza Mario Karuza (Inactive) added a comment (edited)

          The cluster consists of 4 nodes: 1 - 192.168.104.191, 2 - 192.168.104.193, 3 - 192.168.104.195, 4 - 192.168.104.196.

          After a non-graceful restart, node 3 (192.168.104.195) comes back up with its gvwstate.dat preserved. At this point it tries to connect to the other nodes, but it succeeds in connecting only to node 4 (192.168.104.196).

          Since these two nodes alone are not entitled to create a new PRIMARY component, they wait for the other nodes to join.

          Restoring the previous primary component requires all of its members to come back online; otherwise the wait times out.

          This is expected behavior.
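
The rule described in this comment can be illustrated with a small sketch. This is not Galera's implementation: the function name `can_restore_primary` and the use of IP addresses as member identifiers are assumptions made for illustration, as is the premise that all four listed nodes belonged to the previous primary component. It only encodes the stated condition that every member of the previous primary component must reappear.

```python
def can_restore_primary(previous_members: set, reachable: set) -> bool:
    """Return True only if every member of the previous primary
    component is among the currently reachable nodes."""
    return previous_members <= reachable


# Assumed membership of the previous primary component (from the comment).
previous = {
    "192.168.104.191",
    "192.168.104.193",
    "192.168.104.195",
    "192.168.104.196",
}

# Node 3 sees only itself and node 4 after the restart.
seen_by_node3 = {"192.168.104.195", "192.168.104.196"}

print(can_restore_primary(previous, seen_by_node3))  # False
print(can_restore_primary(previous, previous))       # True
```

With only nodes 3 and 4 visible, the check fails, which matches the wait described in the comment.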


          winstone Zdravelina Sokolovska (Inactive) added a comment

          The problem is actually that the power down and power up occurred for all nodes. Node 1 was also used as a load balancer and had been excluded from wsrep beforehand.

          People

            Assignee: jplindst Jan Lindström (Inactive)
            Reporter: winstone Zdravelina Sokolovska (Inactive)