MariaDB Server / MDEV-10420

MariaDB fails to start: WSREP Failed to recover position

    Description

      I believe I've found a bug in the systemd/systemctl scripts for MariaDB 10.1.16 on Ubuntu 16.04. None of the service or systemctl commands picked up wsrep_cluster_address from my configuration files, and none of them could bootstrap a new cluster, no matter what I tried.

      I opened a question on DBA Stack Exchange about it: http://dba.stackexchange.com/q/144691/23088

      I'll cut right to the conclusion: Downgrading to 10.1.14 fixed the problem for me.

      Here are the details:

      I'm using 10.1.16-MariaDB-1~xenial from the official MariaDB apt repository for 10.1 [stable], via the University of Texas mirror: https://downloads.mariadb.org/mariadb/repositories/#mirror=ut-austin&distro=Ubuntu&distro_release=xenial--ubuntu_xenial&version=10.1

      I had a perfectly functioning MariaDB Galera cluster setup on 3 Ubuntu 16.04 servers.

      Then I upgraded them with apt full-upgrade. Now I have nothing.

      The upgrade to 10.1.16 failed, and quickly brought down the whole cluster. I don't have the output, but dpkg failed on setting up mariadb-server and mariadb-server-10.1.

      I have backups, so I purged all traces of MariaDB/MySQL/Galera from my servers (including removing /var/lib/mysql/, /etc/mysql/, and /var/log/mysql/) and started over. However, now, with a clean install on each server, none of the standard system startup scripts work. I suspect this is why the upgrade process through apt failed, too.

      I've tried each of the following on my first node:

      galera_new_cluster
      service mysql bootstrap
      service mysql bootstrap --wsrep-new-cluster
      service mysql bootstrap --wsrep-cluster-address="gcomm://"
      service mysql start
      service mysql start --wsrep-new-cluster
      service mysql start --wsrep-cluster-address="gcomm://"
      systemctl start mariadb
      systemctl start mariadb --wsrep-new-cluster
      systemctl start mariadb --wsrep-cluster-address="gcomm://"
      

      Every single one gives me the same output:

      Job for mariadb.service failed because the control process exited with error code. See "systemctl status mariadb.service" and "journalctl -xe" for details.
      

      systemctl status mariadb.service:

      ● mariadb.service - MariaDB database server
         Loaded: loaded (/lib/systemd/system/mariadb.service; enabled; vendor preset: enabled)
        Drop-In: /etc/systemd/system/mariadb.service.d
                 └─migrated-from-my.cnf-settings.conf
         Active: failed (Result: exit-code) since Fri 2016-07-22 13:29:45 CDT; 42s ago
        Process: 10799 ExecStartPre=/bin/sh -c VAR=`/usr/bin/galera_recovery`; [ $? -eq 0 ] &&   systemctl set-environment _WSREP_START_POSITION=$VAR || exit 1 (code=exited, status=1/FAILURE)
        Process: 10794 ExecStartPre=/bin/sh -c systemctl unset-environment _WSREP_START_POSITION (code=exited, status=0/SUCCESS)
       Main PID: 16865 (code=exited, status=0/SUCCESS)
          
      Jul 22 13:29:41 sql2 systemd[1]: Starting MariaDB database server...
      Jul 22 13:29:45 sql2 mysqld[10799]: WSREP: Failed to recover position: '2016-07-22 13:29:41 140110745778432 [Note] /usr/sbin/mysqld (mysqld 10.1.16-MariaDB-1~xenial) starting as process 11080 ...'
      Jul 22 13:29:45 sql2 systemd[1]: mariadb.service: Control process exited, code=exited status=1
      Jul 22 13:29:45 sql2 systemd[1]: Failed to start MariaDB database server.
      Jul 22 13:29:45 sql2 systemd[1]: mariadb.service: Unit entered failed state.
      Jul 22 13:29:45 sql2 systemd[1]: mariadb.service: Failed with result 'exit-code'.
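
The ExecStartPre line in the status output above captures the output of /usr/bin/galera_recovery and exits with status 1 when that helper fails, which stops the unit before mysqld itself is launched. A minimal sketch of that control flow follows. The functions recover_ok, recover_fail and pre_start, and the value PLACEHOLDER_POSITION, are hypothetical stand-ins for illustration. The real galera_recovery script is not reproduced here, and the real unit calls systemctl set-environment instead of printing.

```shell
#!/bin/sh
# Hypothetical stand-ins for /usr/bin/galera_recovery. The real script is
# not reproduced here; PLACEHOLDER_POSITION is not a real position value.
recover_ok() {
    echo "PLACEHOLDER_POSITION"
}

recover_fail() {
    echo "WSREP: Failed to recover position" >&2
    return 1
}

# Same control flow as the ExecStartPre line: capture the helper's output
# and give up with status 1 if the helper reported failure.
# "$1" is the name of the helper function to call.
pre_start() {
    VAR=$("$1") || return 1
    # The real unit runs "systemctl set-environment" at this point; this
    # sketch only prints the assignment it would make.
    echo "_WSREP_START_POSITION=$VAR"
}

pre_start recover_ok
pre_start recover_fail || echo "pre-start failed, service would not start"
```

Under these assumptions, any failure inside the recovery helper is enough to produce the "Control process exited, code=exited status=1" line in the journal, regardless of what wsrep_cluster_address is set to.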
      

      The only way I can start my servers now is by manually executing this on the first node:

      sudo -u mysql mysqld --wsrep-cluster-address='gcomm://'

      and then this on the other two nodes:

      sudo -u mysql mysqld --wsrep-cluster-address='gcomm://ip1,ip2,ip3'

      That works, and I have a working cluster again. However, systemd/systemctl now has no idea the service is running. It seems the systemd startup scripts can't use the wsrep-cluster-address setting in my configuration files at all, and passing it on the service or systemctl command line does not work either.
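
The two manual commands above differ only in the address: an empty gcomm:// bootstraps a new cluster, while a comma-separated node list joins an existing one. A small sketch of building that value is below. The helper name gcomm_address is hypothetical, and ip1, ip2 and ip3 are the same placeholders used in the report.

```shell
#!/bin/sh
# Hypothetical helper: build a wsrep cluster address from a list of nodes.
# With no arguments it prints the empty address used to bootstrap.
# The function body runs in a subshell so the IFS change stays local.
gcomm_address() (
    IFS=,
    # "$*" joins the arguments with the first character of IFS.
    echo "gcomm://$*"
)

gcomm_address               # address for the first (bootstrap) node
gcomm_address ip1 ip2 ip3   # address for the nodes that join
```

The result could then be passed as in the manual workaround, for example `sudo -u mysql mysqld --wsrep-cluster-address="$(gcomm_address ip1 ip2 ip3)"`.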

      I've been able to temporarily alleviate my problems by downgrading to 10.1.14:

      wget https://downloads.mariadb.com/files/MariaDB/mariadb-10.1.14/repo/ubuntu/pool/main/m/mariadb-10.1/mariadb-server-10.1_10.1.14+maria-1~xenial_amd64.deb
      wget https://downloads.mariadb.com/files/MariaDB/mariadb-10.1.14/repo/ubuntu/pool/main/m/mariadb-10.1/libmariadbclient18_10.1.14+maria-1~xenial_amd64.deb
      wget https://downloads.mariadb.com/files/MariaDB/mariadb-10.1.14/repo/ubuntu/pool/main/m/mariadb-10.1/libmysqlclient18_10.1.14+maria-1~xenial_amd64.deb
      wget https://downloads.mariadb.com/files/MariaDB/mariadb-10.1.14/repo/ubuntu/pool/main/m/mariadb-10.1/mariadb-client-10.1_10.1.14+maria-1~xenial_amd64.deb
      wget https://downloads.mariadb.com/files/MariaDB/mariadb-10.1.14/repo/ubuntu/pool/main/m/mariadb-10.1/mariadb-client-core-10.1_10.1.14+maria-1~xenial_amd64.deb
      wget https://downloads.mariadb.com/files/MariaDB/mariadb-10.1.14/repo/ubuntu/pool/main/m/mariadb-10.1/mariadb-common_10.1.14+maria-1~xenial_all.deb
      wget https://downloads.mariadb.com/files/MariaDB/mariadb-10.1.14/repo/ubuntu/pool/main/m/mariadb-10.1/mariadb-server-core-10.1_10.1.14+maria-1~xenial_amd64.deb
      wget https://downloads.mariadb.com/files/MariaDB/mariadb-10.1.14/repo/ubuntu/pool/main/m/mariadb-10.1/mysql-common_10.1.14+maria-1~xenial_all.deb
      wget https://downloads.mariadb.com/files/MariaDB/mariadb-10.1.14/repo/ubuntu/pool/main/g/galera-3/galera-3_25.3.15-xenial_amd64.deb
      apt purge mariadb-server
      apt autoremove --purge
      apt clean
      rm -rf /etc/mysql/
      rm -rf /var/lib/mysql/
      rm -rf /var/log/mysql/
      rm -f /var/log/mysql.*
      rm -f /etc/systemd/system/mysql.service
      rm -f /etc/systemd/system/mysqld.service
      rm -f /etc/rc0.d/K03mysql
      rm -f /etc/rc1.d/K03mysql
      rm -f /etc/rc2.d/S03mysql
      rm -f /etc/rc3.d/S03mysql
      rm -f /etc/rc4.d/S03mysql
      rm -f /etc/rc5.d/S03mysql
      rm -f /etc/rc6.d/K03mysql
      rm -f /var/lib/systemd/deb-systemd-helper-enabled/mysql.service
      rm -f /var/lib/systemd/deb-systemd-helper-enabled/mysqld.service
      rm -f /etc/apparmor.d/abstractions/mysql
      rm -f /etc/apparmor.d/cache/usr.sbin.mysqld
      rm -rf /etc/systemd/system/mariadb.service.d
      rm -f /etc/systemd/system/multi-user.target.wants/mariadb.service
      rm -f /var/lib/systemd/deb-systemd-helper-enabled/mariadb.service.dsh-also
      rm -f /var/lib/systemd/deb-systemd-helper-enabled/multi-user.target.wants/mariadb.service
      rm -rf /var/lib/apt/lists/*
      apt update
      apt install iproute libaio1 libcgi-fast-perl libcgi-pm-perl libdbd-mysql-perl libdbi-perl libencode-locale-perl libfcgi-perl libhtml-parser-perl libhtml-tagset-perl libhtml-template-perl libhttp-date-perl libhttp-message-perl libio-html-perl libjemalloc1 liblwp-mediatypes-perl libtimedate-perl liburi-perl socat
      dpkg -i mysql-common_10.1.14+maria-1~xenial_all.deb 
      dpkg -i mariadb-common_10.1.14+maria-1~xenial_all.deb 
      dpkg -i libmariadbclient18_10.1.14+maria-1~xenial_amd64.deb libmysqlclient18_10.1.14+maria-1~xenial_amd64.deb mariadb-client-10.1_10.1.14+maria-1~xenial_amd64.deb mariadb-client-core-10.1_10.1.14+maria-1~xenial_amd64.deb mariadb-server-10.1_10.1.14+maria-1~xenial_amd64.deb mariadb-server-core-10.1_10.1.14+maria-1~xenial_amd64.deb galera-3_25.3.15-xenial_amd64.deb
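
The nine wget lines above share one base URL and differ only in pool directory and file name, so the list could be generated instead of typed out. This is only a sketch: the names BASE, pkg_url and list_urls are hypothetical, while the URLs and file names are copied from the commands above.

```shell
#!/bin/sh
# Base of the 10.1.14 repository, as used in the wget commands above.
BASE="https://downloads.mariadb.com/files/MariaDB/mariadb-10.1.14/repo/ubuntu/pool/main"

# Print the full URL for a package: $1 is the pool directory, $2 the file.
pkg_url() {
    echo "$BASE/$1/$2"
}

# Print one URL per line for every package used in the downgrade.
list_urls() {
    for f in \
        "mysql-common_10.1.14+maria-1~xenial_all.deb" \
        "mariadb-common_10.1.14+maria-1~xenial_all.deb" \
        "libmariadbclient18_10.1.14+maria-1~xenial_amd64.deb" \
        "libmysqlclient18_10.1.14+maria-1~xenial_amd64.deb" \
        "mariadb-client-10.1_10.1.14+maria-1~xenial_amd64.deb" \
        "mariadb-client-core-10.1_10.1.14+maria-1~xenial_amd64.deb" \
        "mariadb-server-10.1_10.1.14+maria-1~xenial_amd64.deb" \
        "mariadb-server-core-10.1_10.1.14+maria-1~xenial_amd64.deb"
    do
        pkg_url "m/mariadb-10.1" "$f"
    done
    pkg_url "g/galera-3" "galera-3_25.3.15-xenial_amd64.deb"
}

list_urls
```

Piping the output into wget, for example `list_urls | wget -i -`, should fetch the same files as the individual commands.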
      

      Now, I can start my first node with galera_new_cluster and all other nodes with service mysql start.

      There must be a bug with the Ubuntu systemd/systemctl scripts in 10.1.16.

            People

              nirbhay_c Nirbhay Choubey (Inactive)
              awensley Andrew Ensley