[MDEV-10420] MariaDB fails to start: WSREP Failed to recover position
Created: 2016-07-22 Updated: 2020-03-12 Resolved: 2016-07-22
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Galera, Platform Debian, Replication, Scripts & Clients, wsrep |
| Affects Version/s: | 10.1.16 |
| Fix Version/s: | 10.1.17 |
| Type: | Bug | Priority: | Major |
| Reporter: | Andrew Ensley | Assignee: | Nirbhay Choubey (Inactive) |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None |
| Environment: | Ubuntu 16.04 amd64 |
| Issue Links: |
| Description |
I believe I've found a bug in the systemd/systemctl scripts for MariaDB 10.1.16 on Ubuntu 16.04. None of the service or systemctl commands picked up wsrep_cluster_address from my config files, nor could any of them bootstrap a new cluster, no matter what I tried. I opened a question on DBA Stack Exchange about it: http://dba.stackexchange.com/q/144691/23088
I'll cut right to the conclusion: downgrading to 10.1.14 fixed the problem for me.
Here are the details. I'm using 10.1.16-MariaDB-1~xenial from the official MariaDB apt repository for 10.1 [stable], via the University of Texas mirror: https://downloads.mariadb.org/mariadb/repositories/#mirror=ut-austin&distro=Ubuntu&distro_release=xenial--ubuntu_xenial&version=10.1
I had a perfectly functioning MariaDB Galera cluster set up on three Ubuntu 16.04 servers. Then I upgraded them with apt full-upgrade, and now I have nothing. The upgrade to 10.1.16 failed and quickly brought down the whole cluster. I don't have the output, but dpkg failed while setting up mariadb-server and mariadb-server-10.1.
I have backups, so I purged all traces of MariaDB/MySQL/Galera from my servers (including removing /var/lib/mysql/, /etc/mysql/, and /var/log/mysql/) and started over. However, even with a clean install on each server, none of the standard system startup scripts work. I suspect this is also why the upgrade through apt failed. I've tried each of the following on my first node:
Every single one gives me the same output:
systemctl status mariadb.service:
The only way I can start my servers now is by manually executing:
On the first node, and then:
On the other two nodes. That works, and I have a working cluster again. But now systemd/systemctl has no idea the service is running. It seems the systemd startup scripts can't use the wsrep-cluster-address setting from my configuration files at all, and specifying it on the service or systemctl command line does not work either. I've been able to temporarily alleviate my problems by downgrading to 10.1.14:
Now I can start my first node with galera_new_cluster and all other nodes with service mysql start. There must be a bug in the Ubuntu systemd/systemctl scripts in 10.1.16. |
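One way to check what the startup scripts ought to be seeing is to look at which option group actually sets wsrep_cluster_address. The standard tool for this is my_print_defaults (for example, my_print_defaults mysqld galera). The Python sketch below is a rough equivalent for a single option file, offered under assumptions: the helper name find_cluster_address is hypothetical, the list of groups it scans is an assumption about what a 10.1 server reads, and unlike the server it does not follow !include or !includedir directives.

```python
import configparser
from typing import Dict

# Groups a MariaDB 10.1 server is assumed to read options from.
# This list is an assumption for the sketch; check the docs for your version.
SERVER_GROUPS = ("mysqld", "server", "mysqld-10.1", "mariadb", "mariadb-10.1", "galera")


def find_cluster_address(option_text: str) -> Dict[str, str]:
    """Return {group: value} for each server group that sets wsrep_cluster_address.

    Hypothetical helper: it parses the text of one option file only and does
    not follow !include / !includedir directives the way the server does.
    """
    # Drop !include / !includedir lines, which configparser cannot parse.
    lines = [ln for ln in option_text.splitlines() if not ln.lstrip().startswith("!")]

    parser = configparser.ConfigParser(
        allow_no_value=True,  # bare flags such as "skip-name-resolve" have no value
        interpolation=None,   # treat '%' literally instead of as interpolation syntax
        strict=False,         # tolerate a group or option appearing more than once
    )
    parser.read_string("\n".join(lines))

    found: Dict[str, str] = {}
    for group in SERVER_GROUPS:
        if not parser.has_section(group):
            continue
        for key, value in parser.items(group):
            # MariaDB treats '-' and '_' in option names as equivalent.
            if key.replace("-", "_") == "wsrep_cluster_address" and value:
                # Strip surrounding quotes, which option files allow.
                found[group] = value.strip().strip("\"'")
    return found


if __name__ == "__main__":
    sample = """!includedir /etc/mysql/conf.d/
[mysqld]
bind-address = 0.0.0.0
skip-name-resolve

[galera]
wsrep_on = ON
wsrep-cluster-address = "gcomm://10.0.0.1,10.0.0.2,10.0.0.3"
"""
    print(find_cluster_address(sample))
```

Run on the sample, this prints {'galera': 'gcomm://10.0.0.1,10.0.0.2,10.0.0.3'}. An empty result for a file you expect to configure the cluster suggests the setting sits in a group or an included file the sketch does not look at.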
| Comments |
| Comment by Nirbhay Choubey (Inactive) [ 2016-07-22 ] |
| Comment by Andrew Ensley [ 2016-07-22 ] |
I don't believe this is a duplicate of the linked issue. I only get the error message (WSREP Failed to recover position) when wsrep_on is set to ON. I can leave everything else the same, disable that option, and start mariadb without issue via all the methods I mentioned above. Of course, then I don't have a cluster. But I don't think this is related to the log setting, since mariadb was starting just fine with my custom log settings. |
| Comment by Nirbhay Choubey (Inactive) [ 2016-07-25 ] |
| Comment by Andrew Ensley [ 2016-07-25 ] |
Duh. I missed that because it wasn't stated explicitly, but why else would galera_recovery be running? Sorry. |
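The error in the title is reported by the galera_recovery step mentioned above. Roughly, that step runs mysqld with --wsrep_recover, looks for a line of the form "WSREP: Recovered position: <uuid>:<seqno>" in the server's output, and hands the result to the real start as --wsrep_start_position. If no such line turns up, it reports "WSREP: Failed to recover position" and the service start fails. The Python sketch below models only the parsing half of that, under stated assumptions: the function name recover_start_position is hypothetical, the line format is the one Galera commonly logs, and the real galera_recovery is a shell script whose details differ.

```python
import re
from typing import Optional

# Assumed format of the line logged by "mysqld --wsrep_recover":
#   ... [Note] WSREP: Recovered position: <cluster-uuid>:<seqno>
# seqno may be -1 when no position is known.
RECOVERED_RE = re.compile(
    r"WSREP: Recovered position:\s*"
    r"(?P<uuid>[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12})"
    r":(?P<seqno>-?\d+)"
)


def recover_start_position(recovery_log: str) -> Optional[str]:
    """Return a --wsrep_start_position=... argument, or None if no position was logged.

    Hypothetical helper modelled loosely on the galera_recovery parsing step.
    """
    match = RECOVERED_RE.search(recovery_log)
    if match is None:
        # This is the case the "Failed to recover position" message reports:
        # the recovery run produced no recognisable position line, for example
        # (an assumption) because its output did not go where the wrapper reads it.
        return None
    return f"--wsrep_start_position={match['uuid']}:{match['seqno']}"


if __name__ == "__main__":
    log = ("2016-07-22 10:00:00 140 [Note] WSREP: Recovered position: "
           "5f2e3c5a-1b2c-11e6-9a8b-0e1f2a3b4c5d:1234\n")
    print(recover_start_position(log))
```

Returning None instead of raising leaves the decision to abort with the caller, which matches the behaviour described in this report: the whole start fails when no position can be recovered.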
| Comment by Oleksandr Diamantopulo [ 2017-02-05 ] |
Andrew Ensley, were you able to start your cluster? Nirbhay Choubey, |