[MDEV-10414] After updating to MariaDB-server-10.1.16-1.el7.centos.x86_64 cannot start galera cluster Created: 2016-07-21  Updated: 2016-08-05  Resolved: 2016-07-22

Status: Closed
Project: MariaDB Server
Component/s: Galera
Affects Version/s: 10.1.16
Fix Version/s: 10.1.17

Type: Bug Priority: Major
Reporter: sam stein Assignee: Nirbhay Choubey (Inactive)
Resolution: Duplicate Votes: 0
Labels: galera
Environment:

OpenVZ VPS. Fully updated Centos v7 with latest yum updates as of today


Issue Links:
Duplicate
is duplicated by MDEV-10396 MariaDB does not restart after upgrad... Closed

 Description   

I have been running MariaDB-server-10.1.14-1.el7.centos.x86_64.rpm for a while with no problems.

As soon as I updated to MariaDB-server-10.1.16-1.el7.centos.x86_64.rpm, the Galera cluster stopped working. I can start MariaDB by setting wsrep_on=OFF. It will not start with wsrep_on=ON.

I tried restarting the cluster and setting the node to primary.

wsrep_cluster_address=gcomm://

But that did not work. I checked the logs but there is nothing in there. I tried disabling all the settings in the config except the cluster config, but that didn't work either.

Downgrading back to MariaDB-server-10.1.14-1.el7.centos.x86_64.rpm was the only way to get the cluster working again. I did not change anything else. Tried upgrading again and same problem.

There is something wrong in MariaDB-server-10.1.16-1.el7.centos.x86_64.rpm that causes Galera to fail.

Also, you guys removed MariaDB-server-10.1.14-1.el7.centos.x86_64.rpm from the repository. Luckily I was able to find a mirror that had it archived.



 Comments   
Comment by Elena Stepanova [ 2016-07-21 ]

For the repo question, these are not MariaDB's, ours are named differently: http://yum.mariadb.org/10.1.16/centos7-amd64/rpms/
We don't have control over the 3rd party repo, whichever it was.
Several previous versions of MariaDB packages can be found at http://yum.mariadb.org/ .

The actual Galera question goes to nirbhay_c, to see whether it's something wrong with MGC. It is also a possibility that the packages themselves are broken.

Comment by Igor Gueths [ 2016-07-21 ]

I can confirm that this bug has existed since the 10.1.15 package release. As to what causes it, I traced it to some sort of file parsing issue in the wsrep_recover_position function in /usr/bin/galera_recovery. Specifically somewhere within this block:

  recovered_pos="$(grep 'WSREP: Recovered position:' $log_file)"
 
  if [ -z "$recovered_pos" ]; then
    skipped="$(grep WSREP $log_file | grep 'skipping position recovery')"
    if [ -z "$skipped" ]; then
      log "WSREP: Failed to recover position: '`cat $log_file`'"
      exit 1
    else
      log "WSREP: Position recovery skipped."
    fi
  else
    start_pos="$(echo $recovered_pos | sed 's/.*WSREP\:\ Recovered\ position://' \
                    | sed 's/^[ \t]*//')"
    log "WSREP: Recovered position $start_pos"
    start_pos_opt="--wsrep_start_position=$start_pos"
  fi
}

Unfortunately I have not had further time to work on a fix for this; however, hoping Assignee et al can pick this up soon, as I am stuck on 10.1.14 for my clusters now as a result. Thanks.
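
A minimal sketch of the extraction step from the block above, run in isolation against a sample log line of the kind the server writes during recovery. The helper name extract_recovered_pos is hypothetical and is not part of galera_recovery; the sketch only illustrates what the grep/sed pair expects to find in $log_file, and that it yields an empty string when the line is absent.

```shell
#!/bin/sh
# Hypothetical helper (not part of galera_recovery): print the recovered
# position found in the log file given as $1, or nothing if it is absent.
extract_recovered_pos() {
  grep 'WSREP: Recovered position:' "$1" | tail -n 1 \
    | sed 's/.*WSREP: Recovered position:[[:space:]]*//'
}

# Build a throwaway log file containing one sample recovery line.
log_file="$(mktemp)"
printf '%s\n' '2016-07-22 15:58:04 140172581619840 [Note] WSREP: Recovered position: 00000000-0000-0000-0000-000000000000:-1' > "$log_file"

pos="$(extract_recovered_pos "$log_file")"
echo "start position: $pos"
rm -f "$log_file"
```

If the log file is empty, the helper prints nothing, which corresponds to the "Failed to recover position" branch in the script.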

Comment by Richard Lane [ 2016-07-22 ]

I ran into the same issue and traced it down to the same wsrep_recover_position() code in the galera_recovery script:

# Redirect server's error log to the log file.
eval /usr/sbin/mysqld $cmdline_args --user=$user --wsrep_recover 2> "$log_file"

What I think is happening is that this script assumes that when it starts mysqld to perform the position recovery, the output it is looking for ("WSREP: Recovered position: ") will be written to stderr, which it redirects to $log_file. In my case, the output goes to where I asked MySQL error logging to go.
In my /etc/my.cnf.d/server.cnf, I have [mysqld] log-error=/var/log/mariadb/mysqld.log

# tail -1 /var/log/mariadb/mysqld.log
2016-07-22 15:58:04 140172581619840 [Note] WSREP: Recovered position: 00000000-0000-0000-0000-000000000000:-1

That is not where galera_recovery was expecting it.

I did get galera_new_cluster to work by changing the line at the beginning of wsrep_recover_position() to:

# Redirect server's error log to the log file.
eval /usr/sbin/mysqld $cmdline_args --user=$user --wsrep_recover --log-error=$log_file 2> "$log_file"
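
The effect described above can be reproduced without a server. In the sketch below, fake_server is a hypothetical stand-in for `mysqld --wsrep_recover`: it only mimics the behaviour reported in this comment (the message goes to stderr unless a --log-error=PATH argument is present, and the last such argument wins). It is an illustration of the failure mode and of the workaround, not a description of mysqld internals.

```shell
#!/bin/sh
# Hypothetical stand-in for `mysqld --wsrep_recover`. Assumption: the message
# is written to stderr unless --log-error=PATH is given, in which case it is
# appended to PATH. The last --log-error argument takes precedence.
fake_server() {
  msg='[Note] WSREP: Recovered position: 00000000-0000-0000-0000-000000000000:-1'
  target=''
  for arg in "$@"; do
    case "$arg" in
      --log-error=*) target="${arg#--log-error=}" ;;
    esac
  done
  if [ -n "$target" ]; then
    printf '%s\n' "$msg" >> "$target"
  else
    printf '%s\n' "$msg" >&2
  fi
}

tmp_log="$(mktemp)"   # plays the role of $log_file in the script
err_log="$(mktemp)"   # plays the role of the log-error file set in my.cnf

# Case 1: log-error points elsewhere, so the captured stderr stays empty.
fake_server --log-error="$err_log" 2> "$tmp_log"
case1="$(grep -c 'WSREP: Recovered position:' "$tmp_log")"

# Case 2: the workaround overrides log-error with the temporary log file.
fake_server --log-error="$err_log" --log-error="$tmp_log" 2> "$tmp_log"
case2="$(grep -c 'WSREP: Recovered position:' "$tmp_log")"

echo "without override: $case1 match(es); with override: $case2 match(es)"
rm -f "$tmp_log" "$err_log"
```

In the first case the grep in wsrep_recover_position() would find nothing and the script would exit with "Failed to recover position"; in the second case the line lands in the file the script actually reads.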

Comment by sam stein [ 2016-07-22 ]

To clarify, I am using the official MariaDB10 repository. I copied the name wrong.

Comment by Nirbhay Choubey (Inactive) [ 2016-07-22 ]

rvlane You got it right.

Comment by Nirbhay Choubey (Inactive) [ 2016-07-22 ]

MDEV-10396
