[MDEV-14704] MariaDB galera failover not working Created: 2017-12-18  Updated: 2018-02-02  Resolved: 2018-02-02

Status: Closed
Project: MariaDB Server
Component/s: Galera
Affects Version/s: None
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Omkar Assignee: Unassigned
Resolution: Incomplete Votes: 0
Labels: galera, need_feedback


 Description   

I am trying to have multiple nodes working as the Ambari database. I have configured both database servers in a Galera cluster in master-master configuration. But when the primary is down, it doesn't try the 2nd host. I tried connection strings with autoReconnect, failover, loadbalance, sequential, and replication, but none of them work.



 Comments   
Comment by Daniel Black [ 2017-12-19 ]

Two nodes aren't suitable for a Galera cluster. When a node is recovering, the other node acts as a donor. Depending on your SST mechanism (the default is rsync), there will be an outage while this occurs, because a donor cannot be fully online while it helps the other server recover.

Use a minimum of 3 nodes for galera.
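One common way to satisfy the three-node minimum without a third full database server is a Galera Arbitrator. This is a hedged sketch, not from this ticket: the hostnames reuse the ip1/ip2 placeholders and the cluster name from the reporter's config, and assume garbd is installed on a third machine.

```
# Run a Galera Arbitrator (garbd) on a third host so the two data
# nodes keep quorum when one of them fails. garbd joins the group
# but stores no data.
garbd --address gcomm://ip1,ip2 --group c65 --daemon
```

The arbitrator only participates in quorum voting and relays replication traffic; it cannot serve as an SST donor, so recovery still relies on the surviving data node.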

Can you include the configuration files for both servers, along with their error logs covering the time of this failure, to verify whether this was the cause?

Regarding the connection string options (autoReconnect, etc.): are these for mariadb-connector-j?

Comment by Omkar [ 2018-01-02 ]

I am using mysql-connector-java-5.1.35.jar. Config file:

# this is read by the standalone daemon and embedded servers
[server]
log-error=/var/log/mysqld.log

# this is only for the mysqld standalone daemon
[mysqld]
pid-file=/var/run/mysqld/mysqld.pid
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
user=mysql

#
# * Galera-related settings
#
[galera]
# Mandatory settings
wsrep_on=ON
wsrep_provider=/usr/lib64/galera/libgalera_smm.so
wsrep_cluster_address=gcomm://ip1,ip2
wsrep_cluster_name=c65
wsrep_node_address=ip1
wsrep_node_name=reservoir-nn-1.ci41.lsf04.ibmwhc.net
wsrep_sst_method=rsync
binlog_format=row
default_storage_engine=InnoDB
innodb_autoinc_lock_mode=2

Comment by Omkar [ 2018-01-02 ]

Error Log:

Internal Exception: com.mysql.jdbc.exceptions.jdbc4.MySQLTransactionRollbackException: Deadlock found when trying to get lock; try restarting transaction
Error Code: 1213
        at org.eclipse.persistence.exceptions.DatabaseException.sqlException(DatabaseException.java:331)
        at org.eclipse.persistence.internal.databaseaccess.DatabaseAccessor.basicCommitTransaction(DatabaseAccessor.java:445)
        at org.eclipse.persistence.internal.databaseaccess.DatasourceAccessor.commitTransaction(DatasourceAccessor.java:405)
        at org.eclipse.persistence.internal.databaseaccess.DatabaseAccessor.commitTransaction(DatabaseAccessor.java:427)
        at org.eclipse.persistence.internal.sessions.AbstractSession.basicCommitTransaction(AbstractSession.java:761)
        at org.eclipse.persistence.sessions.server.ClientSession.basicCommitTransaction(ClientSession.java:174)
        at org.eclipse.persistence.internal.sessions.AbstractSession.commitTransaction(AbstractSession.java:965)
        at org.eclipse.persistence.internal.sessions.UnitOfWorkImpl.commitTransaction(UnitOfWorkImpl.java:1600)
        at org.eclipse.persistence.internal.sessions.RepeatableWriteUnitOfWork.commitTransaction(RepeatableWriteUnitOfWork.java:647)
        at org.eclipse.persistence.internal.sessions.UnitOfWorkImpl.commitTransactionAfterWriteChanges(UnitOfWorkImpl.java:1615)
        at org.eclipse.persistence.internal.sessions.RepeatableWriteUnitOfWork.commitRootUnitOfWork(RepeatableWriteUnitOfWork.java:284)
        at org.eclipse.persistence.internal.sessions.UnitOfWorkImpl.commitAndResume(UnitOfWorkImpl.java:1169)
        at org.eclipse.persistence.internal.jpa.transaction.EntityTransactionImpl.commit(EntityTransactionImpl.java:132)
        ... 106 more
Caused by: com.mysql.jdbc.exceptions.jdbc4.MySQLTransactionRollbackException: Deadlock found when trying to get lock; try restarting transaction
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at com.mysql.jdbc.Util.handleNewInstance(Util.java:411)

Comment by Daniel Black [ 2018-01-04 ]

Can you show the /var/log/mysqld.log error log from the primary and secondary from the time of the primary being down? Do you know how/why the primary was down? What is 'show global status' on the secondary?

Deadlock errors like the one above are something that should be handled by your application. Is this a backtrace from a connection to the second server?
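Application-side handling of error 1213 usually means retrying the whole transaction. A minimal sketch of that pattern follows; the helper name and retry count are illustrative, not part of Connector/J or Galera:

```java
import java.util.concurrent.Callable;

// Hypothetical helper: rerun a transaction a few times when the driver
// reports a deadlock-induced rollback (MySQL/MariaDB error 1213, which
// Connector/J surfaces as an SQLTransactionRollbackException subclass).
class DeadlockRetry {
    static final int MAX_ATTEMPTS = 3;

    static <T> T withRetry(Callable<T> txn) throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
            try {
                return txn.call();          // run the transaction body
            } catch (java.sql.SQLTransactionRollbackException e) {
                last = e;                   // deadlock: server rolled back, retry
            }
        }
        throw last;                         // give up after MAX_ATTEMPTS
    }
}
```

Note that in a multi-master Galera setup, writes on different nodes are more likely to collide than on a single server, so this kind of retry loop matters more, not less.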

If it doesn't connect to the second server, then that is a Connector/J fault, right? What exactly do you mean by "doesn't work": an error message, or observed behaviour?

The Connector/J replication option seems to be for master/slave replication, which isn't the case with Galera: https://dev.mysql.com/doc/connector-j/5.1/en/connector-j-master-slave-replication-connection.html. What Connector/J does with the replication option depends on its implementation. It may not be applicable to Galera.
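For comparison, these are the URL shapes the two drivers use for failover across a host list. A hedged sketch: the host placeholders ip1/ip2 and the database name "ambari" are illustrative, taken loosely from the reporter's setup rather than from this ticket.

```
# MariaDB Connector/J: try ip1 first, fail over to ip2 in listed order
jdbc:mariadb:sequential://ip1,ip2/ambari

# MySQL Connector/J 5.1: primary host plus failover host list;
# failOverReadOnly=false keeps the failed-over connection writable
jdbc:mysql://ip1,ip2/ambari?failOverReadOnly=false
```

With mysql-connector-java-5.1.x, the second form is the relevant one; the sequential/loadbalance keywords the reporter tried are MariaDB Connector/J modes and are ignored by the MySQL driver.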

Generated at Thu Feb 08 08:15:38 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.