[MXS-2651] Failover/switchover on empty MariaDB replication cluster fails Created: 2019-08-30  Updated: 2020-08-27  Resolved: 2020-08-27

Status: Closed
Project: MariaDB MaxScale
Component/s: failover
Affects Version/s: 2.3.9, 2.4.1
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Michael Hinkel Assignee: Unassigned
Resolution: Not a Bug Votes: 0
Labels: None


 Description   

I've observed that MaxScale reports the following error when used on a freshly installed and configured MariaDB replication cluster (MariaDB version 10.4.7 (bionic)):

The backend cluster does not support failover/switchover due to the following reason(s): slave connection from B to A:3306 is not using gtid-replication


How to reproduce:

  1. Start at least 2 MariaDB instances
  2. Configure the slave to use the master as its master node (according to the docs)
  3. Join the slave to the master
  4. Configure the MaxScale monitor to observe those MariaDB instances
  5. Start MaxScale

Results

  1. MaxScale reports the above-mentioned error upon start-up
  2. MaxScale detects the configured and started MariaDB instances and reports the correct status ([Master, Running] for the master node and [Slave, Running] for all slave nodes)
  3. Replication from master to slave works fine
  4. When the master node is stopped, no failover occurs
  5. Even after some operations have been performed on this cluster, MaxScale does not update its assessment of the slave nodes.
  6. The slave nodes' capability is correctly identified after a restart of MaxScale (provided the replication feature has been used at least once)

How to work around this issue?

  1. Create the MariaDB cluster
  2. Create a temporary table (or a database or whatever) on the master node (which will be replicated to the slave nodes)
  3. Delete that temporary object (or just leave it as is...)
  4. Start MaxScale

Hopefully there is a way for MaxScale to correctly detect whether or not GTID replication was configured. Maybe the respective variables of the slave nodes could be read and evaluated (e.g. Using_Gtid). The variable Gtid_IO_Pos won't give the correct result, because it only contains a value after at least one transaction has been done.
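
To illustrate the suggestion, here is a minimal Python sketch of the difference between the two fields. It is not how MaxScale performs the check. It assumes a SHOW SLAVE STATUS row has already been fetched into a dict, and the helper names and the sample row are hypothetical. The empty Gtid_IO_Pos on a fresh slave reflects the behaviour described above.

```python
def uses_gtid(slave_status: dict) -> bool:
    """Return True if the slave connection is configured for GTID replication."""
    # Using_Gtid is "No" when GTID is off, otherwise "Slave_Pos" or "Current_Pos".
    return slave_status.get("Using_Gtid", "No").strip().lower() != "no"


def has_received_gtid(slave_status: dict) -> bool:
    """Return True once a GTID position has been recorded for the IO thread."""
    return bool(slave_status.get("Gtid_IO_Pos", "").strip())


# Hypothetical row from a freshly configured slave with no transactions yet.
fresh_slave = {"Using_Gtid": "Slave_Pos", "Gtid_IO_Pos": ""}

print(uses_gtid(fresh_slave))          # GTID replication is configured
print(has_received_gtid(fresh_slave))  # but no position has been recorded yet
```

A check based on Using_Gtid alone would classify this slave as GTID-capable, whereas one based on Gtid_IO_Pos would not until the first transaction arrives.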



 Comments   
Comment by markus makela [ 2019-10-07 ]

You can run maxctrl call command mariadbmon reset-replication <monitor-name> <master-name> to initialize the cluster properly. Also make sure you configure replication with MASTER_USE_GTID.
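
As a rough sketch of the two steps above, the following Python helpers only assemble the command line and the SQL statement; they do not execute anything. The monitor name "MariaDB-Monitor", the server name "server1", the host "A" and the choice of slave_pos as the MASTER_USE_GTID value are assumptions made for illustration, and replication credentials are deliberately left out.

```python
import shlex


def reset_replication_argv(monitor_name: str, master_name: str) -> list:
    """Build the maxctrl argument list for the reset-replication module command."""
    return ["maxctrl", "call", "command", "mariadbmon",
            "reset-replication", monitor_name, master_name]


def change_master_sql(host: str, port: int) -> str:
    """Build a CHANGE MASTER statement that enables GTID replication.

    slave_pos is an assumed value; credentials are omitted on purpose.
    """
    return (f"CHANGE MASTER TO MASTER_HOST='{host}', MASTER_PORT={port}, "
            "MASTER_USE_GTID=slave_pos;")


# Hypothetical monitor and server names.
argv = reset_replication_argv("MariaDB-Monitor", "server1")
print(shlex.join(argv))
print(change_master_sql("A", 3306))
```

The argument list can be passed to subprocess.run on a host where maxctrl is installed, and the SQL statement is meant to be run on the slave before START SLAVE.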

Comment by markus makela [ 2020-08-27 ]

As GTID replication is a requirement, this is expected behavior.
