We need to implement a way to make each cluster node topology-aware:
- HA group
- DSN
- Role: master, slave, standalone
- Replication type (async, semi-sync, sync)
- Candidate master
Nodes in a cluster could be instrumented in various ways, from plugins to external tools, so a system table is probably the best fit here.
Define a cluster manager plugin API that can provide the node statuses.
Some possible plugin implementations:
- REST-based external cluster manager
- Corosync plugin on each node
- wsrep plugin on each node
- Spider XA monitoring
When the topology in the system table changes, we need to reconfigure replication automatically.
This requires finding the oldest GTID in the cluster, fetching all following GTIDs from that node, instrumenting the new master defined in the new topology, and waiting until that master already holds the oldest GTID.
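The GTID bookkeeping behind the steps above can be sketched as follows. This is a minimal illustration, not the real implementation: it assumes MariaDB-style GTID positions (`domain-server_id-seq_no`), and the candidate selection uses a simple summed-sequence heuristic that is only safe when positions are comparable (e.g. a single replication domain).

```python
# Hypothetical sketch: compare MariaDB GTID positions to pick a promotion
# candidate and to check catch-up. All function names are illustrative.

def parse_gtid_pos(pos):
    """Parse 'domain-server_id-seq_no[,...]' into {domain: seq_no}."""
    state = {}
    for gtid in pos.split(","):
        domain, _server, seq = (int(x) for x in gtid.strip().split("-"))
        state[domain] = max(state.get(domain, 0), seq)
    return state

def most_up_to_date(candidates):
    """candidates: {node_name: gtid_pos_string}.
    Heuristic: highest total sequence number wins; assumes a single
    replication domain or otherwise comparable positions."""
    return max(candidates, key=lambda n: sum(parse_gtid_pos(candidates[n]).values()))

def has_caught_up(master_pos, target_pos):
    """True when the master's position covers the target in every domain."""
    m, t = parse_gtid_pos(master_pos), parse_gtid_pos(target_pos)
    return all(m.get(d, 0) >= s for d, s in t.items())
```

A failover driver would loop on `has_caught_up()` against the promoted master before repointing the remaining slaves.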
I propose to start the task with a SQL command that implements replication failover based on the system table (extending the existing "server" table with a cluster name or HA group to mimic the Fabric concept, plus additional per-node status properties).
Cluster manager plugins or external tools are in charge of populating that table before the command is used.
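A minimal sketch of what such an extended system table could look like, covering the topology attributes listed above. Table and column names are assumptions for illustration, not an agreed design; the existing `mysql.servers` table (from `CREATE SERVER`) already carries name, host, and port.

```sql
-- Hypothetical extension of the mysql.servers concept with HA metadata.
CREATE TABLE mysql.replication_topology (
  server_name      VARCHAR(64)  NOT NULL PRIMARY KEY,
  ha_group         VARCHAR(64)  NOT NULL,   -- cluster name / HA group
  dsn              VARCHAR(255) NOT NULL,
  role             ENUM('master','slave','standalone') NOT NULL,
  replication_type ENUM('async','semi-sync','sync')    NOT NULL,
  candidate_master BOOLEAN      NOT NULL DEFAULT 0,
  node_state       VARCHAR(32)              -- filled by plugin/external tool
);

-- The proposed failover command would then read this table, e.g.:
-- FAILOVER HA GROUP 'mycluster1';          -- syntax to be defined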
MHA or Guillaume's MariaDB replication tools currently have to do this manually: discover the replication topology, find the most up-to-date slave, and wait until each slave catches up with the promoted master.
MaxScale can populate such tables based on its monitoring plugin and later trigger failover by invoking the command.
A first cluster manager plugin can be demonstrated on 3 nodes using the Spider storage engine.
One of the nodes is instrumented with:
cluster_manager_nodes=node1,node2,node3
cluster_manager_ha_group=mycluster1
cluster_manager_ha_group_mode=master-slaves-async
cluster_manager_director=on
cluster_manager_candidate_master=off
All other nodes:
cluster_manager_nodes=node1,node2,node3
cluster_manager_ha_group=mycluster1
cluster_manager_ha_group_mode=master-slaves-async
cluster_manager_director=off
cluster_manager_candidate_master=on
All nodes:
Create a dummy single-record heartbeat system table when loading the plugin.
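The dummy heartbeat table could be as simple as the following sketch (table and column names are assumptions):

```sql
-- Hypothetical single-record heartbeat table created at plugin load.
CREATE TABLE mysql.cluster_heartbeat (
  id        TINYINT   NOT NULL PRIMARY KEY DEFAULT 1,
  last_beat TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
              ON UPDATE CURRENT_TIMESTAMP
);
INSERT INTO mysql.cluster_heartbeat (id) VALUES (1);
```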
The director node:
- creates a heartbeat_spider table linking to the heartbeat table of every cluster node, replicated to all nodes in XA
- Spider starts to monitor the status of the heartbeat tables
- if the state of the connection to a heartbeat table changes, Spider will update its own spider_table, and the plugin can update the failover server system table on every remaining node and trigger failover on each of those nodes
- constantly changes the status of the spider_table entry for the old master to check whether the old master is coming back to life (this should be improved as a native Spider feature with an extra status value)
- when the old master comes back to life, marks it as not accepting connections somehow; using the connection pool we can set the pool size to 0
We can later improve this with a rollback-from-GTID feature that rolls back the transactions following a given GTID by reversing the binlog row events based on the before image. It would enable reintroducing the old master, copying all rolled-back events to a bin-log.lost file.
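The event-reversal idea can be illustrated with the following sketch. The event shapes are made up for clarity and do not reflect the real binlog event API; the point is that row-based events carry enough information (the before image) to build compensating events, which are then replayed newest-first.

```python
# Hypothetical sketch of "rollback from GTID": invert row-based binlog
# events using the before image, then emit them in reverse order.
# Event dictionaries are illustrative, not the real binlog format.

def invert_event(event):
    kind = event["type"]
    if kind == "WRITE_ROWS":      # INSERT -> DELETE of the inserted row
        return {"type": "DELETE_ROWS", "row": event["row"]}
    if kind == "DELETE_ROWS":     # DELETE -> re-INSERT the before image
        return {"type": "WRITE_ROWS", "row": event["row"]}
    if kind == "UPDATE_ROWS":     # UPDATE -> swap before/after images
        return {"type": "UPDATE_ROWS",
                "before": event["after"], "after": event["before"]}
    raise ValueError(f"cannot invert event type {kind}")

def rollback_events(events):
    """Return the compensating event stream, newest event first."""
    return [invert_event(e) for e in reversed(events)]
```

The original (non-inverted) events would at the same time be copied verbatim into the bin-log.lost file so nothing is silently discarded.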
Could you elaborate on this, please?