Details

    Description

      When promoting a slave to master, we need to ensure that the promoted slave has received the most up-to-date binlog events among all slaves in the cluster. Using SQL commands only, we must ensure that all slaves end up starting from the same state.
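
      As a rough illustration only, a manual GTID-based promotion with plain SQL in current MariaDB could look like the sketch below; the host name and replication user are placeholders, not part of this proposal.

      -- On every slave: stop applying events and compare GTID positions;
      -- the most advanced position identifies the promotion candidate.
      STOP SLAVE;
      SELECT @@gtid_current_pos;

      -- On every slave that is not the candidate: repoint replication to the promoted node.
      CHANGE MASTER TO
        MASTER_HOST = 'new-master',      -- placeholder host
        MASTER_USER = 'repl',            -- placeholder replication user
        MASTER_USE_GTID = slave_pos;
      START SLAVE;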

      Attachments

        Issue Links

          Activity


            serg Sergei Golubchik added a comment - could you elaborate on this, please?
            stephane@skysql.com VAROQUI Stephane added a comment - edited

            We need to implement a way to make each cluster node topology-aware, with the following per-node properties:

            • HA group
            • DSN
            • Role: slave, master, standalone
            • Replication type (async, semi-sync, sync)
            • Candidate master

            Nodes in a cluster could be instrumented in various ways, from plugins to external tools, so a system table is probably the best option here.
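
            A hypothetical layout for such a system table (the table and column names below are illustrative only; no such table exists in MariaDB yet):

            CREATE TABLE mysql.replication_topology (
              node_name        VARCHAR(64)  NOT NULL PRIMARY KEY,
              ha_group         VARCHAR(64)  NOT NULL,
              dsn              VARCHAR(255) NOT NULL,   -- how to reach the node
              role             ENUM('master','slave','standalone') NOT NULL,
              replication_type ENUM('async','semi-sync','sync') NOT NULL,
              candidate_master BOOLEAN NOT NULL DEFAULT 0
            );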

            Define a cluster manager plugin API that can provide the status of the nodes.

            Some possible plugin implementations:

            • REST call to an external cluster manager
            • Corosync plugin in each node
            • Wsrep plugin in each node
            • Spider XA monitoring

            When the system table topology changes, we need to automatically reconfigure replication.

            This requires finding the oldest GTID in the cluster, fetching all following GTIDs from that node, instrumenting the new master defined in the new topology, and waiting until that master already has the oldest GTID.
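
            The waiting step could be expressed with the existing MASTER_GTID_WAIT() function; the GTID value and timeout below are placeholders.

            -- Run on the candidate master once it has been connected to the node holding
            -- the missing events; returns 0 when the GTID has been applied, -1 on timeout.
            SELECT MASTER_GTID_WAIT('0-1-100', 60);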

            I propose to start the task with a SQL command that implements replication failover based on the system table (improving the existing "server" system table with a cluster name or HA group to mimic the Fabric concept, plus additional per-node status properties).

            It is up to the cluster manager plugin or external tools to populate that table before the command is used.
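
            For reference, the existing table can already be populated with CREATE SERVER; the proposed HA group and per-node status attributes do not exist yet (host and credentials below are placeholders).

            CREATE SERVER node1
              FOREIGN DATA WRAPPER mysql
              OPTIONS (HOST 'node1.example.com', PORT 3306, USER 'repl', PASSWORD 'secret', DATABASE 'test');
            -- The proposed ha_group / role / candidate_master properties would be extra
            -- columns in this table or in a companion status table.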

            MHA, or the MariaDB replication tools from Guillaume, have to do this manually today: discover the replication topology, find the most up-to-date slave, and wait until every slave has caught up with the promoted master.

            MaxScale can populate such tables based on its monitoring plugin and later trigger failover by invoking the command.

            A first cluster manager plugin can be demonstrated on 3 nodes using the Spider storage engine.

            One of the nodes is instrumented with:
            cluster_manager_nodes=node1,node2,node3
            cluster_manager_ha_group=mycluster1
            cluster_manager_ha_group_mode=master-slaves-async
            cluster_manager_director=on
            cluster_manager_candidate_master=off

            All other nodes are instrumented with:
            cluster_manager_nodes=node1,node2,node3
            cluster_manager_ha_group=mycluster1
            cluster_manager_ha_group_mode=master-slaves-async
            cluster_manager_director=off
            cluster_manager_candidate_master=on

            All nodes:
            Create a dummy single-record heartbeat system table when the plugin is loaded.

            The director:
            - creates a heartbeat_spider table linking to the heartbeat table of every cluster node and replicating to all nodes in XA
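
            A rough sketch of these tables, assuming Spider's syntax of naming backend servers in the table COMMENT; the exact HA parameters would need to be checked against the Spider documentation.

            -- On every node: a plain local heartbeat table with a single row.
            CREATE TABLE heartbeat (
              id INT NOT NULL PRIMARY KEY,
              ts TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
            ) ENGINE=InnoDB;

            -- On the director: a Spider table linking to the heartbeat table of each node.
            CREATE TABLE heartbeat_spider (
              id INT NOT NULL PRIMARY KEY,
              ts TIMESTAMP NOT NULL
            ) ENGINE=SPIDER
              COMMENT='wrapper "mysql", table "heartbeat", srv "node1 node2 node3"';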

            • Spider starts to monitor the status of the heartbeat table
            • If the state of the connection to a heartbeat table changes, Spider updates its own spider_table, and the plugin can update the failover server system table on every remaining node and trigger failover on each of those nodes
            • The status of the spider_table entry for the old master is changed constantly, to check whether the old master is coming back to life (this should be improved as a native Spider feature with an extra status value)
            • When the old master comes back to life, mark it as not accepting connections somehow; with a connection pool we can set the pool size to 0

            We can later improve this with a rollback-from-GTID feature that would roll back the transactions following a given GTID by reversing the binlog row events based on the before image. It would enable reintroducing the old master, with all rolled-back events copied to a bin-log.lost file.
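
            For illustration only: a similar idea exists as the mysqlbinlog --flashback option, which reverses row events using their before image (this is not the per-GTID rollback proposed here; the binlog file and position are placeholders).

            mysqlbinlog --flashback --start-position=4 master-bin.000123 > rollback.sql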


            People

              Assignee: Unassigned
              Reporter: stephane@skysql.com VAROQUI Stephane