MariaDB Server / MDEV-29301

Keep multi-galera replication always up when any slave/master goes down


Details

    • Type: Bug
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: 10.6.8
    • Fix Version/s: None
    • Component/s: Replication
    • Labels: None

    Description

      We want to achieve the target design depicted in the attached picture.
      The purpose is to keep replication up at all times, regardless of whether any master or slave goes down.
      We currently hit some issues, which may be due to bad settings, a bad design, or a bug.

      In this design, a slave must not apply a change coming from master/slave replication
      if that specific change has already been applied from another node through Galera (avoiding duplicates).
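
      This deduplication relies on MariaDB GTIDs: each transaction is identified by a domain_id-server_id-seq_no triple, and with MASTER_USE_GTID=current_pos a slave reconnects at its current GTID position, so events at or before that position in a domain are not fetched again. A minimal illustration, using a value that appears later in this report:

        -- GTID anatomy: domain_id-server_id-seq_no
        --   200-100-43456  =>  domain_id 200, server_id 100, seq_no 43456
        SELECT @@global.gtid_slave_pos;   -- last GTID applied via replication on this node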

      Here is the multi-source replication setup:

      1. On slave0
        CHANGE MASTER "slave0-master0" TO MASTER_HOST="master0", MASTER_PORT=3306, MASTER_USER="slave0", MASTER_PASSWORD="XX", MASTER_USE_GTID=current_pos;
        CHANGE MASTER "slave0-master1" TO MASTER_HOST="master1", MASTER_PORT=3306, MASTER_USER="slave0", MASTER_PASSWORD="XX", MASTER_USE_GTID=current_pos;
        CHANGE MASTER "slave0-master2" TO MASTER_HOST="master2", MASTER_PORT=3306, MASTER_USER="slave0", MASTER_PASSWORD="XX", MASTER_USE_GTID=current_pos;
      2. On slave1
        CHANGE MASTER "slave1-master0" TO MASTER_HOST="master0", MASTER_PORT=3306, MASTER_USER="slave1", MASTER_PASSWORD="XX", MASTER_USE_GTID=current_pos;
        CHANGE MASTER "slave1-master1" TO MASTER_HOST="master1", MASTER_PORT=3306, MASTER_USER="slave1", MASTER_PASSWORD="XX", MASTER_USE_GTID=current_pos;
        CHANGE MASTER "slave1-master2" TO MASTER_HOST="master2", MASTER_PORT=3306, MASTER_USER="slave1", MASTER_PASSWORD="XX", MASTER_USE_GTID=current_pos;

      We tried 2 approaches:

      A) Same server_id on the two slaves:

      • After setting the correct gtid_slave_pos, we run START ALL SLAVES on slave0. At this stage all is fine: the three replication connections from slave0 work (if one or two masters fail, replication is still up).
      • When running START ALL SLAVES on slave1, the connections from the two slaves start competing with each other for the connection to each master and get disconnected alternately, because a master accepts only one slave connection per server_id:
        [ERROR] Master 'slave1-master1': Error reading packet from server: A slave with the same server_uuid/server_id as this slave has connected to the master; the first event 'master1-bin.000001' at 343, the last event read from 'master1.000001' at 16154585, the last byte read from 'master1-bin.000001' at 16154654. (server_errno=4052)

      It ends up with one of the two connections working (per master), but the second one gets stuck and never recovers until we manually update the slave position (a recovery sketch follows the table below).
      Here are some variables from slave0 and slave1:

             | @@global.gtid_binlog_pos | @@global.gtid_current_pos | @@global.gtid_slave_pos |
      slave0 | 200-100-46610            | 200-100-43456             | 200-100-43456           |
      slave1 | 200-100-46610            | 200-100-46610             | 200-100-46610           |
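
      For reference, these values can be read on each node with:

        SELECT @@global.gtid_binlog_pos,
               @@global.gtid_current_pos,
               @@global.gtid_slave_pos;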

      We can see that the slave0 and slave1 binlogs are in sync thanks to Galera replication,
      but it is slave1 that replicates, and slave0's gtid_current_pos is no longer updated.
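
      A minimal sketch of the manual recovery we have to perform on the stuck node (assuming, as in the table above, that its binlog is already up to date through Galera):

        STOP ALL SLAVES;
        -- Re-seed the replication state from what the node has already applied;
        -- gtid_binlog_pos reflects the writesets received through Galera here.
        SET GLOBAL gtid_slave_pos = @@global.gtid_binlog_pos;
        START ALL SLAVES;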

      B) Different server_id on the two slaves:
      • When setting a different server_id on each slave, each master accepts both slave connections, BUT we get the following error:
        2022-08-11 13:04:11 1693 [ERROR] Master 'slave-master-1': Slave SQL: Node has dropped from cluster, Gtid 200-100-48224, Internal MariaDB error code: 1047
        2022-08-11 13:04:11 1693 [Note] Master 'slave-1-master-1': Slave SQL thread exiting, replication stopped in log 'master1-bin.000001' at position 59236353; GTID position '200-100-48223', master: master1:3306
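
      For reference, a sketch of what this approach assumes per node (the variable names are real MariaDB/Galera options; the concrete values are illustrative, not taken from our configuration):

        -- On slave0 (illustrative value; each node needs its own server_id):
        SET GLOBAL server_id = 101;
        -- On slave1:
        SET GLOBAL server_id = 102;
        -- Verify, including the wsrep GTID settings shared across the cluster:
        SELECT @@server_id, @@wsrep_gtid_mode, @@wsrep_gtid_domain_id;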

      The outcome is the same as in A): gtid_current_pos stops advancing on the stuck slave, and we have to STOP / SET gtid_slave_pos / START manually (as sketched above).


          People

            Assignee: Unassigned
            Reporter: Alexandre Arents (aarents)
            Votes: 1
            Watchers: 2

