  1. MariaDB Server
  2. MDEV-20996

Maxscale auto-failover with semi-sync replication is not providing a true HA solution



    • Task
    • Status: Closed
    • Major
    • Resolution: Duplicate
    • N/A
    • Replication
    • None


      We have been using maxscale-2.3 with the mariadbmon monitor and auto-failover as our HA solution, with 3 database nodes (Master/Slave/Slave). Under real traffic you MUST use semi-sync replication to make this viable. Otherwise, nearly 100% of the time a failed Master will not come back as a Slave without a 1236 error, because transactions were committed to the storage engine that had not yet been replicated to any Slave.
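      To illustrate the failure described above, here is a minimal sketch of the check involved. It assumes GTID positions in MariaDB's `domain-server_id-sequence` form and compares the failed Master's position with the position the promoted Slave had at failover time. The function names `parse_gtid_pos` and `can_rejoin` are hypothetical, and the comparison is a simplification (it looks only at sequence numbers per domain); it is not how mariadbmon actually implements its rejoin check.

```python
from typing import Dict


def parse_gtid_pos(pos: str) -> Dict[int, int]:
    """Map each replication domain id to its highest sequence number.

    Expects a comma-separated list of 'domain-server_id-sequence' items,
    e.g. '0-1-100,1-2-7'.
    """
    result: Dict[int, int] = {}
    for item in pos.split(","):
        item = item.strip()
        if not item:
            continue
        domain, _server_id, seq = (int(part) for part in item.split("-"))
        result[domain] = max(seq, result.get(domain, 0))
    return result


def can_rejoin(old_master_pos: str, new_master_pos_at_failover: str) -> bool:
    """Return True if the old master holds no transaction that the
    promoted slave lacked at failover time (simplified assumption)."""
    old = parse_gtid_pos(old_master_pos)
    new = parse_gtid_pos(new_master_pos_at_failover)
    return all(seq <= new.get(domain, 0) for domain, seq in old.items())
```

      For example, `can_rejoin("0-1-101", "0-1-100")` is false: the old Master committed sequence 101, which never reached a Slave, so it has diverged and cannot simply be re-attached as a Slave.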

      Therefore, we use semi-sync replication with wait_point AFTER_SYNC (see https://mariadb.com/kb/en/library/semisynchronous-replication/#configuring-the-master-wait-point). Even so, there are known issues with semi-sync replication after a master failure/crash that lead to the same problem: the Master does not come back as a Slave because a prepared transaction is committed by automatic crash recovery. We tried working around this by performing an automatic "Manual heuristic recovery rollback", but that did not prevent the transaction from going through after the failed master came back, and we still got the 1236 replication error.

      I am aware of MENT-203 (resulting from MDEV-19733), but it is in the queue as a feature request. That may have been fine before maxscale started supporting auto-failover as an HA solution. Now that maxscale is supported as an HA solution, this is a bug, and it prevents maxscale with auto-failover from truly being a robust HA solution.

      Maybe a short-term solution would be to allow the user to disable automatic crash recovery? I am not sure this would be viable in the long term, but we are looking for a way to make this more reliable until a true solution is provided.


              Unassigned Unassigned
              rvlane Richard Lane