  1. MariaDB Server
  2. MDEV-25851

Cluster on asymmetric IP addresses fails IST/SST

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 10.5.10
    • Fix Version/s: 10.5
    • Component/s: Galera
    • Labels: None

    Description

      I have a 3-node WAN cluster. The nodes are connected to each other by VPN without a central server, so as to avoid a single point of failure. Instead, each node is an OpenVPN server with one other node as its client. It follows that each VPN server uses its own 10.x subnet and each node has two tap interfaces. The result is this:

      ┌──────────┐               ┌──────────┐
      │10.x.0.1  ├───────────────┤ 10.x.0.2 │
      │          │               │          │
      │ NODE A   │               │  NODE B  │
      │          │               │          │
      │10.z.0.2  │               │ 10.y.0.1 │
      └────────┬─┘               └──┬───────┘
               │                    │
               │                    │
               │    ┌──────────┐    │
               └────┤ 10.z.0.1 │    │
                    │          │    │
                    │  NODE C  │    │
                    │          │    │
                    │ 10.y.0.2 ├────┘
                    └──────────┘
      

      This setup works fine with other similar software (e.g. OpenLDAP multi-master and GlusterFS), but it confuses Galera because wsrep_node_address is a global variable that takes a single value. If unset, it will use the address of eth0, which would be wrong here. But if it is set to one of the node's two VPN endpoints, that one address is passed to all the other nodes, so I end up with e.g. node B trying to contact node C on 10.z.0.1, and the cluster breaks.

      It should be noted that this is a problem on the sender/donor side, not on the recipient/joiner side. In the same example, if node C needs syncing and node B is the most advanced node, B will try to contact C on 10.z.0.1, which is permanently unreachable for B. Hence, setting wsrep_node_incoming_address is of no use.
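
To make the donor-side failure concrete, here is a minimal Python sketch of the topology in the diagram. The concrete subnets (10.1.0.0/24, 10.2.0.0/24 and 10.3.0.0/24 standing in for 10.x, 10.y and 10.z) and the /24 prefix length are assumptions for illustration only, as is the premise that nothing routes between the three VPNs. It shows that whichever single address node C advertises, one of its peers cannot reach it.

```python
import ipaddress

# Hypothetical concrete subnets standing in for 10.x, 10.y and 10.z (assumption).
LINKS = {
    "x": ipaddress.ip_network("10.1.0.0/24"),  # VPN between A and B
    "y": ipaddress.ip_network("10.2.0.0/24"),  # VPN between B and C
    "z": ipaddress.ip_network("10.3.0.0/24"),  # VPN between C and A
}

# Tap interface addresses of each node, following the diagram.
NODES = {
    "A": ["10.1.0.1", "10.3.0.2"],
    "B": ["10.1.0.2", "10.2.0.1"],
    "C": ["10.3.0.1", "10.2.0.2"],
}


def can_reach(src, dst_addr):
    """True if src has an interface on the same VPN subnet as dst_addr.

    Assumes no routing between the VPN subnets.
    """
    dst = ipaddress.ip_address(dst_addr)
    for net in LINKS.values():
        if dst in net and any(ipaddress.ip_address(a) in net for a in NODES[src]):
            return True
    return False


def unreachable_peers(node, advertised):
    """Peers of `node` that cannot reach the single address it advertises."""
    return sorted(p for p in NODES if p != node and not can_reach(p, advertised))


# Try each of node C's two addresses as its one advertised address.
for addr in NODES["C"]:
    print(addr, "is unreachable from", unreachable_peers("C", addr))
```

With either choice, exactly one potential donor is left without a path to C, which matches the behaviour described above.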

      This affects all multihomed cluster nodes and defeats the benefits of multihoming. The simple solution would be to make wsrep_node_address(es) multi-valued.
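
As a rough illustration of what a multi-valued setting could enable, the Python sketch below picks, from a list of addresses advertised by a peer, the one that lies on a subnet of a local interface. This is purely hypothetical: Galera does not work this way today, the function name `pick_peer_address` is invented for this example, and the concrete subnets and /24 prefixes are assumptions standing in for 10.x, 10.y and 10.z.

```python
import ipaddress


def pick_peer_address(local_ifaces, peer_addresses):
    """Return the first peer address on a subnet shared with a local interface.

    local_ifaces: local interface addresses with prefix, e.g. "10.2.0.1/24".
    peer_addresses: all addresses the peer advertises (hypothetical multi-value).
    Returns None if no advertised address is directly reachable.
    """
    local_nets = [ipaddress.ip_interface(i).network for i in local_ifaces]
    for addr in peer_addresses:
        ip = ipaddress.ip_address(addr)
        if any(ip in net for net in local_nets):
            return addr
    return None


# Node C advertises both of its VPN endpoints (assumed concrete values).
node_c_addresses = ["10.3.0.1", "10.2.0.2"]

# Interfaces of the two possible donors, following the diagram.
node_a_ifaces = ["10.1.0.1/24", "10.3.0.2/24"]
node_b_ifaces = ["10.1.0.2/24", "10.2.0.1/24"]

print(pick_peer_address(node_b_ifaces, node_c_addresses))  # 10.2.0.2
print(pick_peer_address(node_a_ifaces, node_c_addresses))  # 10.3.0.1
```

Each donor ends up with the endpoint of C that sits on the VPN it shares with C, so neither has to use an address it cannot reach.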

          People

            Assignee: Seppo Jaakola
            Reporter: Zenon Panoussis