MariaDB Server, MDEV-18236

Galera node should raise an error when another node has the same UUID


Details

    Description

      If two nodes have the same my_uuid value in their gvwstate.dat files and both try to join the same cluster, one would expect them to detect the conflict and one of them to raise an error. Instead, they get stuck in an endless loop of connection timeouts.

      For example, suppose two nodes both have the following in gvwstate.dat:

      my_uuid: 5025de8a-15db-11e9-b571-abb3e219b4d0
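
      Until the nodes report this themselves, the conflict can be caught outside the server by comparing the my_uuid line of each node's gvwstate.dat. The following is a minimal sketch, not part of Galera or MariaDB: the function names and node labels are hypothetical, and it assumes the file contents have already been collected from each node.

```python
import re
from collections import defaultdict

# Matches the "my_uuid: <uuid>" line shown above; nothing else in the
# file is interpreted.
MY_UUID_RE = re.compile(r"^\s*my_uuid:\s*([0-9a-fA-F-]{36})\s*$", re.MULTILINE)


def read_my_uuid(text):
    """Return the my_uuid value from gvwstate.dat contents, or None."""
    match = MY_UUID_RE.search(text)
    return match.group(1).lower() if match else None


def find_duplicate_uuids(states):
    """Map each my_uuid shared by more than one node to those node labels.

    `states` maps a node label (hypothetical, e.g. a host name) to the
    text of that node's gvwstate.dat.
    """
    by_uuid = defaultdict(list)
    for label, text in states.items():
        uuid = read_my_uuid(text)
        if uuid is not None:
            by_uuid[uuid].append(label)
    return {u: sorted(labels) for u, labels in by_uuid.items() if len(labels) > 1}
```

      Any non-empty result from find_duplicate_uuids means that at least two nodes would present the same identifier to the cluster.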
      

      The first node then starts up:

      2019-01-14 11:33:38 140385718540256 [Note] WSREP: (5025de8a, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
      2019-01-14 11:33:38 140385718540256 [Note] WSREP: (5025de8a, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
      2019-01-14 11:33:38 140385718540256 [Note] WSREP: EVS version 0
      2019-01-14 11:33:38 140385718540256 [Note] WSREP: gcomm: bootstrapping new group 'my_wsrep_cluster'
      2019-01-14 11:33:38 140385718540256 [Note] WSREP: start_prim is enabled, turn off pc_recovery
      2019-01-14 11:33:38 140385718540256 [Note] WSREP: Node 5025de8a state prim
      2019-01-14 11:33:38 140385718540256 [Note] WSREP: view(view_id(PRIM,5025de8a,6) memb {
      	5025de8a,0
      } joined {
      } left {
      } partitioned {
      })
      ...
      2019-01-14 11:33:45 140385718540256 [Note] /usr/sbin/mysqld: ready for connections.
      Version: '10.2.14-MariaDB-log'  socket: '/var/lib/mysql/mysql.sock'  port: 3306  MariaDB Server
      

      The second node then starts up:

      2019-01-14 11:36:14 140151256176608 [Note] WSREP: (5025de8a, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
      2019-01-14 11:36:14 140151256176608 [Note] WSREP: (5025de8a, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
      2019-01-14 11:36:14 140151256176608 [Note] WSREP: EVS version 0
      2019-01-14 11:36:14 140151256176608 [Note] WSREP: gcomm: connecting to group 'my_wsrep_cluster', peer '10.2.220.17:,10.2.220.18:,10.2.220.19:'
      2019-01-14 11:36:14 140151256176608 [Note] WSREP: (5025de8a, 'tcp://0.0.0.0:4567') connection established to 5025de8a tcp://10.2.220.19:4567
      2019-01-14 11:36:14 140151256176608 [Warning] WSREP: (5025de8a, 'tcp://0.0.0.0:4567') address 'tcp://10.2.220.19:4567' points to own listening address, blacklisting
      2019-01-14 11:36:17 140151256176608 [Note] WSREP: (5025de8a, 'tcp://0.0.0.0:4567') connection to peer 5025de8a with addr tcp://10.2.220.19:4567 timed out, no messages seen in PT3S
      2019-01-14 11:36:17 140151256176608 [Warning] WSREP: no nodes coming from prim view, prim not possible
      2019-01-14 11:36:17 140151256176608 [Note] WSREP: view(view_id(NON_PRIM,5025de8a,6) memb {
      	5025de8a,0
      } joined {
      } left {
      } partitioned {
      })
      

      The output above shows that both nodes have the identifier 5025de8a, which is the first segment of the shared my_uuid.

      Instead of raising an error, the nodes get stuck in an endless loop of timeouts:

      2019-01-14 11:36:17 140124222371584 [Note] WSREP: (5025de8a, 'tcp://0.0.0.0:4567') connection to peer 5025de8a with addr tcp://10.2.220.17:4567 timed out, no messages seen in PT3S
      2019-01-14 11:36:17 140124222371584 [Warning] WSREP: last inactive check more than PT1.5S ago (PT3.50334S), skipping check
      2019-01-14 11:36:22 140124222371584 [Note] WSREP: (5025de8a, 'tcp://0.0.0.0:4567') connection to peer 5025de8a with addr tcp://10.2.220.17:4567 timed out, no messages seen in PT3S
      2019-01-14 11:36:27 140124222371584 [Note] WSREP: (5025de8a, 'tcp://0.0.0.0:4567') connection to peer 5025de8a with addr tcp://10.2.220.17:4567 timed out, no messages seen in PT3S
      2019-01-14 11:36:31 140124222371584 [Note] WSREP: (5025de8a, 'tcp://0.0.0.0:4567') connection to peer 5025de8a with addr tcp://10.2.220.17:4567 timed out, no messages seen in PT3S
      2019-01-14 11:36:36 140124222371584 [Note] WSREP: (5025de8a, 'tcp://0.0.0.0:4567') connection to peer 5025de8a with addr tcp://10.2.220.17:4567 timed out, no messages seen in PT3S
      2019-01-14 11:36:41 140124222371584 [Note] WSREP: (5025de8a, 'tcp://0.0.0.0:4567') connection to peer 5025de8a with addr tcp://10.2.220.17:4567 timed out, no messages seen in PT3S
      2019-01-14 11:36:46 140124222371584 [Note] WSREP: (5025de8a, 'tcp://0.0.0.0:4567') connection to peer 5025de8a with addr tcp://10.2.220.17:4567 timed out, no messages seen in PT3S
      2019-01-14 11:36:51 140124222371584 [Note] WSREP: (5025de8a, 'tcp://0.0.0.0:4567') connection to peer 5025de8a with addr tcp://10.2.220.17:4567 timed out, no messages seen in PT3S
      2019-01-14 11:36:55 140124222371584 [Note] WSREP: (5025de8a, 'tcp://0.0.0.0:4567') connection to peer 5025de8a with addr tcp://10.2.220.17:4567 timed out, no messages seen in PT3S
      2019-01-14 11:37:00 140124222371584 [Note] WSREP: (5025de8a, 'tcp://0.0.0.0:4567') connection to peer 5025de8a with addr tcp://10.2.220.17:4567 timed out, no messages seen in PT3S
      2019-01-14 11:37:05 140124222371584 [Note] WSREP: (5025de8a, 'tcp://0.0.0.0:4567') connection to peer 5025de8a with addr tcp://10.2.220.17:4567 timed out, no messages seen in PT3S
      2019-01-14 11:37:09 140124222371584 [Note] WSREP: (5025de8a, 'tcp://0.0.0.0:4567') connection to peer 5025de8a with addr tcp://10.2.220.17:4567 timed out, no messages seen in PT3S
      2019-01-14 11:37:14 140124222371584 [Note] WSREP: (5025de8a, 'tcp://0.0.0.0:4567') connection to peer 5025de8a with addr tcp://10.2.220.17:4567 timed out, no messages seen in PT3S
      2019-01-14 11:37:19 140124222371584 [Note] WSREP: (5025de8a, 'tcp://0.0.0.0:4567') connection to peer 5025de8a with addr tcp://10.2.220.17:4567 timed out, no messages seen in PT3S
      2019-01-14 11:37:23 140124222371584 [Note] WSREP: (5025de8a, 'tcp://0.0.0.0:4567') connection to peer 5025de8a with addr tcp://10.2.220.17:4567 timed out, no messages seen in PT3S
      2019-01-14 11:37:27 140124222371584 [Note] WSREP: (5025de8a, 'tcp://0.0.0.0:4567') connection to peer 5025de8a with addr tcp://10.2.220.17:4567 timed out, no messages seen in PT3S
      2019-01-14 11:37:32 140124222371584 [Note] WSREP: (5025de8a, 'tcp://0.0.0.0:4567') connection to peer 5025de8a with addr tcp://10.2.220.17:4567 timed out, no messages seen in PT3S
      2019-01-14 11:37:37 140124222371584 [Note] WSREP: (5025de8a, 'tcp://0.0.0.0:4567') connection to peer 5025de8a with addr tcp://10.2.220.17:4567 timed out, no messages seen in PT3S
      2019-01-14 11:37:41 140124222371584 [Note] WSREP: (5025de8a, 'tcp://0.0.0.0:4567') connection to peer 5025de8a with addr tcp://10.2.220.17:4567 timed out, no messages seen in PT3S
      2019-01-14 11:37:46 140124222371584 [Note] WSREP: (5025de8a, 'tcp://0.0.0.0:4567') connection to peer 5025de8a with addr tcp://10.2.220.17:4567 timed out, no messages seen in PT3S
      2019-01-14 11:37:50 140124222371584 [Note] WSREP: (5025de8a, 'tcp://0.0.0.0:4567') connection to peer 5025de8a with addr tcp://10.2.220.17:4567 timed out, no messages seen in PT3S
      2019-01-14 11:37:55 140124222371584 [Note] WSREP: (5025de8a, 'tcp://0.0.0.0:4567') connection to peer 5025de8a with addr tcp://10.2.220.17:4567 timed out, no messages seen in PT3S
      2019-01-14 11:38:00 140124222371584 [Note] WSREP: (5025de8a, 'tcp://0.0.0.0:4567') connection to peer 5025de8a with addr tcp://10.2.220.17:4567 timed out, no messages seen in PT3S
      

      This behavior was seen with Galera 25.3.23.
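
      The stuck state has a recognizable signature in the error log: the node's own identifier and the peer identifier in the "timed out" message are the same. The following is a minimal sketch of a log check based only on the message format quoted above. The function name is hypothetical, and the regular expression is an assumption that may not match other Galera versions.

```python
import re

# Pattern derived from the "connection to peer ... timed out" lines in
# this report; other message layouts are not handled.
SELF_PEER_RE = re.compile(
    r"WSREP: \((?P<own>[0-9a-f]+), '[^']*'\) "
    r"connection to peer (?P<peer>[0-9a-f]+) "
    r"with addr (?P<addr>\S+) timed out"
)


def same_id_peer_timeouts(log_lines):
    """Count timeouts per peer address where the peer id equals our own id."""
    counts = {}
    for line in log_lines:
        match = SELF_PEER_RE.search(line)
        if match and match.group("own") == match.group("peer"):
            addr = match.group("addr")
            counts[addr] = counts.get(addr, 0) + 1
    return counts
```

      A non-empty result suggests that another node in the cluster is using the same identifier, or that the node is trying to connect to itself.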


            People

              teemu.ollakka Teemu Ollakka
              GeoffMontee Geoff Montee (Inactive)