[MDEV-18236] Galera node should raise an error when another node has the same UUID Created: 2019-01-14  Updated: 2020-08-25  Resolved: 2019-08-15

Status: Closed
Project: MariaDB Server
Component/s: Galera, wsrep
Affects Version/s: 10.2.14
Fix Version/s: 10.2.27, 10.1.42, 10.3.18

Type: Bug Priority: Critical
Reporter: Geoff Montee (Inactive) Assignee: Teemu Ollakka
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Blocks
is blocked by MDEV-20337 Merge galera release_25.3.27 Closed

 Description   

If two nodes have the same my_uuid value in their gvwstate.dat files and they try to join the same cluster, you would expect them to notice the conflict and one of them to throw an error. Instead, they get stuck in an infinite loop of timeouts.

For example, let's say that two nodes have the following in gvwstate.dat:

my_uuid: 5025de8a-15db-11e9-b571-abb3e219b4d0
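For context, a full gvwstate.dat as written by the PC recovery feature looks roughly like the fragment below; the view_id, bootstrap, and member values here are illustrative, and only the my_uuid line matters for this report:

```
my_uuid: 5025de8a-15db-11e9-b571-abb3e219b4d0
#vwbeg
view_id: 3 5025de8a-15db-11e9-b571-abb3e219b4d0 6
bootstrap: 0
member: 5025de8a-15db-11e9-b571-abb3e219b4d0 0
#vwend
```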

And then the first node starts up:

2019-01-14 11:33:38 140385718540256 [Note] WSREP: (5025de8a, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
2019-01-14 11:33:38 140385718540256 [Note] WSREP: (5025de8a, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
2019-01-14 11:33:38 140385718540256 [Note] WSREP: EVS version 0
2019-01-14 11:33:38 140385718540256 [Note] WSREP: gcomm: bootstrapping new group 'my_wsrep_cluster'
2019-01-14 11:33:38 140385718540256 [Note] WSREP: start_prim is enabled, turn off pc_recovery
2019-01-14 11:33:38 140385718540256 [Note] WSREP: Node 5025de8a state prim
2019-01-14 11:33:38 140385718540256 [Note] WSREP: view(view_id(PRIM,5025de8a,6) memb {
	5025de8a,0
} joined {
} left {
} partitioned {
})
...
2019-01-14 11:33:45 140385718540256 [Note] /usr/sbin/mysqld: ready for connections.
Version: '10.2.14-MariaDB-log'  socket: '/var/lib/mysql/mysql.sock'  port: 3306  MariaDB Server

And then the second node starts up:

2019-01-14 11:36:14 140151256176608 [Note] WSREP: (5025de8a, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
2019-01-14 11:36:14 140151256176608 [Note] WSREP: (5025de8a, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
2019-01-14 11:36:14 140151256176608 [Note] WSREP: EVS version 0
2019-01-14 11:36:14 140151256176608 [Note] WSREP: gcomm: connecting to group 'my_wsrep_cluster', peer '10.2.220.17:,10.2.220.18:,10.2.220.19:'
2019-01-14 11:36:14 140151256176608 [Note] WSREP: (5025de8a, 'tcp://0.0.0.0:4567') connection established to 5025de8a tcp://10.2.220.19:4567
2019-01-14 11:36:14 140151256176608 [Warning] WSREP: (5025de8a, 'tcp://0.0.0.0:4567') address 'tcp://10.2.220.19:4567' points to own listening address, blacklisting
2019-01-14 11:36:17 140151256176608 [Note] WSREP: (5025de8a, 'tcp://0.0.0.0:4567') connection to peer 5025de8a with addr tcp://10.2.220.19:4567 timed out, no messages seen in PT3S
2019-01-14 11:36:17 140151256176608 [Warning] WSREP: no nodes coming from prim view, prim not possible
2019-01-14 11:36:17 140151256176608 [Note] WSREP: view(view_id(NON_PRIM,5025de8a,6) memb {
	5025de8a,0
} joined {
} left {
} partitioned {
})

We can see from the above output that both nodes have the identifier 5025de8a.

Instead of raising an error, the nodes get stuck in an endless loop of timeouts:

2019-01-14 11:36:17 140124222371584 [Note] WSREP: (5025de8a, 'tcp://0.0.0.0:4567') connection to peer 5025de8a with addr tcp://10.2.220.17:4567 timed out, no messages seen in PT3S
2019-01-14 11:36:17 140124222371584 [Warning] WSREP: last inactive check more than PT1.5S ago (PT3.50334S), skipping check
2019-01-14 11:36:22 140124222371584 [Note] WSREP: (5025de8a, 'tcp://0.0.0.0:4567') connection to peer 5025de8a with addr tcp://10.2.220.17:4567 timed out, no messages seen in PT3S
2019-01-14 11:36:27 140124222371584 [Note] WSREP: (5025de8a, 'tcp://0.0.0.0:4567') connection to peer 5025de8a with addr tcp://10.2.220.17:4567 timed out, no messages seen in PT3S
2019-01-14 11:36:31 140124222371584 [Note] WSREP: (5025de8a, 'tcp://0.0.0.0:4567') connection to peer 5025de8a with addr tcp://10.2.220.17:4567 timed out, no messages seen in PT3S
2019-01-14 11:36:36 140124222371584 [Note] WSREP: (5025de8a, 'tcp://0.0.0.0:4567') connection to peer 5025de8a with addr tcp://10.2.220.17:4567 timed out, no messages seen in PT3S
2019-01-14 11:36:41 140124222371584 [Note] WSREP: (5025de8a, 'tcp://0.0.0.0:4567') connection to peer 5025de8a with addr tcp://10.2.220.17:4567 timed out, no messages seen in PT3S
2019-01-14 11:36:46 140124222371584 [Note] WSREP: (5025de8a, 'tcp://0.0.0.0:4567') connection to peer 5025de8a with addr tcp://10.2.220.17:4567 timed out, no messages seen in PT3S
2019-01-14 11:36:51 140124222371584 [Note] WSREP: (5025de8a, 'tcp://0.0.0.0:4567') connection to peer 5025de8a with addr tcp://10.2.220.17:4567 timed out, no messages seen in PT3S
2019-01-14 11:36:55 140124222371584 [Note] WSREP: (5025de8a, 'tcp://0.0.0.0:4567') connection to peer 5025de8a with addr tcp://10.2.220.17:4567 timed out, no messages seen in PT3S
2019-01-14 11:37:00 140124222371584 [Note] WSREP: (5025de8a, 'tcp://0.0.0.0:4567') connection to peer 5025de8a with addr tcp://10.2.220.17:4567 timed out, no messages seen in PT3S
2019-01-14 11:37:05 140124222371584 [Note] WSREP: (5025de8a, 'tcp://0.0.0.0:4567') connection to peer 5025de8a with addr tcp://10.2.220.17:4567 timed out, no messages seen in PT3S
2019-01-14 11:37:09 140124222371584 [Note] WSREP: (5025de8a, 'tcp://0.0.0.0:4567') connection to peer 5025de8a with addr tcp://10.2.220.17:4567 timed out, no messages seen in PT3S
2019-01-14 11:37:14 140124222371584 [Note] WSREP: (5025de8a, 'tcp://0.0.0.0:4567') connection to peer 5025de8a with addr tcp://10.2.220.17:4567 timed out, no messages seen in PT3S
2019-01-14 11:37:19 140124222371584 [Note] WSREP: (5025de8a, 'tcp://0.0.0.0:4567') connection to peer 5025de8a with addr tcp://10.2.220.17:4567 timed out, no messages seen in PT3S
2019-01-14 11:37:23 140124222371584 [Note] WSREP: (5025de8a, 'tcp://0.0.0.0:4567') connection to peer 5025de8a with addr tcp://10.2.220.17:4567 timed out, no messages seen in PT3S
2019-01-14 11:37:27 140124222371584 [Note] WSREP: (5025de8a, 'tcp://0.0.0.0:4567') connection to peer 5025de8a with addr tcp://10.2.220.17:4567 timed out, no messages seen in PT3S
2019-01-14 11:37:32 140124222371584 [Note] WSREP: (5025de8a, 'tcp://0.0.0.0:4567') connection to peer 5025de8a with addr tcp://10.2.220.17:4567 timed out, no messages seen in PT3S
2019-01-14 11:37:37 140124222371584 [Note] WSREP: (5025de8a, 'tcp://0.0.0.0:4567') connection to peer 5025de8a with addr tcp://10.2.220.17:4567 timed out, no messages seen in PT3S
2019-01-14 11:37:41 140124222371584 [Note] WSREP: (5025de8a, 'tcp://0.0.0.0:4567') connection to peer 5025de8a with addr tcp://10.2.220.17:4567 timed out, no messages seen in PT3S
2019-01-14 11:37:46 140124222371584 [Note] WSREP: (5025de8a, 'tcp://0.0.0.0:4567') connection to peer 5025de8a with addr tcp://10.2.220.17:4567 timed out, no messages seen in PT3S
2019-01-14 11:37:50 140124222371584 [Note] WSREP: (5025de8a, 'tcp://0.0.0.0:4567') connection to peer 5025de8a with addr tcp://10.2.220.17:4567 timed out, no messages seen in PT3S
2019-01-14 11:37:55 140124222371584 [Note] WSREP: (5025de8a, 'tcp://0.0.0.0:4567') connection to peer 5025de8a with addr tcp://10.2.220.17:4567 timed out, no messages seen in PT3S
2019-01-14 11:38:00 140124222371584 [Note] WSREP: (5025de8a, 'tcp://0.0.0.0:4567') connection to peer 5025de8a with addr tcp://10.2.220.17:4567 timed out, no messages seen in PT3S

This behavior was seen with Galera 25.3.23.



 Comments   
Comment by Jan Lindström (Inactive) [ 2019-01-29 ]

How do those two nodes have the same my_uuid value in their gvwstate.dat? Was that file copied?

Comment by Geoff Montee (Inactive) [ 2019-01-29 ]

The two nodes had the same gvwstate.dat because they were rebuilt from the same backup.

Comment by Teemu Ollakka [ 2019-01-29 ]

First of all, this issue is best resolved by not including gvwstate.dat in the backup.

This is a Galera-side issue. Currently there is not enough logic to detect the duplicate ID and shut down one of the nodes. I think it would be enough to make a node fail its connection attempt to the cluster if it has not yet reached a primary component and it detects a connection attempt from outside with the same ID.

I'll come up with a reproducible test case first; the fix, if possible, should go into the next Galera maintenance release.
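The check described in the comment above can be sketched as follows. This is an illustrative model only, not Galera's actual gcomm code; the function and parameter names are assumptions:

```python
# Illustrative sketch of the proposed check: a node that has not yet
# reached a primary component and receives a handshake from a remote
# peer carrying its own UUID should fail fast instead of retrying
# forever. Names here are hypothetical, not part of Galera's gcomm API.

def should_reject_peer(own_uuid: str, peer_uuid: str,
                       in_primary_component: bool) -> bool:
    """Reject the handshake when a remote peer presents our own UUID
    before this node has joined a primary component."""
    return peer_uuid == own_uuid and not in_primary_component
```

Under this rule, the second node in the report (UUID 5025de8a, not yet in a primary component) would abort its join attempt instead of looping on PT3S timeouts.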

Comment by Geoff Montee (Inactive) [ 2019-01-29 ]

Yeah, I agree that gvwstate.dat should not be included in backups. However, the user who ran into this problem was confused by the weird state of their cluster, and it seemed to me that Galera could throw an error or something in cases like this.

Thanks!
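As an operational safeguard after restoring several nodes from the same backup, a pre-start check can verify that no two data directories share a my_uuid. A minimal sketch; the helper names and file paths are hypothetical, not part of Galera:

```python
# Pre-start sanity check for duplicate node UUIDs: parse the my_uuid
# line out of each node's gvwstate.dat and report any value that
# appears in more than one file. Hypothetical helper, not Galera code.
from pathlib import Path
from typing import Optional

def read_my_uuid(gvwstate_path: Path) -> Optional[str]:
    """Return the my_uuid recorded in a gvwstate.dat file, if present."""
    for line in gvwstate_path.read_text().splitlines():
        if line.startswith("my_uuid:"):
            return line.split(":", 1)[1].strip()
    return None

def find_duplicate_uuids(paths):
    """Return the set of UUIDs that occur in more than one file."""
    seen = set()
    dups = set()
    for p in paths:
        uuid = read_my_uuid(Path(p))
        if uuid is None:
            continue
        if uuid in seen:
            dups.add(uuid)
        seen.add(uuid)
    return dups
```

If the check reports duplicates, deleting gvwstate.dat on the affected joiners before startup lets each node generate a fresh UUID.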

Comment by Teemu Ollakka [ 2019-01-30 ]

The following MTR test can be used to reproduce the issue. The test will hang in

--exec $MYSQLD...

if Galera does not throw an error.

--source include/galera_cluster.inc
 
# Copy the file from the running two-node cluster; otherwise node_2 may
# be able to rebootstrap the singleton cluster before the first connection
# is established.
--copy_file $MYSQLTEST_VARDIR/mysqld.1/data/gvwstate.dat $MYSQLTEST_VARDIR/tmp/MDEV-18236-gvwstate.dat
 
--connection node_2
--source include/shutdown_mysqld.inc
 
--copy_file $MYSQLTEST_VARDIR/tmp/MDEV-18236-gvwstate.dat $MYSQLTEST_VARDIR/mysqld.2/data/gvwstate.dat
 
# Write the error log to a separate file. The process may abort in Galera/gcs
# code, which will cause ASAN memleak warnings.
--error 1
--exec $MYSQLD --defaults-group-suffix=.2 --defaults-file=$MYSQLTEST_VARDIR/my.cnf --innodb --log-error=$MYSQLTEST_VARDIR/tmp/MDEV-18236.err
 
--remove_file $MYSQLTEST_VARDIR/mysqld.2/data/gvwstate.dat
 
--source include/start_mysqld.inc

Comment by Teemu Ollakka [ 2019-04-18 ]

This issue should have been fixed by https://github.com/MariaDB/galera/commit/5ea082e5949f20138c855238d45724c0008b398d.

Comment by Julien Fritsch [ 2019-07-03 ]

teemu.ollakka, is it fixed or not? The status is OPEN.

Comment by Shahriyar Rzayev (Inactive) [ 2019-07-04 ]

Tested with the 10.1 branch from https://github.com/MariaDB/server and the 3.x branch from https://github.com/mariadb/galera, as discussed with Teemu.

Steps:

  • Start 3 nodes 10.1 + Galera 3.x
  • Stop node2
  • Copy node1/gvwstate.dat to node2/.
  • Start node2

The result: the joiner keeps waiting in timeouts.
From the joiner's log:

$ /home/shako/Galera_Tests/dbs/maria_10.1/bin/mysqld_safe --defaults-file=/home/shako/Galera_Tests/datadirs/mynode2_10.1.cnf
190704 14:22:10 mysqld_safe Logging to '/tmp/mysql-node2.err'.
190704 14:22:10 mysqld_safe Starting mysqld daemon with databases from /home/shako/Galera_Tests/datadirs/node2
 
2019-07-04 14:28:37 140059553109888 [Note] WSREP: Waiting for SST to complete. current seqno: 1 waited 380.000000 secs.
2019-07-04 14:28:38 140059294562048 [Note] WSREP: (cf6f9f5a, 'tcp://0.0.0.0:5040') connection to peer f020adf0 with addr tcp://127.0.0.1:5060 timed out, no messages seen in PT3S
2019-07-04 14:28:38 140059294562048 [Note] WSREP: (cf6f9f5a, 'tcp://0.0.0.0:5040') connection to peer cf6f9f5a with addr tcp://127.0.0.1:8567 timed out, no messages seen in PT3S
2019-07-04 14:28:43 140059294562048 [Note] WSREP: (cf6f9f5a, 'tcp://0.0.0.0:5040') connection to peer f020adf0 with addr tcp://127.0.0.1:5060 timed out, no messages seen in PT3S
2019-07-04 14:28:43 140059294562048 [Note] WSREP: (cf6f9f5a, 'tcp://0.0.0.0:5040') connection to peer cf6f9f5a with addr tcp://127.0.0.1:8567 timed out, no messages seen in PT3S

Comment by Jan Lindström (Inactive) [ 2019-08-15 ]

The issue will be fixed in the next release, which contains a new Galera library version.

Generated at Thu Feb 08 08:42:32 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.