Details
Description
I know that changing wsrep_gtid_domain_id in the configuration file on all nodes requires a full cluster stop and restart to pick up the change, as only the first (bootstrap) node actually uses the config file value, while all further nodes receive the value to use from their donor during IST/SST.
But when e.g. changing the relevant configuration settings from

wsrep-gtid-mode=ON
wsrep_gtid_domain_id=100
gtid_domain_id=...different value per node....

to

wsrep_gtid_domain_id=200
then stopping all nodes, starting node-1 with galera_new_cluster and node-2 with systemctl start mariadb, all is fine so far: both nodes show wsrep_gtid_domain_id=200 in SHOW VARIABLES LIKE 'wsrep_gtid_domain_id'. But when starting a 3rd or further node, they usually still show 100 instead of 200.
Looking at the error logs I see that node-3 usually uses node-2 as its donor, not node-1.
When I force the node started with galera_new_cluster to be the default donor with wsrep_sst_donor=node-1, I get the correct new value 200 on all nodes, though.
So somehow nodes seem to remember the previous value and pass that on to joiners, instead of the value they received from their own donor or read from their configuration file.
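The workaround described above amounts to pinning the bootstrap node as the donor in the joiners' configuration; a minimal sketch of the relevant my.cnf section (the node name node-1 follows this report's naming and must match the donor's actual wsrep_node_name):

```
[mysqld]
wsrep_gtid_domain_id=200
# must be the donor's wsrep_node_name, not its IP address
wsrep_sst_donor=node-1
```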
Attachments
Issue Links
- relates to
  - MDEV-25115 Changes to wsrep_gtid_domain_id in my.cnf are ignored on node restart (Closed)
  - MDEV-28015 Mariabackup | GTID value is missing, Galera Cluster, adding async slave to it (Closed)
Activity
jplindst The SST failure is not due to the wsrep_gtid_domain_id issue. wsrep_sst_donor must be a node name to trigger SST, not the IP address of a donor node. And the wsrep_gtid_domain_id change works fine if the restart triggers SST.
A change to wsrep_gtid_domain_id is not reflected on a node if that node uses IST when joining the cluster.
But this was a fresh cluster due to galera_new_cluster, and the domain id also changed fine when forcing all nodes to use the very first node as donor. It only failed to pick up the correct domain id when a different donor was used, e.g. node-3 using the node started 2nd rather than 1st.
hholzgra Was datadir on node3 empty? If not just delete gcache file and force SST.
ramesh Based on Hartmut's comments, could you please try to reproduce?
I still have my original VM test setup for this somewhere, "just" need to figure out where ...
hholzgra The wsrep_gtid_domain_id is only changed by SST when the secondary node is used as a donor node. Can you confirm whether node3 used SST or IST to join the cluster?
How to reproduce:
start cluster of at least three nodes with
wsrep_gtid_domain_id=100
Shut down all nodes, node 1 last. Change configuration to
wsrep_gtid_domain_id=200
Start node 1 with galera_new_cluster, then start up the remaining nodes with systemctl start mariadb
Run
mysql -e "show variables like 'wsrep_gtid_domain_id'"
on all nodes; see that the first and second node show the correct value 200, but later nodes that join using a different donor than node 1 show the old value 100.
When enforcing an SST on one of the later nodes (node 3 and later), it will show the correct value 200 regardless of which donor node is picked. But when restarting that node after SST startup has completed, and a node other than the first one is picked as donor (e.g. by enforcing that with wsrep_sst_donor=node-2), it flips back to the old value 100 again.
When enforcing an SST on all but the first node by purging the data directory the correct value 200 is picked up by all nodes, and persists over later restarts, too.
So we should either clearly document the procedure necessary to change wsrep_gtid_domain_id, or, preferably, figure out and fix the behavior so that changing the domain id is possible without enforcing SST on all but the first node.
julien.fritsch jplindst The problem has been reproduced. Even after SST, the wsrep code always chooses the old wsrep_gtid_domain_id if the donor is a non-bootstrap node when the joiner is restarted.
Please check the --gtid-domain-id value in the IST log:
xtrabackup IST info
When Node1 becomes the donor (--gtid-domain-id is 200)
2022-11-26 5:27:30 1 [Note] WSREP: Server status change synced -> donor
2022-11-26 5:27:30 1 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2022-11-26 5:27:30 0 [Note] WSREP: Donor monitor thread started to monitor
2022-11-26 5:27:30 0 [Note] WSREP: Running: 'wsrep_sst_mariabackup --role 'donor' --address '192.168.100.30:4444/xtrabackup_sst//1' --local-port 3306 --socket '/run/mysqld/mysqld.sock' --progress 0 --datadir '/var/lib/mysql/' --gtid 'ee9f290d-6d41-11ed-a940-ffb1d812fa61:61' --gtid-domain-id 200 --bypass --mysqld-args --wsrep-new-cluster --wsrep_start_position=ee9f290d-6d41-11ed-a940-ffb1d812fa61:8,100-101-3'
2022-11-26 5:27:30 1 [Note] WSREP: sst_donor_thread signaled with 0
2022-11-26 5:27:30 0 [Note] WSREP: async IST sender starting to serve tcp://192.168.100.30:4568 sending 62-63, preload starts from 63
2022-11-26 5:27:30 0 [Note] WSREP: IST sender 62 -> 63
When Node2 becomes the donor (--gtid-domain-id is 100)
2022-11-26 5:04:02 1 [Note] WSREP: Server status change synced -> donor
2022-11-26 5:04:02 1 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2022-11-26 5:04:02 0 [Note] WSREP: Donor monitor thread started to monitor
2022-11-26 5:04:02 0 [Note] WSREP: Running: 'wsrep_sst_mariabackup --role 'donor' --address '192.168.100.30:4444/xtrabackup_sst//1' --local-port 3306 --socket '/run/mysqld/mysqld.sock' --progress 0 --datadir '/var/lib/mysql/' --gtid 'ee9f290d-6d41-11ed-a940-ffb1d812fa61:55' --gtid-domain-id 100 --bypass --mysqld-args --wsrep_start_position=ee9f290d-6d41-11ed-a940-ffb1d812fa61:7,100-101-3'
2022-11-26 5:04:02 1 [Note] WSREP: sst_donor_thread signaled with 0
2022-11-26 5:04:02 0 [Note] WSREP: async IST sender starting to serve tcp://192.168.100.30:4568 sending 56-57, preload starts from 57
2022-11-26 5:04:02 0 [Note] WSREP: IST sender 56 -> 57
rsync IST info
When Node1 becomes the donor (--gtid-domain-id is 200)
2022-11-26 5:36:23 0 [Note] WSREP: Running: 'wsrep_sst_rsync --role 'donor' --address '192.168.100.30:4444/rsync_sst' --local-port 3306 --socket '/run/mysqld/mysqld.sock' --progress 1 --datadir '/var/lib/mysql/' --gtid 'ee9f290d-6d41-11ed-a940-ffb1d812fa61:69' --gtid-domain-id 200 --bypass --mysqld-args --wsrep-new-cluster --wsrep_start_position=ee9f290d-6d41-11ed-a940-ffb1d812fa61:66,200-101-0'
2022-11-26 5:36:23 1 [Note] WSREP: sst_donor_thread signaled with 0
2022-11-26 5:36:23 0 [Note] WSREP: async IST sender starting to serve tcp://192.168.100.30:4568 sending 70-71, preload starts from 71
2022-11-26 5:36:23 0 [Note] WSREP: IST sender 70 -> 71
When Node2 becomes the donor (--gtid-domain-id is 100)
2022-11-26 5:35:18 1 [Note] WSREP: Server status change synced -> donor
2022-11-26 5:35:18 1 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2022-11-26 5:35:18 0 [Note] WSREP: Running: 'wsrep_sst_rsync --role 'donor' --address '192.168.100.30:4444/rsync_sst' --local-port 3306 --socket '/run/mysqld/mysqld.sock' --progress 1 --datadir '/var/lib/mysql/' --gtid 'ee9f290d-6d41-11ed-a940-ffb1d812fa61:63' --gtid-domain-id 100 --bypass --mysqld-args --wsrep_start_position=ee9f290d-6d41-11ed-a940-ffb1d812fa61:64,100-101-3'
2022-11-26 5:35:18 0 [Note] WSREP: Donor monitor thread started to monitor
2022-11-26 5:35:18 1 [Note] WSREP: sst_donor_thread signaled with 0
2022-11-26 5:35:18 0 [Note] WSREP: async IST sender starting to serve tcp://192.168.100.30:4568 sending 64-69, preload starts from 69
2022-11-26 5:35:18 0 [Note] WSREP: IST sender 64 -> 69
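The donor-side domain id can be pulled mechanically out of these "WSREP: Running:" lines; a small sed sketch, using a shortened sample of the mariabackup donor log line above:

```shell
# Extract --gtid-domain-id from a donor-side "WSREP: Running:" log line
# (sample shortened from the mariabackup donor log above).
line="2022-11-26 5:04:02 0 [Note] WSREP: Running: 'wsrep_sst_mariabackup --role 'donor' --gtid-domain-id 100 --bypass'"
domain_id=$(printf '%s\n' "$line" | sed -n 's/.*--gtid-domain-id \([0-9][0-9]*\).*/\1/p')
echo "$domain_id"   # prints 100
```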
This issue causes the GTID sequence to be generated inconsistently across the cluster nodes. If we generate a transaction on node3, gtid_binlog_pos on node3 and node1 will use the new wsrep_gtid_domain_id, but the replicated event on node2 will use the old wsrep_gtid_domain_id.
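For reference, a MariaDB GTID such as 200-101-2 decomposes as domain_id-server_id-seq_no, so the leading component of gtid_binlog_pos shows which domain stamped the last event:

```shell
# Split a MariaDB GTID into domain_id, server_id and seq_no.
gtid="200-101-2"
domain=$(echo "$gtid" | cut -d- -f1)
server=$(echo "$gtid" | cut -d- -f2)
seqno=$(echo "$gtid" | cut -d- -f3)
echo "domain=$domain server=$server seqno=$seqno"   # prints domain=200 server=101 seqno=2
```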
Node3
MariaDB [(none)]> select @@gtid_binlog_pos,@@gtid_current_pos,@@wsrep_gtid_domain_id;
+-------------------+--------------------+------------------------+
| @@gtid_binlog_pos | @@gtid_current_pos | @@wsrep_gtid_domain_id |
+-------------------+--------------------+------------------------+
| 200-101-2         | 200-101-2          | 100                    |
+-------------------+--------------------+------------------------+
1 row in set (0.000 sec)
Node1
MariaDB [(none)]> select @@gtid_binlog_pos,@@gtid_current_pos,@@wsrep_gtid_domain_id;
+-------------------+--------------------+------------------------+
| @@gtid_binlog_pos | @@gtid_current_pos | @@wsrep_gtid_domain_id |
+-------------------+--------------------+------------------------+
| 200-101-2         | 200-101-2          | 200                    |
+-------------------+--------------------+------------------------+
1 row in set (0.000 sec)
Node2
MariaDB [(none)]> select @@gtid_binlog_pos,@@gtid_current_pos,@@wsrep_gtid_domain_id;
+-------------------+--------------------+------------------------+
| @@gtid_binlog_pos | @@gtid_current_pos | @@wsrep_gtid_domain_id |
+-------------------+--------------------+------------------------+
| 100-101-5         | 100-101-5          | 200                    |
+-------------------+--------------------+------------------------+
1 row in set (0.001 sec)
Similarly, if we create a transaction on node2, gtid_binlog_pos on node3 and node2 will use the old wsrep_gtid_domain_id, but the replicated transaction on node1 will use the new wsrep_gtid_domain_id.
Node2
MariaDB [(none)]> create database db1;
Query OK, 1 row affected (0.006 sec)

MariaDB [(none)]> select @@gtid_binlog_pos,@@gtid_current_pos,@@wsrep_gtid_domain_id;
+-------------------+--------------------+------------------------+
| @@gtid_binlog_pos | @@gtid_current_pos | @@wsrep_gtid_domain_id |
+-------------------+--------------------+------------------------+
| 100-101-3         | 100-101-3          | 200                    |
+-------------------+--------------------+------------------------+
1 row in set (0.000 sec)
Node3
MariaDB [(none)]> select @@gtid_binlog_pos,@@gtid_current_pos,@@wsrep_gtid_domain_id;
+-------------------+--------------------+------------------------+
| @@gtid_binlog_pos | @@gtid_current_pos | @@wsrep_gtid_domain_id |
+-------------------+--------------------+------------------------+
| 100-101-3         | 100-101-3          | 100                    |
+-------------------+--------------------+------------------------+
1 row in set (0.000 sec)
Node1
MariaDB [(none)]> select @@gtid_binlog_pos,@@gtid_current_pos,@@wsrep_gtid_domain_id;
+-------------------+--------------------+------------------------+
| @@gtid_binlog_pos | @@gtid_current_pos | @@wsrep_gtid_domain_id |
+-------------------+--------------------+------------------------+
| 200-101-1         | 200-101-1          | 200                    |
+-------------------+--------------------+------------------------+
1 row in set (0.000 sec)
Reproduced the issue: if we set the secondary node as the donor node, the previous wsrep_gtid_domain_id is selected when restarting the joiner node.
Node1
MariaDB [(none)]> select @@wsrep_gtid_domain_id,@@wsrep_node_name;
+------------------------+-------------------+
| @@wsrep_gtid_domain_id | @@wsrep_node_name |
+------------------------+-------------------+
| 200 | galera-node1 |
+------------------------+-------------------+
1 row in set (0.000 sec)
MariaDB [(none)]>
Node2
MariaDB [(none)]> select @@wsrep_gtid_domain_id,@@wsrep_node_name;
+------------------------+-------------------+
| @@wsrep_gtid_domain_id | @@wsrep_node_name |
+------------------------+-------------------+
| 200 | galera-node2 |
+------------------------+-------------------+
1 row in set (0.001 sec)
MariaDB [(none)]>
Node3
MariaDB [(none)]> select @@wsrep_gtid_domain_id,@@wsrep_node_name;
+------------------------+-------------------+
| @@wsrep_gtid_domain_id | @@wsrep_node_name |
+------------------------+-------------------+
| 7 | galera-node3 |
+------------------------+-------------------+
1 row in set (0.000 sec)
MariaDB [(none)]>
MariaDB [(none)]> select variable_name, global_value, global_value_origin, global_value_path from information_schema.system_variables where variable_name='WSREP_GTID_DOMAIN_ID';
+----------------------+--------------+---------------------+-------------------+
| variable_name | global_value | global_value_origin | global_value_path |
+----------------------+--------------+---------------------+-------------------+
| WSREP_GTID_DOMAIN_ID | 7 | CONFIG | /etc/mysql/my.cnf |
+----------------------+--------------+---------------------+-------------------+
1 row in set (0.002 sec)
MariaDB [(none)]> \q
Bye
$ sudo grep wsrep_gtid_domain_id /etc/mysql/my.cnf
wsrep_gtid_domain_id = 200
$
$ sudo grep wsrep_sst_donor /etc/mysql/my.cnf
wsrep_sst_donor=192.168.100.20
$
If we force SST after setting the secondary node as the donor node, restarting node 3 will fail.
2022-08-04 6:31:32 1 [Note] WSREP: Server status change connected -> joiner
2022-08-04 6:31:32 1 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2022-08-04 6:31:32 0 [Note] WSREP: Joiner monitor thread started to monitor
2022-08-04 6:31:32 0 [Note] WSREP: Running: 'wsrep_sst_mariabackup --role 'joiner' --address '192.168.100.30' --datadir '/var/lib/mysql/' --parent '706110' --binlog '/var/lib/mysql/master-bin' --binlog-index '/var/lib/mysql/master-bin' --mysqld-args --wsrep_start_position=adf2999d-0c9a-11ed-aeea-0349b3a815aa:156,7-8-12'
WSREP_SST: [INFO] mariabackup SST started on joiner (20220804 06:31:32.077)
WSREP_SST: [INFO] SSL configuration: CA='', CAPATH='', CERT='', KEY='', MODE='DISABLED', encrypt='0' (20220804 06:31:32.119)
WSREP_SST: [INFO] Streaming with mbstream (20220804 06:31:32.222)
WSREP_SST: [INFO] Using socat as streamer (20220804 06:31:32.224)
WSREP_SST: [INFO] Evaluating timeout -k 310 300 socat -u TCP-LISTEN:4444,reuseaddr stdio | '/usr//bin/mbstream' -x; RC=( ${PIPESTATUS[@]} ) (20220804 06:31:32.256)
2022-08-04 6:31:32 1 [Note] WSREP: ####### IST uuid:00000000-0000-0000-0000-000000000000 f: 0, l: 158, STRv: 3
2022-08-04 6:31:32 1 [Note] WSREP: IST receiver addr using tcp://192.168.100.30:4568
2022-08-04 6:31:32 1 [Note] WSREP: Prepared IST receiver for 0-158, listening at: tcp://192.168.100.30:4568
2022-08-04 6:31:32 0 [Warning] WSREP: Member 0.0 (galera-node3) requested state transfer from '192.168.100.20', but it is impossible to select State Transfer donor: No route to host
2022-08-04 6:31:32 1 [ERROR] WSREP: Requesting state transfer failed: -113(No route to host)
2022-08-04 6:31:32 1 [ERROR] WSREP: State transfer request failed unrecoverably: 113 (No route to host). Most likely it is due to inability to communicate with the cluster primary component. Restart required.
2022-08-04 6:31:32 1 [Note] WSREP: ReplicatorSMM::abort()
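As jplindst noted earlier in this thread, wsrep_sst_donor must be the donor's node name rather than its IP address, so the setting grepped above would need to look like this (galera-node2 follows the node names shown in this report and is an assumption):

```
# /etc/mysql/my.cnf on the joiner (sketch)
[mysqld]
# wsrep_node_name of the intended donor, not 192.168.100.20
wsrep_sst_donor=galera-node2
```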