Details
Description
I know that changing wsrep_gtid_domain_id in the configuration file on all nodes requires a full cluster stop and restart for the change to take effect, because only the first node actually uses the value from its config file; all further nodes receive the value from their donor during IST/SST.
But when e.g. changing the relevant configuration settings from
wsrep-gtid-mode=ON
wsrep_gtid_domain_id=100
gtid_domain_id=...different value per node....
to
wsrep_gtid_domain_id=200
then stopping all nodes, starting node-1 with galera_new_cluster, and node-2 with systemctl start mariadb, all is fine so far: both nodes show wsrep_gtid_domain_id=200 in SHOW VARIABLES LIKE 'wsrep_gtid_domain_id'. But when starting a 3rd or further node, these usually still show 100 instead of 200.
Looking at the error logs, I see that node-3 usually uses node-2 as donor, not node-1.
When I force the node started with galera_new_cluster as the default donor with wsrep_sst_donor=node-1 I get the correct new 200 value on all nodes though.
So somehow nodes seem to remember the previous value and pass that on to joiners instead of the value they received from their own donor, or read from their configuration file.
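To spot the mismatch described above quickly after a full cluster restart, the per-node values can be compared against the expected one. The following is a minimal sketch, not part of the original report: the function name check_domain_ids and the host names in the commented collection loop are assumptions for illustration.

```shell
#!/bin/sh
# Sketch: flag nodes whose wsrep_gtid_domain_id differs from the expected value.
# Reads "node_name value" pairs from stdin, one per line.
# Returns 0 if all values match, 1 otherwise.
check_domain_ids() {
    expected="$1"
    status=0
    while read -r node value; do
        [ -z "$node" ] && continue
        if [ "$value" != "$expected" ]; then
            echo "MISMATCH $node: got $value, expected $expected"
            status=1
        fi
    done
    return "$status"
}

# Hypothetical collection step (host names are assumptions):
# for h in node-1 node-2 node-3; do
#     printf '%s %s\n' "$h" \
#         "$(mariadb -h "$h" -N -B -e 'SELECT @@wsrep_gtid_domain_id')"
# done | check_domain_ids 200
```

With the behavior reported here, the 3rd node would be flagged whenever it joined via node-2 rather than node-1.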
Issue Links
- relates to
  - MDEV-25115 Changes to wsrep_gtid_domain_id in my.cnf are ignored on node restart (Closed)
  - MDEV-28015 Mariabackup | GTID value is missing, Galera Cluster , adding async slave to it (Closed)
Activity
Field | Original Value | New Value |
---|---|---|
Link | This issue relates to MENT-1230 [ MENT-1230 ] |
Fix Version/s | 10.5 [ 23123 ] | |
Fix Version/s | 10.6 [ 24028 ] | |
Fix Version/s | 10.7 [ 24805 ] | |
Fix Version/s | 10.8 [ 26121 ] |
Assignee | Jan Lindström [ jplindst ] |
Status | Open [ 1 ] | Confirmed [ 10101 ] |
Summary | chainging the value of wsrep_gtid_domain_id with full cluster restart fails on some nodes | changing the value of wsrep_gtid_domain_id with full cluster restart fails on some nodes |
Status | Confirmed [ 10101 ] | Open [ 1 ] |
Status | Open [ 1 ] | Needs Feedback [ 10501 ] |
Fix Version/s | N/A [ 14700 ] | |
Fix Version/s | 10.5 [ 23123 ] | |
Fix Version/s | 10.6 [ 24028 ] | |
Fix Version/s | 10.7 [ 24805 ] | |
Fix Version/s | 10.8 [ 26121 ] | |
Resolution | Incomplete [ 4 ] | |
Status | Needs Feedback [ 10501 ] | Closed [ 6 ] |
Resolution | Incomplete [ 4 ] | |
Status | Closed [ 6 ] | Stalled [ 10000 ] |
Fix Version/s | 10.5 [ 23123 ] | |
Fix Version/s | N/A [ 14700 ] |
Priority | Major [ 3 ] | Critical [ 2 ] |
Status | Stalled [ 10000 ] | In Progress [ 3 ] |
Assignee | Jan Lindström [ jplindst ] | Daniele Sciascia [ sciascid ] |
Assignee | Daniele Sciascia [ sciascid ] | Jan Lindström [ jplindst ] |
Status | In Progress [ 3 ] | Stalled [ 10000 ] |
Assignee | Jan Lindström [ jplindst ] | Daniele Sciascia [ sciascid ] |
Status | Stalled [ 10000 ] | In Progress [ 3 ] |
Assignee | Daniele Sciascia [ sciascid ] | Jan Lindström [ jplindst ] |
Assignee | Jan Lindström [ jplindst ] | Julien Fritsch [ julien.fritsch ] |
Assignee | Julien Fritsch [ julien.fritsch ] | Jan Lindström [ jplindst ] |
Status | In Progress [ 3 ] | In Review [ 10002 ] |
Status | In Review [ 10002 ] | Stalled [ 10000 ] |
Status | Stalled [ 10000 ] | In Progress [ 3 ] |
issue.field.resolutiondate | 2023-01-17 12:38:42.0 | 2023-01-17 12:38:42.614 |
Fix Version/s | 10.5.19 [ 28511 ] | |
Fix Version/s | 10.6.12 [ 28513 ] | |
Fix Version/s | 10.7.8 [ 28515 ] | |
Fix Version/s | 10.8.7 [ 28517 ] | |
Fix Version/s | 10.9.5 [ 28519 ] | |
Fix Version/s | 10.10.3 [ 28521 ] | |
Fix Version/s | 10.5 [ 23123 ] | |
Resolution | Fixed [ 1 ] | |
Status | In Progress [ 3 ] | Closed [ 6 ] |
Zendesk Related Tickets | 121570 153488 |
Reproduced the issue: if we set the secondary node as the donor node, the previous wsrep_gtid_domain_id is selected when restarting the joiner node.
Node1
MariaDB [(none)]> select @@wsrep_gtid_domain_id,@@wsrep_node_name;
+------------------------+-------------------+
| @@wsrep_gtid_domain_id | @@wsrep_node_name |
+------------------------+-------------------+
| 200 | galera-node1 |
+------------------------+-------------------+
1 row in set (0.000 sec)
MariaDB [(none)]>
Node2
MariaDB [(none)]> select @@wsrep_gtid_domain_id,@@wsrep_node_name;
+------------------------+-------------------+
| @@wsrep_gtid_domain_id | @@wsrep_node_name |
+------------------------+-------------------+
| 200 | galera-node2 |
+------------------------+-------------------+
1 row in set (0.001 sec)
MariaDB [(none)]>
Node3
MariaDB [(none)]> select @@wsrep_gtid_domain_id,@@wsrep_node_name;
+------------------------+-------------------+
| @@wsrep_gtid_domain_id | @@wsrep_node_name |
+------------------------+-------------------+
| 7 | galera-node3 |
+------------------------+-------------------+
1 row in set (0.000 sec)
MariaDB [(none)]>
MariaDB [(none)]> select variable_name, global_value, global_value_origin, global_value_path from information_schema.system_variables where variable_name='WSREP_GTID_DOMAIN_ID';
+----------------------+--------------+---------------------+-------------------+
| variable_name | global_value | global_value_origin | global_value_path |
+----------------------+--------------+---------------------+-------------------+
| WSREP_GTID_DOMAIN_ID | 7 | CONFIG | /etc/mysql/my.cnf |
+----------------------+--------------+---------------------+-------------------+
1 row in set (0.002 sec)
MariaDB [(none)]> \q
Bye
$ sudo grep wsrep_gtid_domain_id /etc/mysql/my.cnf
wsrep_gtid_domain_id = 200
$
$ sudo grep wsrep_sst_donor /etc/mysql/my.cnf
wsrep_sst_donor=192.168.100.20
$
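The transcript above shows the running value (7) disagreeing with the value in /etc/mysql/my.cnf (200), even though information_schema reports CONFIG as the origin. A minimal sketch for automating that comparison follows. The helper name cnf_domain_id is hypothetical, and the sketch assumes a single flat option file: it does not follow !include directives or distinguish option groups, and it takes the last assignment in the file as the effective one.

```shell
#!/bin/sh
# Sketch: print the last wsrep_gtid_domain_id assignment found in a
# my.cnf-style file. Accepts underscore and dash spellings of the option.
cnf_domain_id() {
    sed -n -E \
        's/^[[:space:]]*wsrep[_-]gtid[_-]domain[_-]id[[:space:]]*=[[:space:]]*([0-9]+).*$/\1/p' \
        "$1" | tail -n 1
}

# Hypothetical comparison against the running server:
# running=$(mariadb -N -B -e 'SELECT @@wsrep_gtid_domain_id')
# configured=$(cnf_domain_id /etc/mysql/my.cnf)
# [ "$running" = "$configured" ] || \
#     echo "running value $running differs from configured value $configured"
```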
If we force SST after setting the secondary node as the donor node, restarting node 3 fails.
2022-08-04 6:31:32 1 [Note] WSREP: Server status change connected -> joiner
2022-08-04 6:31:32 1 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2022-08-04 6:31:32 0 [Note] WSREP: Joiner monitor thread started to monitor
2022-08-04 6:31:32 0 [Note] WSREP: Running: 'wsrep_sst_mariabackup --role 'joiner' --address '192.168.100.30' --datadir '/var/lib/mysql/' --parent '706110' --binlog '/var/lib/mysql/master-bin' --binlog-index '/var/lib/mysql/master-bin' --mysqld-args --wsrep_start_position=adf2999d-0c9a-11ed-aeea-0349b3a815aa:156,7-8-12'
WSREP_SST: [INFO] mariabackup SST started on joiner (20220804 06:31:32.077)
WSREP_SST: [INFO] SSL configuration: CA='', CAPATH='', CERT='', KEY='', MODE='DISABLED', encrypt='0' (20220804 06:31:32.119)
WSREP_SST: [INFO] Streaming with mbstream (20220804 06:31:32.222)
WSREP_SST: [INFO] Using socat as streamer (20220804 06:31:32.224)
WSREP_SST: [INFO] Evaluating timeout -k 310 300 socat -u TCP-LISTEN:4444,reuseaddr stdio | '/usr//bin/mbstream' -x; RC=( ${PIPESTATUS[@]} ) (20220804 06:31:32.256)
2022-08-04 6:31:32 1 [Note] WSREP: ####### IST uuid:00000000-0000-0000-0000-000000000000 f: 0, l: 158, STRv: 3
2022-08-04 6:31:32 1 [Note] WSREP: IST receiver addr using tcp://192.168.100.30:4568
2022-08-04 6:31:32 1 [Note] WSREP: Prepared IST receiver for 0-158, listening at: tcp://192.168.100.30:4568
2022-08-04 6:31:32 0 [Warning] WSREP: Member 0.0 (galera-node3) requested state transfer from '192.168.100.20', but it is impossible to select State Transfer donor: No route to host
2022-08-04 6:31:32 1 [ERROR] WSREP: Requesting state transfer failed: -113(No route to host)
2022-08-04 6:31:32 1 [ERROR] WSREP: State transfer request failed unrecoverably: 113 (No route to host). Most likely it is due to inability to communicate with the cluster primary component. Restart required.
2022-08-04 6:31:32 1 [Note] WSREP: ReplicatorSMM::abort()
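In the log above, the joiner passes --wsrep_start_position=adf2999d-0c9a-11ed-aeea-0349b3a815aa:156,7-8-12. The part after the comma looks like a GTID in domain-server_id-sequence form, so its leading 7 would match the stale domain id that node 3 reports. The sketch below pulls that number out of such a value. The format is inferred from this one log line, and the function name start_position_domain is hypothetical.

```shell
#!/bin/sh
# Sketch: extract the GTID domain id from a wsrep_start_position value,
# assuming the form <uuid>:<seqno>,<domain>-<server_id>-<seqno>.
# Returns 1 if the value carries no GTID part after a comma.
start_position_domain() {
    gtid="${1#*,}"                       # strip "<uuid>:<seqno>," prefix
    [ "$gtid" = "$1" ] && return 1       # no comma, hence no GTID part
    echo "${gtid%%-*}"                   # keep everything before the first dash
}

# Example with the value from the log:
# start_position_domain 'adf2999d-0c9a-11ed-aeea-0349b3a815aa:156,7-8-12'
```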