  MariaDB Server / MDEV-29171

changing the value of wsrep_gtid_domain_id with full cluster restart fails on some nodes

Details

    Description

      I know that changing wsrep_gtid_domain_id in the configuration file on all nodes requires a full cluster stop and restart to pick up the change, as only the first node actually uses the config file value, while all further nodes receive the value to use from their donor during IST/SST.

      But when e.g. changing the relevant configuration settings from

      wsrep-gtid-mode=ON
      wsrep_gtid_domain_id=100
      gtid_domain_id=...different value per node....
      

      to

      wsrep_gtid_domain_id=200
      

      then stopping all nodes, starting node-1 with galera_new_cluster and node-2 with systemctl start mariadb, all is fine so far: both nodes show wsrep_gtid_domain_id=200 in SHOW VARIABLES LIKE 'wsrep_gtid_domain_id'. But when starting a 3rd or further nodes, they usually still show 100 instead of 200.

      Looking at the error logs I see that node-3 usually uses node-2 as donor, not node-1.

      When I force the node started with galera_new_cluster as the default donor with wsrep_sst_donor=node-1, I get the correct new value 200 on all nodes, though.

      So somehow nodes seem to remember the previous value and pass that on to joiners instead of the value they received from their own donor, or read from their configuration file.
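
      For reference, a minimal sketch of the per-node configuration used in the working case, with the values from this description (the config file path and the node name are illustrative):

      # /etc/mysql/my.cnf (every node) -- sketch only, values taken from this report
      wsrep-gtid-mode=ON
      wsrep_gtid_domain_id=200
      gtid_domain_id=...different value per node....
      # workaround observed above: pin the bootstrap node as donor so joiners pick up the new value
      wsrep_sst_donor=node-1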

          Activity

            Reproduced the issue: if we set the secondary node as the donor node, the previous wsrep_gtid_domain_id is selected when restarting the joiner node.
            Node1

            MariaDB [(none)]> select @@wsrep_gtid_domain_id,@@wsrep_node_name;
            +------------------------+-------------------+
            | @@wsrep_gtid_domain_id | @@wsrep_node_name |
            +------------------------+-------------------+
            |                    200 | galera-node1      |
            +------------------------+-------------------+
            1 row in set (0.000 sec)
             
            MariaDB [(none)]> 
            

            Node2

            MariaDB [(none)]> select @@wsrep_gtid_domain_id,@@wsrep_node_name;
            +------------------------+-------------------+
            | @@wsrep_gtid_domain_id | @@wsrep_node_name |
            +------------------------+-------------------+
            |                    200 | galera-node2      |
            +------------------------+-------------------+
            1 row in set (0.001 sec)
             
            MariaDB [(none)]> 
            

            Node3

            MariaDB [(none)]> select @@wsrep_gtid_domain_id,@@wsrep_node_name;
            +------------------------+-------------------+
            | @@wsrep_gtid_domain_id | @@wsrep_node_name |
            +------------------------+-------------------+
            |                      7 | galera-node3      |
            +------------------------+-------------------+
            1 row in set (0.000 sec)
             
            MariaDB [(none)]>
            MariaDB [(none)]> select variable_name, global_value, global_value_origin, global_value_path from information_schema.system_variables where variable_name='WSREP_GTID_DOMAIN_ID';
            +----------------------+--------------+---------------------+-------------------+
            | variable_name        | global_value | global_value_origin | global_value_path |
            +----------------------+--------------+---------------------+-------------------+
            | WSREP_GTID_DOMAIN_ID | 7            | CONFIG              | /etc/mysql/my.cnf |
            +----------------------+--------------+---------------------+-------------------+
            1 row in set (0.002 sec)
             
            MariaDB [(none)]>  \q
            Bye
            $ sudo grep wsrep_gtid_domain_id /etc/mysql/my.cnf 
            wsrep_gtid_domain_id = 200
            $ 
            $ sudo grep wsrep_sst_donor /etc/mysql/my.cnf 
            wsrep_sst_donor=192.168.100.20
            $
            

            If we force SST after setting the secondary node as the donor node, restarting node 3 fails.

            2022-08-04  6:31:32 1 [Note] WSREP: Server status change connected -> joiner
            2022-08-04  6:31:32 1 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
            2022-08-04  6:31:32 0 [Note] WSREP: Joiner monitor thread started to monitor
            2022-08-04  6:31:32 0 [Note] WSREP: Running: 'wsrep_sst_mariabackup --role 'joiner' --address '192.168.100.30' --datadir '/var/lib/mysql/' --parent '706110' --binlog '/var/lib/mysql/master-bin' --binlog-index '/var/lib/mysql/master-bin' --mysqld-args --wsrep_start_position=adf2999d-0c9a-11ed-aeea-0349b3a815aa:156,7-8-12'
            WSREP_SST: [INFO] mariabackup SST started on joiner (20220804 06:31:32.077)
            WSREP_SST: [INFO] SSL configuration: CA='', CAPATH='', CERT='', KEY='', MODE='DISABLED', encrypt='0' (20220804 06:31:32.119)
            WSREP_SST: [INFO] Streaming with mbstream (20220804 06:31:32.222)
            WSREP_SST: [INFO] Using socat as streamer (20220804 06:31:32.224)
            WSREP_SST: [INFO] Evaluating timeout -k 310 300 socat -u TCP-LISTEN:4444,reuseaddr stdio | '/usr//bin/mbstream' -x; RC=( ${PIPESTATUS[@]} ) (20220804 06:31:32.256)
            2022-08-04  6:31:32 1 [Note] WSREP: ####### IST uuid:00000000-0000-0000-0000-000000000000 f: 0, l: 158, STRv: 3
            2022-08-04  6:31:32 1 [Note] WSREP: IST receiver addr using tcp://192.168.100.30:4568
            2022-08-04  6:31:32 1 [Note] WSREP: Prepared IST receiver for 0-158, listening at: tcp://192.168.100.30:4568
            2022-08-04  6:31:32 0 [Warning] WSREP: Member 0.0 (galera-node3) requested state transfer from '192.168.100.20', but it is impossible to select State Transfer donor: No route to host
            2022-08-04  6:31:32 1 [ERROR] WSREP: Requesting state transfer failed: -113(No route to host)
            2022-08-04  6:31:32 1 [ERROR] WSREP: State transfer request failed unrecoverably: 113 (No route to host). Most likely it is due to inability to communicate with the cluster primary component. Restart required.
            2022-08-04  6:31:32 1 [Note] WSREP: ReplicatorSMM::abort()
            

            ramesh Ramesh Sivaraman added a comment

            jplindst The SST failure is not due to the wsrep_gtid_domain_id issue. wsrep_sst_donor must be a node name to trigger SST, not the IP address of the donor node. And the wsrep_gtid_domain_id change works fine if the restart triggers SST.
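
            For reference, a minimal sketch of the joiner-side setting as described here, using the donor's wsrep_node_name from the outputs above instead of its IP address (the name is illustrative):

            # /etc/mysql/my.cnf on the joiner -- sketch only; use the donor's wsrep_node_name, not its IP
            wsrep_sst_donor=galera-node2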

            ramesh Ramesh Sivaraman added a comment - edited

            A change to wsrep_gtid_domain_id is not reflected on a node if that node uses IST when joining the cluster.

            jplindst Jan Lindström (Inactive) added a comment

            But this was a fresh cluster due to galera_new_cluster, and the domain id also changed fine when forcing all nodes to use the very first node as donor. It only failed to pick up the correct domain id when a different donor was used, e.g. node 3 using the node started 2nd rather than 1st.

            hholzgra Hartmut Holzgraefe added a comment

            hholzgra Was the datadir on node3 empty? If not, just delete the gcache file and force SST.
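
            A minimal sketch of that step, assuming systemd management and the default datadir (paths are assumptions; this discards the node's local Galera state, so it has to rejoin via SST):

            # sketch only -- verify the datadir location before deleting anything
            sudo systemctl stop mariadb
            sudo rm -f /var/lib/mysql/grastate.dat /var/lib/mysql/galera.cache   # galera.cache is the gcache file; removing grastate.dat forces SST
            sudo systemctl start mariadb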

            jplindst Jan Lindström (Inactive) added a comment

            ramesh Based on Hartmut's comments, could you please try to reproduce?

            jplindst Jan Lindström (Inactive) added a comment

            I still have my original VM test setup for this somewhere, "just" need to figure out where ...

            hholzgra Hartmut Holzgraefe added a comment

            hholzgra The wsrep_gtid_domain_id is only changed by SST when the secondary node is used as the donor node. Can you confirm whether node3 used SST or IST to join the cluster?
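
            One way to check is to grep the joiner's error log for the strings seen in the excerpts above; a minimal sketch, assuming the error log location (adjust the path for your setup):

            # sketch only -- "Running: 'wsrep_sst_" in joiner role points to SST, "IST receiver" lines point to IST
            sudo grep -E "Running: 'wsrep_sst_|IST receiver" /var/log/mysql/error.log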

            ramesh Ramesh Sivaraman added a comment

            How to reproduce:

            start cluster of at least three nodes with

            wsrep_gtid_domain_id=100

            Shut down all nodes, node 1 last. Change configuration to

            wsrep_gtid_domain_id=200

            Start node 1 with galera_new_cluster, then start the remaining nodes with systemctl start mariadb

            Run

            mysql -e "show variables like 'wsrep_gtid_domain_id'"

            on all nodes; see that the first and second nodes show the correct value 200, while later nodes that join using a donor other than node 1 show the old value 100
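
            The same check can be scripted across the nodes; a minimal sketch, assuming the host names from the outputs above and that client access is already configured:

            # sketch only -- host names are illustrative
            for h in galera-node1 galera-node2 galera-node3; do
                echo "== $h =="
                mysql -h "$h" -e "show variables like 'wsrep_gtid_domain_id'"
            done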

            When enforcing an SST on one of the later nodes (node 3 and later), it shows the correct value 200 regardless of which donor node is actually picked. But when that node is then restarted after the SST startup has completed, and a node other than the first one is picked as donor (e.g. by enforcing that with wsrep_sst_donor=node-2), it flips back to the old value 100 again.

            When enforcing an SST on all but the first node by purging the data directory, the correct value 200 is picked up by all nodes, and it persists over later restarts, too.
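
            A minimal sketch of that purge-and-rejoin step on one of the later nodes, assuming systemd and the default datadir (paths are assumptions; this wipes the node's local data, which is then rebuilt via SST):

            # sketch only -- run on node 3 and later, never on the bootstrap node
            sudo systemctl stop mariadb
            sudo rm -rf /var/lib/mysql/*                              # empty datadir forces a full SST
            sudo systemctl start mariadb
            mysql -e "show variables like 'wsrep_gtid_domain_id'"     # should now report 200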

            So we should either clearly document the procedure necessary to change wsrep_gtid_domain_id, or, preferably, figure out and fix the behavior so that the domain id can be changed without enforcing an SST on all but the first node.

            hholzgra Hartmut Holzgraefe added a comment

            julien.fritsch jplindst The problem has been reproduced. Even after SST, the wsrep code always chooses the old wsrep_gtid_domain_id if the donor is a non-bootstrap node when the joiner is restarted.
            Please check the --gtid-domain-id value in the IST logs below.
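
            A quick way to see which value a donor passed on is to grep its error log for the SST script invocation; a minimal sketch, assuming the error log path (adjust for your setup):

            # sketch only -- the donor logs the wsrep_sst_* command line including --gtid-domain-id
            sudo grep -o "gtid-domain-id [0-9]*" /var/log/mysql/error.log | tail -n 1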
            xtrabackup IST info
            When Node1 becomes the donor (--gtid-domain-id is 200)

            2022-11-26  5:27:30 1 [Note] WSREP: Server status change synced -> donor
            2022-11-26  5:27:30 1 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
            2022-11-26  5:27:30 0 [Note] WSREP: Donor monitor thread started to monitor
            2022-11-26  5:27:30 0 [Note] WSREP: Running: 'wsrep_sst_mariabackup --role 'donor' --address '192.168.100.30:4444/xtrabackup_sst//1' --local-port 3306 --socket '/run/mysqld/mysqld.sock' --progress 0 --datadir '/var/lib/mysql/' --gtid 'ee9f290d-6d41-11ed-a940-ffb1d812fa61:61' --gtid-domain-id 200 --bypass --mysqld-args --wsrep-new-cluster --wsrep_start_position=ee9f290d-6d41-11ed-a940-ffb1d812fa61:8,100-101-3'
            2022-11-26  5:27:30 1 [Note] WSREP: sst_donor_thread signaled with 0
            2022-11-26  5:27:30 0 [Note] WSREP: async IST sender starting to serve tcp://192.168.100.30:4568 sending 62-63, preload starts from 63
            2022-11-26  5:27:30 0 [Note] WSREP: IST sender 62 -> 63
            

            When Node2 becomes the donor (--gtid-domain-id is 100)

            2022-11-26  5:04:02 1 [Note] WSREP: Server status change synced -> donor
            2022-11-26  5:04:02 1 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
            2022-11-26  5:04:02 0 [Note] WSREP: Donor monitor thread started to monitor
            2022-11-26  5:04:02 0 [Note] WSREP: Running: 'wsrep_sst_mariabackup --role 'donor' --address '192.168.100.30:4444/xtrabackup_sst//1' --local-port 3306 --socket '/run/mysqld/mysqld.sock' --progress 0 --datadir '/var/lib/mysql/' --gtid 'ee9f290d-6d41-11ed-a940-ffb1d812fa61:55' --gtid-domain-id 100 --bypass --mysqld-args --wsrep_start_position=ee9f290d-6d41-11ed-a940-ffb1d812fa61:7,100-101-3'
            2022-11-26  5:04:02 1 [Note] WSREP: sst_donor_thread signaled with 0
            2022-11-26  5:04:02 0 [Note] WSREP: async IST sender starting to serve tcp://192.168.100.30:4568 sending 56-57, preload starts from 57
            2022-11-26  5:04:02 0 [Note] WSREP: IST sender 56 -> 57
            

            rsync IST info

            When Node1 becomes the donor

            2022-11-26  5:36:23 0 [Note] WSREP: Running: 'wsrep_sst_rsync --role 'donor' --address '192.168.100.30:4444/rsync_sst' --local-port 3306 --socket '/run/mysqld/mysqld.sock' --progress 1 --datadir '/var/lib/mysql/' --gtid 'ee9f290d-6d41-11ed-a940-ffb1d812fa61:69' --gtid-domain-id 200 --bypass --mysqld-args --wsrep-new-cluster --wsrep_start_position=ee9f290d-6d41-11ed-a940-ffb1d812fa61:66,200-101-0'
            2022-11-26  5:36:23 1 [Note] WSREP: sst_donor_thread signaled with 0
            2022-11-26  5:36:23 0 [Note] WSREP: async IST sender starting to serve tcp://192.168.100.30:4568 sending 70-71, preload starts from 71
            2022-11-26  5:36:23 0 [Note] WSREP: IST sender 70 -> 71
            

            When Node2 becomes the donor.

            2022-11-26  5:35:18 1 [Note] WSREP: Server status change synced -> donor
            2022-11-26  5:35:18 1 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
            2022-11-26  5:35:18 0 [Note] WSREP: Running: 'wsrep_sst_rsync --role 'donor' --address '192.168.100.30:4444/rsync_sst' --local-port 3306 --socket '/run/mysqld/mysqld.sock' --progress 1 --datadir '/var/lib/mysql/' --gtid 'ee9f290d-6d41-11ed-a940-ffb1d812fa61:63' --gtid-domain-id 100 --bypass --mysqld-args --wsrep_start_position=ee9f290d-6d41-11ed-a940-ffb1d812fa61:64,100-101-3'
            2022-11-26  5:35:18 0 [Note] WSREP: Donor monitor thread started to monitor
            2022-11-26  5:35:18 1 [Note] WSREP: sst_donor_thread signaled with 0
            2022-11-26  5:35:18 0 [Note] WSREP: async IST sender starting to serve tcp://192.168.100.30:4568 sending 64-69, preload starts from 69
            2022-11-26  5:35:18 0 [Note] WSREP: IST sender 64 -> 69
            

            This issue causes the GTID sequence to be generated inconsistently across the cluster nodes. If we generate a transaction on node3, then gtid_binlog_pos on node3 and node1 will use the new wsrep_gtid_domain_id, but the replicated event on node2 will select the old wsrep_gtid_domain_id.

            Node3

            MariaDB [(none)]> select @@gtid_binlog_pos,@@gtid_current_pos,@@wsrep_gtid_domain_id;
            +-------------------+--------------------+------------------------+
            | @@gtid_binlog_pos | @@gtid_current_pos | @@wsrep_gtid_domain_id |
            +-------------------+--------------------+------------------------+
            | 200-101-2         | 200-101-2          |                    100 |
            +-------------------+--------------------+------------------------+
            1 row in set (0.000 sec)
             
            MariaDB [(none)]> 
            

            Node1

            MariaDB [(none)]> select @@gtid_binlog_pos,@@gtid_current_pos,@@wsrep_gtid_domain_id;
            +-------------------+--------------------+------------------------+
            | @@gtid_binlog_pos | @@gtid_current_pos | @@wsrep_gtid_domain_id |
            +-------------------+--------------------+------------------------+
            | 200-101-2         | 200-101-2          |                    200 |
            +-------------------+--------------------+------------------------+
            1 row in set (0.000 sec)
             
            MariaDB [(none)]> 
            

            Node2

            MariaDB [(none)]> select @@gtid_binlog_pos,@@gtid_current_pos,@@wsrep_gtid_domain_id;
            +-------------------+--------------------+------------------------+
            | @@gtid_binlog_pos | @@gtid_current_pos | @@wsrep_gtid_domain_id |
            +-------------------+--------------------+------------------------+
            | 100-101-5         | 100-101-5          |                    200 |
            +-------------------+--------------------+------------------------+
            1 row in set (0.001 sec)
             
            MariaDB [(none)]> 


            Similarly, if we create a transaction on node2, then gtid_binlog_pos on node3 and node2 will use the old wsrep_gtid_domain_id, but the replicated transaction on node1 will select the new wsrep_gtid_domain_id.

            Node2

            MariaDB [(none)]> create database db1;
            Query OK, 1 row affected (0.006 sec)
             
            MariaDB [(none)]> select @@gtid_binlog_pos,@@gtid_current_pos,@@wsrep_gtid_domain_id;
            +-------------------+--------------------+------------------------+
            | @@gtid_binlog_pos | @@gtid_current_pos | @@wsrep_gtid_domain_id |
            +-------------------+--------------------+------------------------+
            | 100-101-3         | 100-101-3          |                    200 |
            +-------------------+--------------------+------------------------+
            1 row in set (0.000 sec)
             
            MariaDB [(none)]> 
            

            Node3

            MariaDB [(none)]> select @@gtid_binlog_pos,@@gtid_current_pos,@@wsrep_gtid_domain_id;
            +-------------------+--------------------+------------------------+
            | @@gtid_binlog_pos | @@gtid_current_pos | @@wsrep_gtid_domain_id |
            +-------------------+--------------------+------------------------+
            | 100-101-3         | 100-101-3          |                    100 |
            +-------------------+--------------------+------------------------+
            1 row in set (0.000 sec)
             
            MariaDB [(none)]> 
            

            Node1

            MariaDB [(none)]> select @@gtid_binlog_pos,@@gtid_current_pos,@@wsrep_gtid_domain_id;
            +-------------------+--------------------+------------------------+
            | @@gtid_binlog_pos | @@gtid_current_pos | @@wsrep_gtid_domain_id |
            +-------------------+--------------------+------------------------+
            | 200-101-1         | 200-101-1          |                    200 |
            +-------------------+--------------------+------------------------+
            1 row in set (0.000 sec)
             
            MariaDB [(none)]> 
            

            ramesh Ramesh Sivaraman added a comment - edited

            People

              jplindst Jan Lindström (Inactive)
              hholzgra Hartmut Holzgraefe
              Votes: 4
              Watchers: 10
