[MDEV-6260] Isolated galera node is out of sync Created: 2014-05-22  Updated: 2014-05-22  Resolved: 2014-05-22

Status: Closed
Project: MariaDB Server
Component/s: None
Affects Version/s: 5.5.37-galera, 10.0.10-galera
Fix Version/s: 5.5.38-galera, 10.0.11-galera

Type: Bug Priority: Critical
Reporter: Jan Lindström (Inactive) Assignee: Jan Lindström (Inactive)
Resolution: Not a Bug Votes: 0
Labels: galera


 Description   

I used a 3-node Galera cluster with empty databases. I started all three nodes and created the following tables:

create database test_db;
CREATE DATABASE IF NOT EXISTS mysqlslap;
CREATE TABLE  test_db.Test_1 (
    id INT AUTO_INCREMENT PRIMARY KEY,
    customer_surname VARCHAR(30),
    store_id INT,
    salesperson_id INT,
    order_date DATE,
    note VARCHAR(500)
    ) engine=innodb;
 
CREATE TABLE  test_db.Test_2 (
    id INT AUTO_INCREMENT PRIMARY KEY,
    customer_surname VARCHAR(30),
    store_id INT,
    salesperson_id INT,
    order_date DATE,
    note VARCHAR(500)
    ) engine=innodb;

After this, I loaded some data with:

time /usr/local/mysql/bin/mysqlslap           \
    --host=127.0.0.1  \
    --user=root \
    -p    \
    --engine=InnoDB \
    --protocol=TCP  \
    --port=4000     \
    --query="INSERT INTO test_db.Test_1 (customer_surname) VALUES ('Galera2-row x');" \
    --concurrency=10  \
    --iterations=200

I then connected to node1 and isolated it from the Galera cluster with:

mysql --user=root -S mysql.5000.sock -e "SET GLOBAL wsrep_provider=none"

I loaded additional rows into another table (note that there is no other load on the nodes):

time /usr/local/mysql/bin/mysqlslap           \
    --host=127.0.0.1  \
    --user=root \
    -p    \
    --engine=InnoDB \
    --protocol=TCP  \
    --port=5000     \
    --query="INSERT INTO test_db.Test_2 (customer_surname) VALUES ('Galera2-row x');" \
    --concurrency=10  \
    --iterations=200 \
    --verbose

Finally, I put the isolated node back into the Galera cluster:

MariaDB [test_db]> set global wsrep_provider='/usr/lib/libgalera_smm.so';
Query OK, 0 rows affected (2.05 sec)
 
MariaDB [test_db]> select * from Test_2;
ERROR 1047 (08S01): WSREP has not yet prepared node for application use
MariaDB [test_db]> SET GLOBAL wsrep_provider_options='pc.bootstrap=yes';
ERROR 1210 (HY000): Incorrect arguments to SET
 

The two other nodes do not see the rows in table Test_2 (table Test_1 is fine on all nodes), and node2 never seems to reach the prepared state. Last part of the error log on the isolated node:

140522  8:09:06 [Note] WSREP: Stop replication
140522  8:09:08 [Note] WSREP: Initial position: 8a69b463-e16e-11e3-a066-f6b23632b4f5:2004
140522  8:09:08 [Warning] WSREP: Initial position was provided by configuration or SST, avoiding override
140522  8:09:08 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib/libgalera_smm.so'
140522  8:09:08 [Note] WSREP: wsrep_load(): Galera 3.2(rXXXX) by Codership Oy <info@codership.com> loaded successfully.
140522  8:09:08 [Note] WSREP: CRC-32C: using hardware acceleration.
140522  8:09:08 [Note] WSREP: Found saved state: 8a69b463-e16e-11e3-a066-f6b23632b4f5:2004
140522  8:09:08 [Note] WSREP: Passing config to GCS: base_host = 178.251.56.245; base_port = 4567; cert.log_conflicts = no; gcache.dir = /home/jan/mysql/galera-cluster-5.5/node1/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /home/jan/mysql/galera-cluster-5.5/node1//galera.cache; gcache.page_size = 128M; gcache.size = 128M; gcs.fc_debug = 0; gcs.fc_factor = 1; gcs.fc_limit = 16; gcs.fc_master_slave = NO; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = NO; repl.causal_read_timeout = PT30S; repl.commit_order = 3; repl.key_format = FLAT8; repl.proto_max = 5
140522  8:09:08 [Note] WSREP: Assign initial position for certification: 2004, protocol version: -1
140522  8:24:46 [Warning] WSREP: Unknown parameter 'pc.bootstrap'
140522  8:24:46 [ERROR] WSREP: Set options returned 7
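
ERROR 1047 is what a Galera node returns while its wsrep_ready status variable is OFF. A minimal shell sketch for checking this from outside the client session follows. The socket path is taken from this report; the helper names (wsrep_status, status_value, is_ready) and passwordless root login are assumptions made for illustration.

```shell
#!/bin/sh
# Socket of the isolated node, as used earlier in this report.
SOCKET="${SOCKET:-mysql.5000.sock}"

# Extract the value column from a "name<TAB>value" status line on stdin.
status_value() {
    awk -F '\t' 'NR == 1 { print $2 }'
}

# Print the value of one wsrep status variable, e.g. wsrep_ready.
# Assumes root can log in without a password prompt (e.g. via an option file).
wsrep_status() {
    mysql --user=root -S "$SOCKET" -N -B \
        -e "SHOW GLOBAL STATUS LIKE '$1'" | status_value
}

# Succeed only when the node reports wsrep_ready = ON.
is_ready() {
    [ "$(wsrep_status wsrep_ready)" = "ON" ]
}
```

For example, `is_ready || wsrep_status wsrep_local_state_comment` would print the node's current state whenever it is not ready.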



 Comments   
Comment by Jan Lindström (Inactive) [ 2014-05-22 ]

It seems the problem is that wsrep_cluster_address is empty. If I set it:

MariaDB [test_db]> set global wsrep_cluster_address='gcomm://127.0.0.1:4030';
Query OK, 0 rows affected (3.02 sec)

Now the isolated node rejoins the cluster, but the databases are not in sync, i.e. the data in Test_2 is not replicated. The problem is that Galera does not know that the data was changed while the provider was unloaded.
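
One way to confirm the divergence is to compare the Test_2 row count on every node. The following is a rough sketch, not part of the original test. Ports 4000 and 5000 appear in this report; port 6000 for the third node, the helper names, and passwordless root login are assumptions.

```shell
#!/bin/sh
# Node ports: 4000 and 5000 come from the report, 6000 is a hypothetical third node.
PORTS="${PORTS:-4000 5000 6000}"

# Print the Test_2 row count of the node listening on port $1.
# Assumes root can log in without a password prompt (e.g. via an option file).
count_rows() {
    mysql --host=127.0.0.1 --protocol=TCP --port="$1" --user=root -N -B \
        -e "SELECT COUNT(*) FROM test_db.Test_2"
}

# Read "port count" lines on stdin; succeed only if every count is identical.
counts_agree() {
    awk 'NR == 1 { first = $2 }
         $2 != first { bad = 1 }
         END { exit (bad ? 1 : 0) }'
}

# Query every node and report whether the row counts match.
check_nodes() {
    for port in $PORTS; do
        printf '%s %s\n' "$port" "$(count_rows "$port")"
    done | counts_agree
}
```

Running `check_nodes || echo "Test_2 differs between nodes"` after the rejoin would flag the situation described above.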

Comment by Jan Lindström (Inactive) [ 2014-05-22 ]

Thus, on the isolated node:

1. Unload the provider.
2. Remove the grastate.dat file.
3. Load the provider.
4. set global wsrep_cluster_address='gcomm://'

This creates a new cluster based on this node, which has the most up-to-date data. Then join the other nodes; they will take a state snapshot transfer (SST). You can use pc.bootstrap once the nodes have joined the cluster:

MariaDB [test_db]> SET GLOBAL wsrep_provider_options="pc.bootstrap=yes";
Query OK, 0 rows affected (0.00 sec)
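
The four steps listed above can be sketched as a small shell script. This is an illustration under stated assumptions, not a tested procedure. The socket and provider path are taken from this report; the data directory is assumed to be the gcache.dir shown in the error log, and the helper names and the dry-run switch are hypothetical.

```shell
#!/bin/sh
SOCKET="${SOCKET:-mysql.5000.sock}"                  # socket used in this report
PROVIDER="${PROVIDER:-/usr/lib/libgalera_smm.so}"    # provider path from this report
# Assumption: grastate.dat lives in the data directory, taken here to be
# the gcache.dir from the error log.
DATADIR="${DATADIR:-/home/jan/mysql/galera-cluster-5.5/node1}"
DRY_RUN="${DRY_RUN:-1}"                              # 1 = only print the commands

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Rebuild a new cluster from the isolated, most up-to-date node.
rebuild_cluster() {
    # 1. Unload the provider.
    run mysql --user=root -S "$SOCKET" -e "SET GLOBAL wsrep_provider=none"
    # 2. Remove the saved Galera state.
    run rm -f "$DATADIR/grastate.dat"
    # 3. Load the provider again.
    run mysql --user=root -S "$SOCKET" -e "SET GLOBAL wsrep_provider='$PROVIDER'"
    # 4. Start a new cluster from this node.
    run mysql --user=root -S "$SOCKET" -e "SET GLOBAL wsrep_cluster_address='gcomm://'"
}
```

With the default DRY_RUN=1, calling `rebuild_cluster` only prints the four commands so they can be reviewed; setting DRY_RUN=0 would execute them.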

Comment by Jan Lindström (Inactive) [ 2014-05-22 ]

Works as designed. You should not isolate a node while you load data; if you do isolate it, you must create a new cluster based on that most up-to-date node.
