Details
- Bug
- Status: Closed
- Critical
- Resolution: Not a Bug
- 10.2.10
- None
- 3-node MariaDB Galera cluster, running on Ubuntu 16.04 VMs.
Description
We've noticed something really troubling with our cluster: deletes, inserts, and updates don't seem to be applied synchronously across Galera nodes. To test this, I wrote a simple bash script that inserts records into a table on one node and immediately reads the row count from a different node. Depending on the latency between servers, the second node returns the wrong number of rows in up to about 39% of attempts. As I understand Galera, this should never happen: when I update one node, it shouldn't confirm the update until all nodes have received the data. This is what I've done:
1. I created a database called 'Testing' with a single table:
CREATE TABLE `Table1` (
`Id` INT(11) NOT NULL AUTO_INCREMENT,
`Text` VARCHAR(50) NOT NULL,
PRIMARY KEY (`Id`)
)
COLLATE='utf8_general_ci'
ENGINE=InnoDB;
2. I wrote a bash script that does the following:
# Empty the table on node1.
sql='DELETE FROM Table1;'
mysql -h"$node1" -u"$user" -p"$pass" -sN -D 'Testing' -e "$sql"
# Insert four rows on node1.
sql="INSERT INTO Table1(Text) VALUES('Text1'), ('Text2'), ('Text3'), ('Text4');"
mysql -h"$node1" -u"$user" -p"$pass" -sN -D 'Testing' -e "$sql"
# Immediately read the row count from node2.
sql='SELECT COUNT(*) FROM Table1;'
result=$(mysql -h"$node2" -u"$user" -p"$pass" -sN -D 'Testing' -e "$sql")
As you can see, DELETE and INSERT are sent to node1, while SELECT is issued on node2.
3. Some percentage of the time (between 0.1% and 40%, depending on the nodes selected and on timing/network lag), SELECT returns 0 rows.
4. If I add a 1-second delay between INSERT and SELECT, I always get the correct number of rows.
5. There are no errors on the MariaDB nodes that I can see, and replication seems to be working as far as I can tell.
I'm attaching the test script I wrote. You have to edit it to set your username and password, then call it with two parameters, the hostnames of the two nodes, like so:
./test_galera.sh mynode1.domain.com mynode2.domain.com
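For reference, the core loop of such a test can be sketched as follows. This is not the attached script itself: the function names `run_sql` and `count_stale_reads`, the default of 100 attempts, and the placeholder credentials are assumptions made for illustration.

```shell
# Sketch: repeat the write-on-node1 / read-on-node2 cycle and count stale reads.
# Assumed (hypothetical) names: run_sql, count_stale_reads; placeholder credentials.
user="${user:-changeme}"
pass="${pass:-changeme}"

# Run one SQL statement against the given host in the Testing database.
run_sql() {
    mysql -h"$1" -u"$user" -p"$pass" -sN -D 'Testing' -e "$2"
}

# Print "<stale reads> <attempts>" for a writer node and a reader node.
count_stale_reads() {
    writer="$1"
    reader="$2"
    attempts="${3:-100}"
    stale=0
    i=0
    while [ "$i" -lt "$attempts" ]; do
        run_sql "$writer" 'DELETE FROM Table1;'
        run_sql "$writer" "INSERT INTO Table1(Text) VALUES('Text1'), ('Text2'), ('Text3'), ('Text4');"
        rows=$(run_sql "$reader" 'SELECT COUNT(*) FROM Table1;')
        # Four rows were just inserted; any other result (including an
        # empty result after a client error) is counted as a stale read.
        if [ "$rows" != 4 ]; then
            stale=$((stale + 1))
        fi
        i=$((i + 1))
    done
    echo "$stale $attempts"
}
```

Calling `count_stale_reads mynode1.domain.com mynode2.domain.com 1000` would then print the number of stale reads followed by the number of attempts.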
Please help. Unless I'm missing something obvious, this is a critical issue.
Let me know if I can provide any additional info to solve this situation.
Thank you.
Issue Links
- relates to MDEV-14480 Improve wsrep_sync_wait documentation (Closed)
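The linked issue concerns `wsrep_sync_wait`, the Galera session variable that makes a node apply all pending write sets before executing a statement. Galera certifies a transaction on every node at commit time but applies it asynchronously on the other nodes, so a read issued elsewhere immediately afterwards can see older data. A minimal sketch of the read from the test in the description, with that check enabled for reads (value 1), might look like the following. The function name `synced_count` and the placeholder credentials are assumptions, not part of the original script.

```shell
# Sketch: ask the reader node to catch up with the cluster before running
# the SELECT. wsrep_sync_wait = 1 enables the causality check for reads.
# Assumed (hypothetical) name: synced_count; placeholder credentials.
user="${user:-changeme}"
pass="${pass:-changeme}"

synced_count() {
    # Both statements run in the same session, so the SET applies to the SELECT.
    mysql -h"$1" -u"$user" -p"$pass" -sN -D 'Testing' \
        -e 'SET SESSION wsrep_sync_wait = 1; SELECT COUNT(*) FROM Table1;'
}
```

The check adds latency to each read it covers, so setting it per session for the reads that need it is usually preferable to enabling it globally.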