[MDEV-11926] Data loss on 10.1.20-MariaDB Created: 2017-01-27  Updated: 2018-10-16  Resolved: 2018-10-16

Status: Closed
Project: MariaDB Server
Component/s: Replication
Affects Version/s: 10.1.20
Fix Version/s: 10.0.31

Type: Bug Priority: Critical
Reporter: Igor Drobot Assignee: Andrei Elkin
Resolution: Fixed Votes: 0
Labels: Data, integrity, loss, replication
Environment:

openSUSE 42.2
Linux 4.4.36-8-default #1 SMP Fri Dec 9 16:18:38 UTC 2016 (3ec5648) x86_64 x86_64 x86_64 GNU/Linux

MariaDB
Version : 10.1.20-2.1
Arch : x86_64
Vendor : obs://build.opensuse.org/server:database


Attachments: Text File 4x_slaves_status.txt     File my.cnf     Text File slave_status.log    

 Description   

Since we migrated from MySQL to MariaDB 10.1.20 we have been experiencing data integrity problems. Deeper investigation confirmed this, and we would like a resolution for the problem:

In our test case we have two nodes connected over LAN with a typical replication setup between them:

SystemA <---> SystemB

From SystemC we started mysqlslap against SystemA to generate load:

CREATE DATABASE perftest;
CREATE TABLE perftest.a (ID int NOT NULL AUTO_INCREMENT, b TEXT, PRIMARY KEY (ID)) ENGINE=MyISAM;

mysqlslap -h SystemA -p --delimiter=";" --query="INSERT INTO perftest.a (b) VALUES(NOW())" --concurrency=500 --iterations=200

While the load was being generated, we stopped the slave on SystemB and started it again after a few seconds:

stop slave;
start slave;

We wait until SystemA reaches a count of 100000 inserts, i.e. mysqlslap has finished generating the load:

select count(*) from perftest.a;

Repeating the above count on SystemB (the slave where we stopped and restarted replication) shows a huge difference. In other words, we have data loss!



 Comments   
Comment by Elena Stepanova [ 2017-01-30 ]

It seems to be the same problem as described in MDEV-11201. I'll assign it to plinux to work on them together, so that he can confirm it's indeed the same issue.

drobot, meanwhile, would it be possible for you to try running the test with GTID in your replication setup (CHANGE MASTER TO ... MASTER_USE_GTID=current_pos) and see if the failure goes away? If it does, you might consider using it as a workaround.
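For reference, the suggested switch to GTID-based replication would look roughly like the following, run on the slave (SystemB); this is a sketch of the generic procedure, not taken from the reporter's setup:

```sql
-- On the slave: switch position-based replication to GTID (current_pos).
STOP SLAVE;
CHANGE MASTER TO MASTER_USE_GTID = current_pos;
START SLAVE;
-- Verify: SHOW SLAVE STATUS should now report Using_Gtid: Current_Pos.
```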

Comment by Igor Drobot [ 2017-01-30 ]

Using_Gtid is now set to Current_Pos. Since that change, the data loss has no longer been reproducible.

We also made a test run in a multi-source setup with four nodes, which finished successfully.

4x_slaves_status.txt

Comment by Andrei Elkin [ 2018-10-16 ]

SHOW SLAVE STATUS reporting Using_Gtid: No, together with gtid-ignore-duplicates=ON in my.cnf, indicates that this case duplicates MDEV-11201, which is fixed.

Generated at Thu Feb 08 07:53:43 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.