[MDEV-23324] After (re)starting multiple slaves, parallel updates seem to compromise ACID Created: 2020-07-29  Updated: 2020-08-05

Status: Open
Project: MariaDB Server
Component/s: Replication
Affects Version/s: 10.4.13
Fix Version/s: 10.4

Type: Bug Priority: Major
Reporter: sjon Assignee: Andrei Elkin
Resolution: Unresolved Votes: 0
Labels: None
Environment:

binlog_format = ROW
gtid_strict_mode = 1
multi-master setup
slaves:

  • Parallel_Mode: conservative (default)
  • Using_Gtid: Current_Pos


 Description   

I have a table with a counter that is updated relatively by multiple masters. One of the slaves (that has 2 masters) stopped replicating after an unrelated failure (one of the masters deleted a row that the other master inserted previously, apparently there was some lag).

After restarting the replication from both master (using start all slaves), I ended up with a mismatch of data between the two masters, and the slave. Both masters updated the same counter 6 times, but the slave ended up with a final value of 11 instead of 12:

redacted binlog from master1, with irrelevant fields from Account removed and redacted ID:

#200722  8:28:32 server id 1  end_log_pos 12219708 CRC32 0xf87669df     Annotate_rows:
#Q> INSERT INTO `Account` (`id`, `testHits`) VALUES (9, 0)
#200722  8:50:18 server id 1  end_log_pos 12992418 CRC32 0x239cb8e2     Annotate_rows:
#Q> UPDATE `Account` SET `testHits` = `testHits` + 1 WHERE `id` = 9
 
#200727  8:26:22 server id 1  end_log_pos 12421103 CRC32 0x712982e4     Annotate_rows:
#Q> UPDATE `Account` SET `testHits` = `testHits` + 1 WHERE `id` = 9
#200727  8:26:36 server id 1  end_log_pos 12431646 CRC32 0x3e2f3505     Annotate_rows:
#Q> UPDATE `Account` SET `testHits` = `testHits` + 1 WHERE `id` = 9
#200727  8:26:43 server id 1  end_log_pos 12436404 CRC32 0x8ab1931a     Annotate_rows:
#Q> UPDATE `Account` SET `testHits` = `testHits` + 1 WHERE `id` = 9
 
#200728  7:02:02 server id 1  end_log_pos 340028656 CRC32 0xcf9797a5     Annotate_rows:
#Q> UPDATE `Account` SET `testHits` = `testHits` + 1 WHERE `id` = 9
#200728  7:02:19 server id 1  end_log_pos 340032448 CRC32 0x71109d4b     Annotate_rows:
#Q> UPDATE `Account` SET `testHits` = `testHits` + 1 WHERE `id` = 9

binlog from master1:

#200722  8:31:20 server id 3  end_log_pos 4420014 CRC32 0xb6bc7350     Annotate_rows:
#Q> UPDATE `Account` SET `testHits` = `testHits` + 1 WHERE `id` = 9
#200722  9:49:32 server id 3  end_log_pos 5237738 CRC32 0x9627872a     Annotate_rows:
#Q> UPDATE `Account` SET `testHits` = `testHits` + 1 WHERE `id` = 9
 
#200727 14:04:32 server id 3  end_log_pos 8793329 CRC32 0x2651f5ea     Annotate_rows:
#Q> UPDATE `Account` SET `testHits` = `testHits` + 1 WHERE `id` = 9
#200727 14:04:45 server id 3  end_log_pos 8794212 CRC32 0x619663f3     Annotate_rows:
#Q> UPDATE `Account` SET `testHits` = `testHits` + 1 WHERE `id` = 9
#200727 14:12:01 server id 3  end_log_pos 8821614 CRC32 0xef4ded02     Annotate_rows:
#Q> UPDATE `Account` SET `testHits` = `testHits` + 1 WHERE `id` = 9
 
#200728  7:02:11 server id 3  end_log_pos 3108883 CRC32 0x6c2bf4fd     Annotate_rows:
#Q> UPDATE `Account` SET `testHits` = `testHits` + 1 WHERE `id` = 9

Replication was halted at 200728 04:03:11. Both masters had testHits=12, but this particular slave ended up with testHits=11

Is this expected behaviour?


Generated at Thu Feb 08 09:21:33 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.