[MDEV-14436] Incorrect binlog order on slave for MyISAM statements Created: 2017-11-17  Updated: 2020-12-01

Status: Confirmed
Project: MariaDB Server
Component/s: Replication
Affects Version/s: 10.0, 10.1, 10.0.33, 10.2
Fix Version/s: 10.2

Type: Bug Priority: Major
Reporter: Kristian Nielsen Assignee: Andrei Elkin
Resolution: Unresolved Votes: 0
Labels: replication


 Description   

This bug is a regression introduced with this commit:

commit 8bf6152ba08d02b7d5cf14495dc37fc7b8845957 (HEAD)
Author: Michael Widenius <monty@askmonty.org>
Date:   Wed Feb 5 19:01:59 2014 +0200
 
    Replication changes for CREATE OR REPLACE TABLE

The problem with this patch is that makes the slave SQL thread always use
the binlog cache, also for non-transactional tables. Consider a myisam
update that is binlogged like this:

  BEGIN
  UPDATE myisam_table SET b=2 WHERE a=1
  COMMIT

In order to ensure correct binlog order, it is essential that the binlog is
written before table locks are relased. However, after the above commit, the
SQL thread will release table locks after the UPDATE, but not write to the
slave's binlog until COMMIT.

A simple testcase shows the problem. First, apply this patch to recent 10.0:

index b66ceac72bf..457e93dc1c1 100644
--- a/sql/log_event.cc
+++ b/sql/log_event.cc
@@ -4302,6 +4302,7 @@ int Query_log_event::do_apply_event(rpl_group_info *rgi,
       */
       if (current_stmt_is_commit)
       {
+my_sleep(4000000);
         thd->variables.option_bits&= ~OPTION_GTID_BEGIN;
         if (rgi->gtid_pending)
         {

Then run this testcase:

--let $rpl_topology=1->2->3
--source include/rpl_init.inc
 
--connection server_1
CREATE TABLE t1 (a INT PRIMARY KEY, b INT) ENGINE=MyISAM;
INSERT INTO t1 VALUES (1, 1);
--save_master_pos
 
--connection server_2
--sync_with_master
--save_master_pos
 
--connection server_3
--sync_with_master
SELECT * FROM t1 ORDER BY a;
 
--connection server_1
UPDATE t1 SET b=2 WHERE a=1;
--save_master_pos
 
--connection server_2
SELECT SLEEP(2);
UPDATE t1 SET b=3 WHERE a=1;
SELECT * FROM t1 ORDER BY a;
--sync_with_master
--save_master_pos
SELECT * FROM t1 ORDER BY a;
 
--connection server_3
--sync_with_master
SELECT * FROM t1 ORDER BY a;
 
--connection server_1
DROP TABLE t1;
 
--source include/rpl_end.inc

Before the patch 8bf6152b, the result is correct on all three servers. After
the patch (and in current 10.0), the last slave server_3 gets the wrong
value 2 for b in the table because of incorrect binlog order on server_2:

include/rpl_init.inc [topology=1->2->3]
CREATE TABLE t1 (a INT PRIMARY KEY, b INT) ENGINE=MyISAM;
INSERT INTO t1 VALUES (1, 1);
SELECT * FROM t1 ORDER BY a;
a	b
1	1
UPDATE t1 SET b=2 WHERE a=1;
SELECT SLEEP(2);
SLEEP(2)
0
UPDATE t1 SET b=3 WHERE a=1;
SELECT * FROM t1 ORDER BY a;
a	b
1	3
SELECT * FROM t1 ORDER BY a;
a	b
1	3
SELECT * FROM t1 ORDER BY a;
a	b
1	2
DROP TABLE t1;
include/rpl_end.inc

This problem can also occur without direct changes on the slave if parallel
replication is enabled on server_2.



 Comments   
Comment by Kristian Nielsen [ 2017-11-17 ]

BTW, this .cnf is needed to run the testcase:

!include ../my.cnf
 
[mysqld.1]
log-slave-updates
loose-innodb
 
[mysqld.2]
log-slave-updates
loose-innodb
 
[mysqld.3]
log-slave-updates
loose-innodb
 
[ENV]
SERVER_MYPORT_3=		@mysqld.3.port
SERVER_MYSOCK_3=		@mysqld.3.socket

Comment by Michael Widenius [ 2017-11-22 ]

Kristian, any suggestion for how to fix this ?

The problem my patch fixes, as far as I can remember, was to ensure that GTID's are kept in sync between master and slave.
If we would do an implicit commit for the MyISAM table on the slave, it would generate a new GTID, and the slave would be out of sync when it comes to GTID's.

One solution for this would be to keep the MyISAM tables locked until commit, which is not that trivial as we have to add code to kill any queries using any conflicting tables used on the slave.

Do you have any other suggestions ?

Comment by Kristian Nielsen [ 2017-11-23 ]

Copying my reply on IRC:

09:58 < knielsen> montywi: I seem to recall that the real issue that your patch attempted to address was CREATE 
                  TABLE ... SELECT. Because it was binloggeed as separate CREATE TABLE and SELECT statements?
09:59 < knielsen> I mean separate CREATE TABLE and INSERT ... SELECT
09:59 < knielsen> montywi: If so, postponing the table locks could be done _only_ for the CREATE TABLE. Other MyISAM 
                  DML can still binlog directly and not violate table logging
10:01 < knielsen> montywi: If then people do multi-statement innodb transactions on a master and try to replicate 
                  them onto equivalent myisam tables on the slave, then yes there will be new GTIDs created on the 
                  slave, but that should be ok (GTID can handle out-of-order sequence numbers in binlog). And this 
                  is not an important use-case to support (as compared with violating locking for all MyISAM slave 
                  DML)
10:06 < knielsen> montywi: (Note that it was Ninpo on #maria who experienced and reported this bug, I just made the 
                  jira entry when I realised what the root cause is)

Generated at Thu Feb 08 08:13:32 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.