Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-26682

slave lock timeout with xa and gap locks

Details

    Description

      there are, perhaps, may ways that gap locks can be places differently on the master and on the slave. combined with 10.5+ XA binlogging (MDEV-742) this can cause locks to timeout on slaves. breaking replication. For example:

      # this test case can be run with read-committed (or higher) isolation leven
      source include/have_innodb.inc;
      source include/have_binlog_format_row.inc;
      source include/master-slave.inc;
      create table t1 (a int primary key, b int unique) engine=innodb;
      insert t1 values (1,1),(3,3),(5,5);
      sync_slave_with_master;
       
      # set a strong isolation level to keep the read view below.
      # alternatively a long-running select can do that too even in read-committed
      set session tx_isolation='repeatable-read';
      start transaction;
      # opens a read view to disable purge on the slave
      select * from t1;
       
      connect m2, localhost, root;
      # now, delete a value, purge it on the master, but not on the slave
      delete from t1 where a=3;
      xa start 'x1';
      # this sets a gap lock on <3>, when it exists (so, on the slave)
      update t1 set b=3 where a=5;
      xa end 'x1';
      xa prepare 'x1';
       
      connect m3, localhost, root;
      # and this tries to insert straight into the locked gap
      insert t1 values (2, 2);
       
      echo -->slave;
      sync_slave_with_master;
      commit;
      select * from t1;
       
      connection m2;
      xa rollback 'x1';
      drop table t1;
      source include/rpl_end.inc;
      

      A possible way to fix all lock timeouts on the slave caused by gap locks and XA is to release gap locks on XA prepare.

      Attachments

        Issue Links

          Activity

            Another example of gap locks and XA. This time no purge, no selects on the slave side. This case exploits lock asymmetry — gap lock prevents insert intention lock, but an insert intention lock doesn't prevent a gap lock. We need to execute statements on the master in one order, but have them binlogged in the opposite order:

            source include/have_innodb.inc;
            source include/have_binlog_format_row.inc;
            source include/master-slave.inc;
             
            create table t1 (id int not null auto_increment primary key, c1 int not null, unique key(c1)) engine=innodb;
            create table t2 (id int not null auto_increment primary key, c1 int not null, foreign key(c1) references t1(c1), unique key(c1)) engine=innodb;
            insert t1 values (869,1), (871,3), (873,4), (872,5), (870,6), (877,7);
            insert t2 values (795,6), (800,7);
             
            xa start '1';
            update `t2` set `id` = 9, `c1` = 5 where `c1` in ( null, null, null, null, null, 7, 3 );
             
            connect con1, localhost,root;
            xa start '2';
            delete from `t1` where `c1` like concat( 3, '%' );
            xa end '2';
            xa prepare '2';
             
            connection master;
            xa end '1';
            xa prepare '1';
             
            echo ->slave;
            sync_slave_with_master;
            

            serg Sergei Golubchik added a comment - Another example of gap locks and XA. This time no purge, no selects on the slave side. This case exploits lock asymmetry — gap lock prevents insert intention lock, but an insert intention lock doesn't prevent a gap lock. We need to execute statements on the master in one order, but have them binlogged in the opposite order: source include/have_innodb.inc; source include/have_binlog_format_row.inc; source include/master-slave.inc;   create table t1 (id int not null auto_increment primary key , c1 int not null , unique key (c1)) engine=innodb; create table t2 (id int not null auto_increment primary key , c1 int not null , foreign key (c1) references t1(c1), unique key (c1)) engine=innodb; insert t1 values (869,1), (871,3), (873,4), (872,5), (870,6), (877,7); insert t2 values (795,6), (800,7);   xa start '1' ; update `t2` set `id` = 9, `c1` = 5 where `c1` in ( null , null , null , null , null , 7, 3 );   connect con1, localhost,root; xa start '2' ; delete from `t1` where `c1` like concat( 3, '%' ); xa end '2' ; xa prepare '2' ;   connection master; xa end '1' ; xa prepare '1' ;   echo ->slave; sync_slave_with_master;
            marko Marko Mäkelä added a comment - A possible fix would be to release all non-exclusive locks on XA PREPARE .

            c1a9b1c2f21e6796b687 is almost ok.

            1. please add a comment near thd_sql_command() explaining why it was needed
            2. add tests, e.g. as above.

            after that — ok to push, thanks!

            serg Sergei Golubchik added a comment - c1a9b1c2f21e6796b687 is almost ok. please add a comment near thd_sql_command() explaining why it was needed add tests, e.g. as above. after that — ok to push, thanks!

            People

              marko Marko Mäkelä
              serg Sergei Golubchik
              Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.