Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-4694

Changes to mysql.gtid_slave_pos table become visible too early

Details

    • Bug
    • Status: Closed (View Workflow)
    • Minor
    • Resolution: Not a Bug
    • None
    • 10.0.4
    • None
    • None

    Description

      The summary might be not quite accurate looking from inside, but it describes the effect I observe. Please feel free to modify it if needed.

      I have master=>slave setup with master_use_gtid=current_pos. The slave does not have log-slave-updates.
      A test executes DML flow (1 thread only) on master, simultaneously checking gtid* values on slave every 10 seconds, and crashing and restarting the slave every 30 seconds.
      To check the gtid* values, it uses the following query:

      SELECT GROUP_CONCAT(domain_id,"-",server_id,"-",seq_no separator ","), @@gtid_current_pos, @@gtid_slave_pos, @@gtid_binlog_pos, MAX(seq_no)
      FROM mysql.gtid_slave_pos 
      WHERE (domain_id, sub_id) IN (SELECT domain_id, MAX(sub_id) FROM mysql.gtid_slave_pos GROUP BY domain_id)

      and prints the values like this:

      table mysql.gtid_slave_pos: 0-1-1511
      gtid_current_pos: 0-1-1511
      gtid_slave_pos:   0-1-1511
      gtid_binlog_pos:  

      Normally, I see either 3 equal values, as above, or

      table mysql.gtid_slave_pos: 0-1-3197
      gtid_current_pos: 0-1-3196
      gtid_slave_pos:   0-1-3196
      gtid_binlog_pos:  

      which I attributed to a delay at updating the variables.
      However, every once in a while I see the following picture:

      Slave GTID status:
        table mysql.gtid_slave_pos: 0-1-14040
        gtid_current_pos: 0-1-14039
        gtid_slave_pos:   0-1-14039
        gtid_binlog_pos:  
      Sending SIGKILL to server with pid 42395 in order to force crash recovery
       Trying to restart the server ...
      Slave GTID status after server restart:
        table mysql.gtid_slave_pos: 0-1-14039
        gtid_current_pos: 0-1-14039
        gtid_slave_pos:   0-1-14039
        gtid_binlog_pos:  

      That is, mysql.gtid_slave_pos already had seq_no 14040 before server crash, but after restart it's back to 14039.
      At the same time, I see no indication of the event 14040 being re-executed. So, it looks like no harm is done.
      Transaction isolation level is default REPEATABLE-READ.
      I find it bizarre, because to my understanding the table is supposed to be updated in the same transaction as the change the new GTID belongs to, so if I see the change, the transaction should be already committed; and then I don't understand how it can be rolled back after restart...

      I can provide more information, including how to reproduce it with a stress test, or I can setup the test on perro, or modify the test in any manner, or do whatever it takes to debug it, but I'm also quite prepared to learn that it works as expected.

      bzr version-info

      revision-id: knielsen@knielsen-hq.org-20130621192324-544gs6nw2gj4h45x
      revno: 3664
      branch-nick: 10.0-base

      The effect is not related to two recent bugfixes, I observed it before as well.

      Attachments

        Issue Links

          Activity

            knielsen Kristian Nielsen added a comment - - edited

            This is actually InnoDB behaviour, as per this comment from the source:

            -----------------------------------------------------------------------
            /* The following assignment makes the transaction committed in memory
            and makes its changes to data visible to other transactions.
            NOTE that there is a small discrepancy from the strict formal
            visibility rules here: a human user of the database can see
            modifications made by another transaction T even before the necessary
            log segment has been flushed to the disk. If the database happens to
            crash before the flush, the user has seen modifications from T which
            will never be a committed transaction. However, any transaction T2
            which sees the modifications of the committing transaction T, and
            which also itself makes modifications to the database, will get an lsn
            larger than the committing transaction T. In the case where the log
            flush fails, and T never gets committed, also T2 will never get
            committed. */
            -----------------------------------------------------------------------

            I was able to repeat the same behaviour just with a standard InnoDB table, no
            replication or GTID. Test case as follows, though it uses sleep and is thus
            racy and may not behave the same on your machine:

            -----------------------------------------------------------------------

            1. rpl_mdev4694.test
              --source include/have_innodb.inc
              --source include/have_debug.inc

            --connect(con1,localhost,root,,)

            --connection default
            CREATE TABLE t1 (a INT PRIMARY KEY, b INT) ENGINE=InnoDB;

            INSERT INTO t1 VALUES (1, 1);
            INSERT INTO t1 VALUES (2, 2);

            --write_file $MYSQLTEST_VARDIR/tmp/mysqld.1.expect
            wait
            EOF

            --connection con1
            BEGIN;
            INSERT INTO t1 VALUES (3,3);
            DELETE FROM t1 WHERE a=1;
            send COMMIT;

            --connection default
            SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
            BEGIN;
            SELECT * FROM t1 ORDER BY a;
            SET SESSION debug_dbug="+d,dummy_disable_default_dbug_output";
            SET SESSION debug_dbug="+d,crash_dispatch_command_before";
            --error 2006,2013
            SELECT 1;
            --source include/wait_until_disconnected.inc

            --append_file $MYSQLTEST_VARDIR/tmp/mysqld.1.expect
            restart
            EOF

            --enable_reconnect
            --source include/wait_until_connected_again.inc

            SELECT * FROM t1 ORDER BY a;

            --connection con1
            --enable_reconnect
            --error 0,2006,2013
            reap;

            --connection default
            DROP TABLE t1;
            -----------------------------------------------------------------------

            You will need this .opt file:

            -----------------------------------------------------------------------

            1. rpl_mdev4694-master.opt
              --skip-stack-trace --skip-core-file
              -----------------------------------------------------------------------

            And this patch:

            -----------------------------------------------------------------------
            === modified file 'storage/xtradb/trx/trx0trx.c'
            — storage/xtradb/trx/trx0trx.c 2013-06-06 15:51:28 +0000
            +++ storage/xtradb/trx/trx0trx.c 2013-06-22 09:20:43 +0000
            @@ -1200,7 +1200,7 @@ trx_commit_off_kernel(
            ulint flush_log_at_trx_commit;

            mutex_exit(&kernel_mutex);
            -
            + os_thread_sleep(3000000);
            if (trx->insert_undo != NULL) {

            trx_undo_insert_cleanup(trx);

            -----------------------------------------------------------------------

            I get the following output:

            -----------------------------------------------------------------------
            CREATE TABLE t1 (a INT PRIMARY KEY, b INT) ENGINE=InnoDB;
            INSERT INTO t1 VALUES (1, 1);
            INSERT INTO t1 VALUES (2, 2);
            BEGIN;
            INSERT INTO t1 VALUES (3,3);
            DELETE FROM t1 WHERE a=1;
            COMMIT;
            SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
            BEGIN;
            SELECT * FROM t1 ORDER BY a;
            a b
            2 2
            3 3
            SET SESSION debug_dbug="+d,dummy_disable_default_dbug_output";
            SET SESSION debug_dbug="+d,crash_dispatch_command_before";
            SELECT 1;
            Got one of the listed errors
            SELECT * FROM t1 ORDER BY a;
            a b
            1 1
            2 2
            DROP TABLE t1;
            main.rpl_mdev4694 'xtradb' [ pass ] 13829
            -----------------------------------------------------------------------

            So the transaction is visible before the crash, but gone after the crash.

            So this seems to be intended InnoDB behaviour. Transactions seen just before a
            crash may still be rolled back during crash recovery, if the crash happens in
            the middle of the commit of said transactions.

            I believe it will not cause any issues for GTID, as you also seem to
            observe. The change to mysql.gtid_slave_pos will be recovered or rolled back
            together with the replicated transaction during InnoDB crash recovery, so the
            result will be consistent and correct irrespectively of visibility before the
            crash.

            So I will mark this as "not a bug". But please re-open if you think there are
            more things related to this to consider or investigate. I am quite happy that
            you have reached this deep level of testing of the GTID stuff. So please
            continue reporting anything that appears strange, even if this is not a bug,
            the next one may well be.

            knielsen Kristian Nielsen added a comment - - edited This is actually InnoDB behaviour, as per this comment from the source: ----------------------------------------------------------------------- /* The following assignment makes the transaction committed in memory and makes its changes to data visible to other transactions. NOTE that there is a small discrepancy from the strict formal visibility rules here: a human user of the database can see modifications made by another transaction T even before the necessary log segment has been flushed to the disk. If the database happens to crash before the flush, the user has seen modifications from T which will never be a committed transaction. However, any transaction T2 which sees the modifications of the committing transaction T, and which also itself makes modifications to the database, will get an lsn larger than the committing transaction T. In the case where the log flush fails, and T never gets committed, also T2 will never get committed. */ ----------------------------------------------------------------------- I was able to repeat the same behaviour just with a standard InnoDB table, no replication or GTID. Test case as follows, though it uses sleep and is thus racy and may not behave the same on your machine: ----------------------------------------------------------------------- rpl_mdev4694.test --source include/have_innodb.inc --source include/have_debug.inc --connect(con1,localhost,root,,) --connection default CREATE TABLE t1 (a INT PRIMARY KEY, b INT) ENGINE=InnoDB; INSERT INTO t1 VALUES (1, 1); INSERT INTO t1 VALUES (2, 2); --write_file $MYSQLTEST_VARDIR/tmp/mysqld.1.expect wait EOF --connection con1 BEGIN; INSERT INTO t1 VALUES (3,3); DELETE FROM t1 WHERE a=1; send COMMIT; --connection default SET TRANSACTION ISOLATION LEVEL READ COMMITTED; BEGIN; SELECT * FROM t1 ORDER BY a; SET SESSION debug_dbug="+d,dummy_disable_default_dbug_output"; SET SESSION debug_dbug="+d,crash_dispatch_command_before"; --error 2006,2013 SELECT 1; --source include/wait_until_disconnected.inc --append_file $MYSQLTEST_VARDIR/tmp/mysqld.1.expect restart EOF --enable_reconnect --source include/wait_until_connected_again.inc SELECT * FROM t1 ORDER BY a; --connection con1 --enable_reconnect --error 0,2006,2013 reap; --connection default DROP TABLE t1; ----------------------------------------------------------------------- You will need this .opt file: ----------------------------------------------------------------------- rpl_mdev4694-master.opt --skip-stack-trace --skip-core-file ----------------------------------------------------------------------- And this patch: ----------------------------------------------------------------------- === modified file 'storage/xtradb/trx/trx0trx.c' — storage/xtradb/trx/trx0trx.c 2013-06-06 15:51:28 +0000 +++ storage/xtradb/trx/trx0trx.c 2013-06-22 09:20:43 +0000 @@ -1200,7 +1200,7 @@ trx_commit_off_kernel( ulint flush_log_at_trx_commit; mutex_exit(&kernel_mutex); - + os_thread_sleep(3000000); if (trx->insert_undo != NULL) { trx_undo_insert_cleanup(trx); ----------------------------------------------------------------------- I get the following output: ----------------------------------------------------------------------- CREATE TABLE t1 (a INT PRIMARY KEY, b INT) ENGINE=InnoDB; INSERT INTO t1 VALUES (1, 1); INSERT INTO t1 VALUES (2, 2); BEGIN; INSERT INTO t1 VALUES (3,3); DELETE FROM t1 WHERE a=1; COMMIT; SET TRANSACTION ISOLATION LEVEL READ COMMITTED; BEGIN; SELECT * FROM t1 ORDER BY a; a b 2 2 3 3 SET SESSION debug_dbug="+d,dummy_disable_default_dbug_output"; SET SESSION debug_dbug="+d,crash_dispatch_command_before"; SELECT 1; Got one of the listed errors SELECT * FROM t1 ORDER BY a; a b 1 1 2 2 DROP TABLE t1; main.rpl_mdev4694 'xtradb' [ pass ] 13829 ----------------------------------------------------------------------- So the transaction is visible before the crash, but gone after the crash. So this seems to be intended InnoDB behaviour. Transactions seen just before a crash may still be rolled back during crash recovery, if the crash happens in the middle of the commit of said transactions. I believe it will not cause any issues for GTID, as you also seem to observe. The change to mysql.gtid_slave_pos will be recovered or rolled back together with the replicated transaction during InnoDB crash recovery, so the result will be consistent and correct irrespectively of visibility before the crash. So I will mark this as "not a bug". But please re-open if you think there are more things related to this to consider or investigate. I am quite happy that you have reached this deep level of testing of the GTID stuff. So please continue reporting anything that appears strange, even if this is not a bug, the next one may well be.

            Thank you, I suspected there was a catch in there somewhere, but it looked so weird that I just had to find out. I agree that it's not a bug, at least as long as we don't find a scenario where it actually causes any trouble.

            elenst Elena Stepanova added a comment - Thank you, I suspected there was a catch in there somewhere, but it looked so weird that I just had to find out. I agree that it's not a bug, at least as long as we don't find a scenario where it actually causes any trouble.

            People

              knielsen Kristian Nielsen
              elenst Elena Stepanova
              Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.