Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-4694

Changes to mysql.gtid_slave_pos table become visible too early



    • Bug
    • Status: Closed (View Workflow)
    • Minor
    • Resolution: Not a Bug
    • None
    • 10.0.4
    • None
    • None


      The summary might be not quite accurate looking from inside, but it describes the effect I observe. Please feel free to modify it if needed.

      I have master=>slave setup with master_use_gtid=current_pos. The slave does not have log-slave-updates.
      A test executes DML flow (1 thread only) on master, simultaneously checking gtid* values on slave every 10 seconds, and crashing and restarting the slave every 30 seconds.
      To check the gtid* values, it uses the following query:

      SELECT GROUP_CONCAT(domain_id,"-",server_id,"-",seq_no separator ","), @@gtid_current_pos, @@gtid_slave_pos, @@gtid_binlog_pos, MAX(seq_no)
      FROM mysql.gtid_slave_pos 
      WHERE (domain_id, sub_id) IN (SELECT domain_id, MAX(sub_id) FROM mysql.gtid_slave_pos GROUP BY domain_id)

      and prints the values like this:

      table mysql.gtid_slave_pos: 0-1-1511
      gtid_current_pos: 0-1-1511
      gtid_slave_pos:   0-1-1511

      Normally, I see either 3 equal values, as above, or

      table mysql.gtid_slave_pos: 0-1-3197
      gtid_current_pos: 0-1-3196
      gtid_slave_pos:   0-1-3196

      which I attributed to a delay at updating the variables.
      However, every once in a while I see the following picture:

      Slave GTID status:
        table mysql.gtid_slave_pos: 0-1-14040
        gtid_current_pos: 0-1-14039
        gtid_slave_pos:   0-1-14039
      Sending SIGKILL to server with pid 42395 in order to force crash recovery
       Trying to restart the server ...
      Slave GTID status after server restart:
        table mysql.gtid_slave_pos: 0-1-14039
        gtid_current_pos: 0-1-14039
        gtid_slave_pos:   0-1-14039

      That is, mysql.gtid_slave_pos already had seq_no 14040 before server crash, but after restart it's back to 14039.
      At the same time, I see no indication of the event 14040 being re-executed. So, it looks like no harm is done.
      Transaction isolation level is default REPEATABLE-READ.
      I find it bizarre, because to my understanding the table is supposed to be updated in the same transaction as the change the new GTID belongs to, so if I see the change, the transaction should be already committed; and then I don't understand how it can be rolled back after restart...

      I can provide more information, including how to reproduce it with a stress test, or I can setup the test on perro, or modify the test in any manner, or do whatever it takes to debug it, but I'm also quite prepared to learn that it works as expected.

      bzr version-info

      revision-id: knielsen@knielsen-hq.org-20130621192324-544gs6nw2gj4h45x
      revno: 3664
      branch-nick: 10.0-base

      The effect is not related to two recent bugfixes, I observed it before as well.


        Issue Links



              knielsen Kristian Nielsen
              elenst Elena Stepanova
              0 Vote for this issue
              1 Start watching this issue



                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.