  1. MariaDB Server
  2. MDEV-445

Research MySQL 5.6 Global transaction IDs

Details

    • Type: Task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed

    Description

      MySQL global transaction ID.

      The feature is optional, needs to be enabled.

      When enabled, every event group (approximately == transaction) gets a
      GTID_LOG_EVENT as its first event. This has a 128-bit server uuid and a 64-bit
      sequence number, which together constitute a globally unique Gtid.
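
As a minimal sketch of the identifier described above (not the MySQL definition; the names `Sid`, `Gtid` and `to_text` are illustrative assumptions), a Gtid can be modelled as a 16-byte server UUID plus a 64-bit sequence number. The `uuid:number` text form follows the MySQL 5.6 manual.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical names; the real server uses its own types.
using Sid = std::array<std::uint8_t, 16>;  // 128-bit server UUID

struct Gtid {
  Sid sid;            // which server originated the group
  std::uint64_t gno;  // sequence number within that server
};

// Render as "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx:gno".
inline std::string to_text(const Gtid &g) {
  std::string out;
  char buf[3];
  for (std::size_t i = 0; i < g.sid.size(); ++i) {
    std::snprintf(buf, sizeof buf, "%02x", static_cast<unsigned>(g.sid[i]));
    out += buf;
    // UUID text groups are 4-2-2-2-6 bytes.
    if (i == 3 || i == 5 || i == 7 || i == 9) out += '-';
  }
  out += ':';
  out += std::to_string(g.gno);
  return out;
}
```

The pair is 24 bytes in total, which is what each GTID_LOG_EVENT has to carry at minimum.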

      Per documentation, the usage is fairly simple from the user's point of
      view. The slave remembers all Gtid's that it has applied and sends this
      information to the master (it's compressed so it shouldn't grow unboundedly);
      the master automatically sends only new events to the slave. The slave detects
      and skips duplicate events (same Gtid). This is enabled with
      CHANGE MASTER TO ... MASTER_AUTO_POSITION=1.

      It is not clear to me if a new Gtid slave can be provisioned without stopping
      the master for long periods - I did not find a way to specify a starting point
      for CHANGE MASTER TO when using Gtid. Maybe one can provision new slaves by
      stopping and copying an existing one.

      When using Gtid, no changes to non-transactional tables are allowed! So
      basically all replicated tables must be InnoDB (there are other less severe
      restrictions also).

      On a Gtid-configured slave, --log-slave-updates is mandatory. I think the
      slave binlog is used to recover the set of Gtids already applied on the slave.

      The code appears surprisingly unfinished, which makes understanding it
      harder. It looks like parts of the initial design were dropped (or postponed?)
      at a late stage during implementation, and the extra code is still
      there. E.g. I found many occurrences of stuff like this, in
      write_one_empty_group_to_cache():

      /*
      Apparently this code is not being called. We need to
      investigate if this is a bug or this code is not
      necessary. /Alfranio
      */
      DBUG_ASSERT(0); /*NOTREACHED*/

      See below for some more random examples.

      The code maintains a lot of in-memory data structures containing Gtid and
      other stuff. I'm still working on understanding what this is used for (and how
      much of it is used).

      It seems to me that the design is heavily influenced by the requirements of
      the MySQL parallel slave feature.

      There are a lot of new concepts referred to all over the code - it will be
      necessary to first understand what they mean before the code can be grasped.
      So I'm making a list of them.

      Concepts:

      Gtid: a pair of (SID, GNO) - globally unique identifier of a binlog group
      (ie. transaction).

      SID: Server UUID - 128 bit.

      GNO: Group number - 64 bit. This is monotonically increasing. I think it is a
      requirement that there be no holes in the sequence - a slave will remember
      forever any GNO it has not yet received, waiting for it to arrive.

      Gtid set: A set of Gtids, used in several places, eg. for the set of all Gtids
      applied on a slave server. Such sets are represented as a list of
      intervals (or something like that) - since normally a slave will have
      applied all Gtids from start to current position (per server id), except
      for perhaps a few ones still pending, this should be an efficient
      representation.
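
To illustrate why an interval list is compact, here is a minimal sketch of such a set for the GNOs of a single server. The class `GnoIntervals` and its methods are hypothetical, not taken from the MySQL source; it only shows that contiguous GNOs collapse into one interval.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <map>

// Hypothetical sketch: GNOs seen for ONE server UUID, kept as disjoint
// inclusive intervals [first, last], keyed by first.
class GnoIntervals {
 public:
  bool contains(std::uint64_t gno) const {
    auto it = ivs_.upper_bound(gno);  // first interval starting after gno
    if (it == ivs_.begin()) return false;
    --it;                             // interval starting at or before gno
    return gno <= it->second;
  }

  void add(std::uint64_t gno) {
    if (contains(gno)) return;
    std::uint64_t first = gno, last = gno;
    auto next = ivs_.upper_bound(gno);
    // Merge with the interval ending right before gno, if any.
    if (next != ivs_.begin()) {
      auto prev = std::prev(next);
      if (prev->second + 1 == gno) {
        first = prev->first;
        ivs_.erase(prev);
      }
    }
    // Merge with the interval starting right after gno, if any.
    if (next != ivs_.end() && next->first == gno + 1) {
      last = next->second;
      ivs_.erase(next);
    }
    ivs_[first] = last;
  }

  std::size_t interval_count() const { return ivs_.size(); }

 private:
  std::map<std::uint64_t, std::uint64_t> ivs_;
};
```

A slave that has applied GNOs 1 to 5 and 7 holds two intervals; once 6 arrives, they merge into one. A full Gtid set would keep one such structure per server UUID.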

      automatic group: I think this is an event group that is logged to the binlog
      with a Gtid with the next GNO in sequence (as opposed to a group logged by
      a slave, which preserves the Gtid from the master).

      anonymous group: This is used to refer to a group which has no Gtid, so it is
      logged just with binlog file name and position.

      SIDNO: This seems to be a 32-bit integer used internally to abstract a SID?

      sid_map: This seems to be a map maintained to translate between SID and
      SIDNO. I wonder if the idea was to save memory (32 bit instead of 128)?
      There is some indication that this is not used in current code, or only
      partially used.
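
A minimal sketch of what such a translation table could look like, assuming SIDNOs are handed out as small 1-based integers in order of first appearance (that numbering scheme is an assumption, as are the names `SidMap`, `add` and `sid_of`):

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

using Sid = std::array<std::uint8_t, 16>;  // 128-bit server UUID
using Sidno = std::int32_t;                // assumption: 1-based small integer

// Hypothetical two-way map between SID and SIDNO.
class SidMap {
 public:
  // Return the existing SIDNO for sid, or assign the next free one.
  Sidno add(const Sid &sid) {
    auto it = to_sidno_.find(sid);
    if (it != to_sidno_.end()) return it->second;
    to_sid_.push_back(sid);
    Sidno n = static_cast<Sidno>(to_sid_.size());
    to_sidno_.emplace(sid, n);
    return n;
  }

  // Reverse lookup; n must have been returned by add().
  const Sid &sid_of(Sidno n) const {
    assert(n >= 1 && static_cast<std::size_t>(n) <= to_sid_.size());
    return to_sid_[static_cast<std::size_t>(n) - 1];
  }

 private:
  std::map<Sid, Sidno> to_sidno_;
  std::vector<Sid> to_sid_;
};
```

With a table like this, in-memory structures can store 4-byte SIDNOs and only convert back to the 16-byte SID when writing events or talking to another server.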

      gtid_next: Session variable. AUTOMATIC means allocate the next GNO in
      sequence, ANONYMOUS means do not use a Gtid, else it is the Gtid from the
      master, set by the SQL thread (or mysqldump) to preserve the Gtid.
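
The three cases of gtid_next can be sketched as a small dispatch. This is only an illustration of the semantics as described here; `GtidNextKind`, `GtidNext` and `assign_gtid` are hypothetical names, and the real server logic is certainly more involved.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Sid = std::array<std::uint8_t, 16>;
struct Gtid {
  Sid sid;
  std::uint64_t gno;
};

enum class GtidNextKind { Automatic, Anonymous, Explicit };

struct GtidNext {
  GtidNextKind kind;
  Gtid explicit_gtid;  // only meaningful when kind == Explicit
};

// Decide which Gtid (if any) the next group gets.
// Returns false when the group is logged without a Gtid.
inline bool assign_gtid(const GtidNext &next, const Sid &own_sid,
                        std::uint64_t &next_gno, Gtid &out) {
  switch (next.kind) {
    case GtidNextKind::Automatic:
      out = Gtid{own_sid, next_gno++};  // allocate the next GNO in sequence
      return true;
    case GtidNextKind::Explicit:
      out = next.explicit_gtid;         // preserve the Gtid from the master
      return true;
    case GtidNextKind::Anonymous:
      return false;                     // no Gtid for this group
  }
  return false;
}
```

Note that only the Automatic case advances the local GNO counter; a replicated group keeps the originating server's SID and GNO untouched.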

      PREVIOUS_GTIDS_LOG_EVENT: Seems like this is logged at the start of every
      binlog file. My guess is that it provides the set of all Gtid's that exist
      in binlog files prior to this one. This can be used when a slave connects
      to know if we can start from this binlog, or need to go back to a previous
      binlog, to find all events that the slave needs.
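
If that guess is right, choosing the starting binlog could work roughly as in the sketch below. Everything here is an assumption for illustration: the Gtid set is simplified to the GNOs of one server, and `BinlogFile` and `pick_start_file` are made-up names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <set>
#include <string>
#include <vector>

using GnoSet = std::set<std::uint64_t>;  // simplified: GNOs of one server

struct BinlogFile {
  std::string name;
  GnoSet previous_gtids;  // everything logged in earlier files
};

// Returns the index of the newest file whose previous_gtids the slave already
// has in full, or -1 if none qualifies. Files are ordered oldest first.
inline int pick_start_file(const std::vector<BinlogFile> &files,
                           const GnoSet &slave_has) {
  for (int i = static_cast<int>(files.size()) - 1; i >= 0; --i) {
    const GnoSet &prev = files[i].previous_gtids;
    // std::includes: is prev a subset of slave_has? (both ranges are sorted)
    if (std::includes(slave_has.begin(), slave_has.end(),
                      prev.begin(), prev.end()))
      return i;
  }
  return -1;
}
```

A result of -1 would correspond to the case where the oldest remaining binlog already presupposes Gtids the slave never received, for example because earlier files were purged.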

      lost group: Seems to be the set of Gtid's in binlogs that were purged.

      done group: Set of all gtids in the binary log. There is also a session-scoped
      system variable @@gtid_done, which is the set of all groups/Gtids in the
      current transaction (not sure how there could be multiple groups in one
      transaction though - seems this is some NDB stuff that is
      disabled/incomplete).

      ANONYMOUS_GTID_LOG_EVENT: The Gtid_log_event constructor creates this if
      gtid_next is ANONYMOUS - but gtid_next cannot be set to ANONYMOUS if Gtid
      is enabled. So not sure if/how this event can be created.

      gtid_next_list: Appears to be disabled and not functional code. It is under
      #ifdef HAVE_NDB_BINLOG and has a non-conditional
      my_error(ER_NOT_SUPPORTED_YET...). There is this description: "Before
      re-executing a transaction that contains multiple Global Transaction
      Identifiers, this variable must be set to the set of all re-executed
      transactions."

      Slave connect: When a slave connects with MASTER_AUTO_POSITION=1, it computes
      the set of all Gtid's it has already seen - looks like this is the union
      of what exists in the slave binary log, and what exists in the slave relay
      logs. It sends this to the master. The master then checks each Gtid if the
      slave has received it before, and sends only events that it has not
      seen. It looks as if a slave also keeps track of binlog positions so it
      can start where it left off on reconnect, but will start from scratch on a
      new CHANGE MASTER, scanning/skipping all binlogs on master until it reaches
      the first Gtid not yet seen...
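
The connect handshake as understood above can be sketched in two functions: the slave builds the union of what is in its binlog and relay logs, and the master walks its binlog skipping anything in that set. The types and function names are hypothetical, and a Gtid is simplified to a (uuid text, GNO) pair.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <string>
#include <utility>
#include <vector>

using GtidKey = std::pair<std::string, std::uint64_t>;  // (server uuid, GNO)
using GtidSet = std::set<GtidKey>;

// Slave side: the set reported to the master on connect.
inline GtidSet slave_seen(const GtidSet &in_binlog,
                          const GtidSet &in_relay_log) {
  GtidSet all = in_binlog;
  all.insert(in_relay_log.begin(), in_relay_log.end());
  return all;
}

// Master side: walk the binlog in order, skipping groups the slave has.
inline std::vector<GtidKey> groups_to_send(
    const std::vector<GtidKey> &master_binlog, const GtidSet &seen) {
  std::vector<GtidKey> out;
  for (const GtidKey &g : master_binlog)
    if (seen.count(g) == 0) out.push_back(g);
  return out;
}
```

The linear walk over `master_binlog` mirrors the scanning/skipping behaviour noted above: after a fresh CHANGE MASTER the master has to look at every group before it reaches the first one the slave lacks.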

      Still to be investigated/understood:

      group log: ?

      logged group: ?

      owned group: ?

      super group: ?

      HB: ?

      group cache: Seems to be a cache of Gtid and the associated binlog
      position. Not sure what this is used for yet...

      statement group cache: ?

      transaction group cache: ?

      Slave connect: How does the slave determine where to (re)start replication,
      how does it send it to the master, and how does the master determine the
      binlog from where to send? And does the master or the slave filter
      already-applied events?

      Slave-state: How does slave remember (durably) what point in the replication
      stream it has reached, so that it can continue on START SLAVE? Is such
      storage crash-safe? I think perhaps the slave binlog is used for this (and
      --log-slave-updates is mandatory for Gtid slaves). Note that such state is
      a set, not a single Gtid - I believe it's basically the list of holes in
      the sequence, intervals of GNOs not yet received.
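
To make the "list of holes" idea concrete, the sketch below derives the missing intervals from the set of GNOs a slave has received from one server. It assumes GNOs start at 1 (an assumption, not stated in these notes), and `Interval` and `holes` are illustrative names only.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <utility>
#include <vector>

using Interval = std::pair<std::uint64_t, std::uint64_t>;  // inclusive range

// Holes in the received sequence below the highest received GNO.
// Assumption: GNOs start at 1.
inline std::vector<Interval> holes(const std::set<std::uint64_t> &received) {
  std::vector<Interval> out;
  std::uint64_t expected = 1;
  for (std::uint64_t gno : received) {
    if (gno > expected) out.push_back(Interval(expected, gno - 1));
    expected = gno + 1;
  }
  return out;
}
```

For a slave that has received 1-3, 6-7 and 10, the state reduces to the two holes 4-5 and 8-9 plus the high-water mark, which is why the state stays small as long as holes get filled.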

      Error handling: What happens if binlog write fails? Seems we cannot easily
      roll back at that point, as then an already allocated GNO would leave a
      hole in the stream?

      -----------------------------------------------------------------------

      References:

      Blog post by Luís Soares announcing the push to trunk:

      http://d2-systems.blogspot.co.uk/2012/04/global-transaction-identifiers-are-in.html

      MySQL 5.6 manual:

      https://dev.mysql.com/doc/refman/5.6/en/replication-gtids.html

      Worklog description (but it mostly says the description is in an attached PDF,
      which is apparently missing from the forge):

      http://forge.mysql.com/worklog/task.php?id=3584

      -----------------------------------------------------------------------

      The feature appears incomplete, even though they pushed it to trunk.
      Examples:

      sys_vars.cc:

      #ifdef HAVE_NDB_BINLOG
      static bool check_gtid_next_list(sys_var *self, THD *thd, set_var *var)
      {
        DBUG_ENTER("check_gtid_next_list");
        my_error(ER_NOT_SUPPORTED_YET, MYF(0), "GTID_NEXT_LIST");
        ...

      (and it returns ok despite the my_error, which will probably cause assert...)

      /*
      This code is not being used but we will keep it as it may be
      useful if we decide to keeep disable_gtid_unsafe_statements.
      */
      #ifdef NON_DISABLED_GTID
      static bool check_disable_gtid_unsafe_statements(
      ...

      /*
      This code is not being used but we will keep it as it may be
      useful when we improve the code around Sys_gtid_mode.
      */
      #ifdef NON_DISABLED_GTID
      static bool check_gtid_mode(sys_var *self, THD *thd, set_var *var)
      {
        my_error(ER_NOT_SUPPORTED_YET, MYF(0), "GTID_MODE");
        ...

      Apparently there is some disabled code to facilitate upgrade? That is at least
      hinted by this error message and commented-out description:

      "The value of GTID_MODE can only change one step at a time: OFF <->
      UPGRADE_STEP_1 <-> UPGRADE_STEP_2 <-> ON. Also note that this value must
      be stepped up or down simultaneously on all servers; see the Manual for
      instructions."
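
The rule in that message amounts to a neighbour check on an ordered list of modes. A minimal sketch, assuming the four values are ordered as in the message (the enum and function names are made up for illustration and are not the server's):

```cpp
#include <cassert>

// Ordered as in the error message; numeric values are an assumption.
enum class GtidMode { Off = 0, UpgradeStep1 = 1, UpgradeStep2 = 2, On = 3 };

// A change is allowed only between neighbouring values.
inline bool is_single_step(GtidMode from, GtidMode to) {
  int d = static_cast<int>(to) - static_cast<int>(from);
  return d == 1 || d == -1;
}
```

Under this rule going from OFF to ON takes three separate changes, each of which would have to be rolled out on all servers before the next one.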

      binlog.cc:

      /*
      Apparently this code is not being called. We need to
      investigate if this is a bug or this code is not
      necessary. /Alfranio
      */
      DBUG_ASSERT(0); /*NOTREACHED*/
      #ifdef NON_ERROR_GTID
      IO_CACHE *cache= &cache_data->cache_log;
      ...

      rpl_master.cc:

      /*
      Before going GA, we need to make this protocol extensible without
      breaking compatitibilty. /Alfranio.
      */

      That's like saying that the protocol will for sure change before 5.6 GA ... :-/

      There are many more examples, too numerous to list exhaustively ...

            People

              knielsen Kristian Nielsen