[MDEV-445] Research MySQL 5.6 Global transaction IDs

Details

Type: Task
Status: Closed
Priority: Major
Resolution: Fixed
Fix Version/s: None
Component/s: None
Labels: None

Description

MySQL global transaction ID.

The feature is optional and must be enabled explicitly.

When enabled, every event group (approximately == transaction) gets a
GTID_LOG_EVENT as its first event. This has a 128-bit server uuid and a 64-bit
sequence number, which together constitute a globally unique Gtid.
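
A minimal sketch of that pair, for illustration only. The class name Gtid, the to_text() helper, and the unsigned 64-bit range check are assumptions of this sketch, not taken from the MySQL source:

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class Gtid:
    sid: uuid.UUID  # 128-bit server UUID
    gno: int        # 64-bit sequence number

    def __post_init__(self):
        # Assumption: treat the sequence number as an unsigned 64-bit value.
        if not 0 <= self.gno < 2**64:
            raise ValueError("gno does not fit in 64 bits")

    def to_text(self):
        # Hypothetical display helper: "<uuid>:<gno>".
        return f"{self.sid}:{self.gno}"


server = uuid.UUID("3e11fa47-71ca-11e1-9e33-c80aa9429562")
a = Gtid(server, 23)
b = Gtid(server, 23)
print(a.to_text())  # 3e11fa47-71ca-11e1-9e33-c80aa9429562:23
print(a == b)       # True: same (uuid, sequence number) means same Gtid
```

Two events carrying equal pairs compare equal, which is what lets a slave recognise a duplicate.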

Per documentation, the usage is fairly simple from the user's point of
view. The slave remembers all Gtids that it has applied and sends this
information to the master (it's compressed so it shouldn't grow unboundedly);
the master automatically sends only new events to the slave. The slave detects
and skips duplicate events (same Gtid). This mode is selected with
CHANGE MASTER TO ... MASTER_AUTO_POSITION=1.
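
To make the "compressed" set concrete, here is a rough sketch that assumes the applied sequence numbers of one server uuid are kept as merged intervals. GnoIntervals and events_to_send are hypothetical names, and the real data structures may well differ:

```python
class GnoIntervals:
    """Sorted, non-overlapping half-open intervals [start, end) of sequence numbers."""

    def __init__(self):
        self.intervals = []

    def add(self, gno):
        new = [gno, gno + 1]
        kept = []
        for start, end in self.intervals:
            if end < new[0] or start > new[1]:
                kept.append([start, end])  # neither overlapping nor adjacent
            else:
                # Overlapping or touching: fold into the new interval.
                new = [min(start, new[0]), max(end, new[1])]
        kept.append(new)
        kept.sort()
        self.intervals = kept

    def __contains__(self, gno):
        return any(start <= gno < end for start, end in self.intervals)


def events_to_send(master_gnos, slave_set):
    # The master sends only what the slave has not applied yet.
    return [g for g in master_gnos if g not in slave_set]


applied = GnoIntervals()
for gno in [1, 2, 3, 5, 4, 10]:
    applied.add(gno)
print(applied.intervals)                      # [[1, 6], [10, 11]]
print(events_to_send(range(1, 12), applied))  # [6, 7, 8, 9, 11]
```

Six applied transactions collapse into two intervals, so the size of the set depends on the number of gaps rather than on the number of transactions.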

It is not clear to me if a new Gtid slave can be provisioned without stopping
the master for long periods - I did not find a way to specify a starting point
for CHANGE MASTER TO when using Gtid. Maybe one can provision new slaves by
stopping and copying an existing one.

When using Gtid, no changes to non-transactional tables are allowed! So
basically all replicated tables must be InnoDB (there are other less severe
restrictions also).

On a Gtid-configured slave, --log-slave-updates is mandatory. I think the
slave binlog is used to recover the set of Gtids already applied on the slave.

The code appears surprisingly unfinished, which makes understanding it
harder. It looks like parts of the initial design were dropped (or postponed?)
at a late stage during implementation, and the extra code is still
there. E.g. I found many occurrences of stuff like this, in
write_one_empty_group_to_cache():

/*
Apparently this code is not being called. We need to
investigate if this is a bug or this code is not
necessary. /Alfranio
*/
DBUG_ASSERT(0); /*NOTREACHED*/

See below for some more random examples.

The code maintains a lot of in-memory data structures containing Gtid and
other stuff. I'm still working on understanding what this is used for (and how
much of it is used).

It seems to me that the design is heavily influenced by the requirements of
the MySQL parallel slave feature.

There are a lot of new concepts referred to all over the code - it will be
necessary to first understand what they mean before the code can be grasped.
So I'm making a list of them.

Concepts:

Gtid: a pair of (SID, GNO) - globally unique identifier of a binlog group
(ie. transaction).

SID: Server UUID - 128 bit.

GNO: Group number - 64 bit. This is monotonically increasing. I think it is a
requirement that there be no holes in the sequence - slave will remember
forever any GNO not yet received waiting for it to arrive.

Gtid set: A set of Gtids, used in several places, eg. for the set of all Gtids
applied on a slave server. Such sets are represented as a list of
intervals (or something like that) - since normally a slave will have
applied all Gtids from start to current position (per server id), except
for perhaps a few ones still pending, this should be an efficient
representation.

automatic group: I think this is an event group that is logged to the binlog
with a Gtid with the next GNO in sequence (as opposed to a group logged by
a slave, which preserves the Gtid from the master).

anonymous group: This is used to refer to a group which has no Gtid, so it is
logged just with binlog file name and position.

SIDNO: This seems to be a 32-bit integer used internally to abstract a SID?

sid_map: This seems to be a map maintained to translate between SID and
SIDNO. I wonder if the idea was to save memory (32 bit instead of 128)?
There is some indication that this is not used in current code, or only
partially used.

gtid_next: Session variable. AUTOMATIC means allocate the next GNO in
sequence, ANONYMOUS means do not use a Gtid; otherwise it is the Gtid from the
master, set by the SQL thread (or mysqldump) to preserve the Gtid.

PREVIOUS_GTIDS_LOG_EVENT: Seems like this is logged at the start of every
binlog file. My guess is that it provides the set of all Gtid's that exist
in binlog files prior to this one. This can be used when a slave connects
to know if we can start from this binlog, or need to go back to a previous
binlog, to find all events that the slave needs.

lost group: Seems to be the set of Gtid's in binlogs that were purged.

done group: Set of all gtids in the binary log. There is also a session-scoped
system variable @@gtid_done, which is the set of all groups/Gtids in the
current transaction (not sure how there could be multiple groups in one
transaction though - seems this is some NDB stuff that is
disabled/incomplete).

ANONYMOUS_GTID_LOG_EVENT: The Gtid_log_event constructor creates this if
gtid_next is ANONYMOUS - but gtid_next cannot be set to ANONYMOUS if Gtid
is enabled. So not sure if/how this event can be created.

gtid_next_list: Appears to be disabled and not functional code. It is under
#ifdef HAVE_NDB_BINLOG and has a non-conditional
my_error(ER_NOT_SUPPORTED_YET...). There is this description: "Before
re-executing a transaction that contains multiple Global Transaction
Identifiers, this variable must be set to the set of all re-executed
transactions."

Slave connect: When a slave connects with MASTER_AUTO_POSITION=1, it computes
the set of all Gtids it has already seen - this looks like the union
of what exists in the slave binary log and what exists in the slave relay
logs. It sends this set to the master. The master then checks, for each Gtid,
whether the slave has already received it, and sends only events that the
slave has not seen. It looks as if a slave also keeps track of binlog
positions so it can start where it left off on reconnect, but will start from
scratch on a new CHANGE MASTER, scanning/skipping all binlogs on the master
until it reaches the first Gtid not yet seen...
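
A rough sketch of how the slave connect and PREVIOUS_GTIDS_LOG_EVENT pieces above could fit together, following my guesses rather than the actual server code. Gtids are plain strings, each binlog file is a dict, and all function names are hypothetical. Raising an error when the needed events were purged is also an assumption:

```python
def slave_seen_set(binlog_gtids, relay_log_gtids):
    # Union of what is in the slave binlog and in the slave relay logs.
    return set(binlog_gtids) | set(relay_log_gtids)


def pick_start_index(binlogs, seen):
    """binlogs: oldest first; 'previous' is the set of Gtids in all earlier files."""
    for i in range(len(binlogs) - 1, -1, -1):
        # Newest file for which the slave already has everything logged before it.
        if binlogs[i]["previous"] <= seen:
            return i
    raise ValueError("slave needs Gtids that were purged from the master")


def events_to_send(binlogs, seen):
    start = pick_start_index(binlogs, seen)
    return [g for b in binlogs[start:] for g in b["events"] if g not in seen]


master_binlogs = [
    {"name": "bin.1", "previous": set(), "events": ["A:1", "A:2"]},
    {"name": "bin.2", "previous": {"A:1", "A:2"}, "events": ["A:3", "A:4"]},
    {"name": "bin.3", "previous": {"A:1", "A:2", "A:3", "A:4"}, "events": ["A:5"]},
]
seen = slave_seen_set({"A:1", "A:2"}, {"A:3"})
print(pick_start_index(master_binlogs, seen))  # 1, i.e. start reading at bin.2
print(events_to_send(master_binlogs, seen))    # ['A:4', 'A:5']
```

With an empty seen set the same functions start at the oldest file and return every event, which matches the "start from scratch on new CHANGE MASTER" behaviour described above.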

Still to be investigated/understood:

group log: ?

logged group: ?

owned group: ?

super group: ?

HB: ?

group cache: Seems to be a cache of Gtid and the associated binlog
position. Not sure what this is used for yet...

statement group cache: ?

transaction group cache: ?

Slave connect: How does the slave determine where to (re)start replication,
how does it send it to the master, and how does the master determine the
binlog from where to send? And does the master or the slave filter
already-applied events?

Slave-state: How does slave remember (durably) what point in the replication
stream it has reached, so that it can continue on START SLAVE? Is such
storage crash-safe? I think perhaps the slave binlog is used for this (and
--log-slave-updates is mandatory for Gtid slaves). Note that such state is
a set, not a single Gtid - I believe it's basically the list of holes in
the sequence, intervals of GNOs not yet received.

Error handling: What happens if binlog write fails? Seems we cannot easily
roll back at that point, as then an already allocated GNO would leave a
hole in the stream?
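
To illustrate what "the list of holes" would look like, here is a small sketch. The function name holes and the assumption that sequence numbers start at 1 are mine, not from the source:

```python
def holes(received_gnos, first_gno=1):
    """Return closed intervals (first, last) of sequence numbers not yet received."""
    result = []
    expected = first_gno
    for gno in sorted(set(received_gnos)):
        if gno > expected:
            result.append((expected, gno - 1))  # gap before this number
        expected = max(expected, gno + 1)
    return result


print(holes([1, 2, 5, 6, 9]))  # [(3, 4), (7, 8)]
print(holes([1, 2, 3]))        # []

# If number 4 was allocated but its binlog write failed and was rolled back,
# every slave would see a permanent hole at 4:
print(holes([1, 2, 3, 5]))     # [(4, 4)]
```

The last case is the error-handling concern: a hole that never gets filled would have to be remembered by the slave forever.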

-----------------------------------------------------------------------

References:

Blog post by Luís Soares announcing the push to trunk:

http://d2-systems.blogspot.co.uk/2012/04/global-transaction-identifiers-are-in.html

MySQL 5.6 manual:

https://dev.mysql.com/doc/refman/5.6/en/replication-gtids.html

Worklog description (but it mostly says the description is in an attached PDF,
which is apparently missing from the forge):

http://forge.mysql.com/worklog/task.php?id=3584

-----------------------------------------------------------------------

The feature appears incomplete, even though they pushed it to trunk.
Examples:

sys_vars.cc:

#ifdef HAVE_NDB_BINLOG
static bool check_gtid_next_list(sys_var *self, THD *thd, set_var *var)
{
DBUG_ENTER("check_gtid_next_list");
my_error(ER_NOT_SUPPORTED_YET, MYF(0), "GTID_NEXT_LIST");
...

(and it returns ok despite the my_error, which will probably trigger an assertion later...)

/*
This code is not being used but we will keep it as it may be
useful if we decide to keeep disable_gtid_unsafe_statements.
*/
#ifdef NON_DISABLED_GTID
static bool check_disable_gtid_unsafe_statements(
...

/*
This code is not being used but we will keep it as it may be
useful when we improve the code around Sys_gtid_mode.
*/
#ifdef NON_DISABLED_GTID
static bool check_gtid_mode(sys_var *self, THD *thd, set_var *var)
{
my_error(ER_NOT_SUPPORTED_YET, MYF(0), "GTID_MODE");
...

Apparently there is some disabled code to facilitate upgrade? That is at least
hinted by this error message and commented-out description:

"The value of GTID_MODE can only change one step at a time: OFF <->
UPGRADE_STEP_1 <-> UPGRADE_STEP_2 <-> ON. Also note that this value must
be stepped up or down simultaneously on all servers; see the Manual for
instructions."
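
The "one step at a time" rule in that message can be sketched as follows. This only illustrates the rule as worded in the message; the function name is hypothetical, and rejecting a no-op assignment is an assumption since the message does not cover it:

```python
GTID_MODES = ["OFF", "UPGRADE_STEP_1", "UPGRADE_STEP_2", "ON"]


def is_allowed_change(current, new):
    # Allowed only when the two values are neighbours in the list above.
    distance = abs(GTID_MODES.index(new) - GTID_MODES.index(current))
    return distance == 1


print(is_allowed_change("OFF", "UPGRADE_STEP_1"))  # True
print(is_allowed_change("OFF", "ON"))              # False: skips two steps
print(is_allowed_change("ON", "UPGRADE_STEP_2"))   # True: stepping down is fine
```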

binlog.cc:

/*
Apparently this code is not being called. We need to
investigate if this is a bug or this code is not
necessary. /Alfranio
*/
DBUG_ASSERT(0); /*NOTREACHED*/
#ifdef NON_ERROR_GTID
IO_CACHE *cache= &cache_data->cache_log;
...

rpl_master.cc:

/*
Before going GA, we need to make this protocol extensible without
breaking compatitibilty. /Alfranio.
*/

That's like saying that the protocol will for sure change before 5.6 GA ... :-/

There are many more examples, too numerous to list exhaustively ...

Issue Links

blocks

MDEV-26 Global transaction ID

Closed

Activity

Kristian Nielsen added a comment - 2012-06-18 11:44

I think this task is complete. We were supposed to discuss the results, that didn't happen yet, but the results are there for such discussion when appropriate.