[MDEV-4325] Relation between GTID_POS and RESET SLAVE [ALL] / CHANGE MASTER TO

Details

Type: Technical task
Status: Closed
Priority: Minor
Resolution: Fixed
Affects Version/s: None
Fix Version/s: None
Component/s: None
Labels:
None

Description

The provided test case:

- starts master=>slave replication from scratch, using gtid_pos=auto;
- executes 3 events on master;
- waits till slave synchronizes with master;
- stops replication;
- resets slave and master;
- executes a few events on master;
- starts master=>slave replication from scratch, using gtid_pos=auto.

The slave attempts to start from the 4th event. The outcome depends on the nature of the events and on exactly how many events are executed in the second round. It is one of three things:

- a replication failure;
- "fatal error 1236 from master when reading data from binary log: 'Error: connecting slave requested to start from GTID 0-1-3, which is not in the master's binlog'";
- the first events in the new master binlog being silently skipped.
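The failure modes above can be reproduced in a toy Python model. This is not the server's actual algorithm. In the model, the master sends the events that follow the GTID the slave asks for. The function and exception names are hypothetical. The model assumes a single domain 0 and server_id 1. It also assumes RESET MASTER restarts sequence numbers at 1, so the slave's old position 0-1-3 is either missing or names a different transaction.

```python
class GtidNotInBinlog(Exception):
    """Stands in for master error 1236 in this toy model."""


def binlog_after_reset_master(n_events, domain=0, server=1):
    """GTIDs written after RESET MASTER: sequence numbers restart at 1."""
    return [f"{domain}-{server}-{seq}" for seq in range(1, n_events + 1)]


def events_to_send(binlog, requested):
    """GTIDs the master would send to a slave whose last applied GTID is
    `requested` (None stands for a slave with an empty position)."""
    if requested is None:
        return list(binlog)
    if requested not in binlog:
        raise GtidNotInBinlog(
            f"connecting slave requested to start from GTID {requested}, "
            "which is not in the master's binlog")
    return binlog[binlog.index(requested) + 1:]


slave_pos = "0-1-3"  # left over from the first round

# Second round with only 2 events: GTID 0-1-3 does not exist (yet).
try:
    events_to_send(binlog_after_reset_master(2), slave_pos)
except GtidNotInBinlog as err:
    print("error:", err)

# Second round with 5 events: 0-1-3 now names a different transaction,
# so the first three new events are silently skipped.
print(events_to_send(binlog_after_reset_master(5), slave_pos))
```

With two new events the request fails, mirroring error 1236. With five, the slave silently starts at 0-1-4.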

--source include/master-slave.inc
--source include/have_innodb.inc
--source include/have_binlog_format_mixed.inc

--echo ################
--echo # Do it once...
--echo ################

--connection slave
--source include/stop_slave.inc
RESET SLAVE ALL;

--connection master
RESET MASTER;
CREATE TABLE t1 (pk INT PRIMARY KEY);
DROP TABLE t1;
--save_master_pos

--connection slave
eval CHANGE MASTER TO master_host='127.0.0.1', master_port=$MASTER_MYPORT, master_user='root', master_gtid_pos=auto;
--source include/start_slave.inc
--sync_with_master

--echo ################
--echo # Do it twice...
--echo ################

--source include/stop_slave.inc
RESET SLAVE ALL;

--connection master
RESET MASTER;
CREATE TABLE t1 (pk INT PRIMARY KEY);
INSERT INTO t1 VALUES (1);
INSERT INTO t1 VALUES (2);
--save_master_pos

--connection slave
eval CHANGE MASTER TO master_host='127.0.0.1', master_port=$MASTER_MYPORT, master_user='root', master_gtid_pos=auto;
--source include/start_slave.inc
--sync_with_master

revision-id: knielsen@knielsen-hq.org-20130322102628-hxohewmbfyd1wig6
revno: 3538
branch-nick: 10.0-mdev26


Issue Links

relates to: MDEV-26 Global transaction ID (Closed)

Activity


Kristian Nielsen added a comment - 2013-03-27 12:16

> So, I experimented a bit, trying to abstract myself from implementation
> details and imagine possible user expectations.

Excellent analysis! It helped me a lot to get a better overview of where we
are.

> I start a fresh new pair and configure the slave
> CHANGE MASTER TO master_host= ..., ..., master_gtid_pos='';
> (or master_gtid_pos=auto, it shouldn't matter at this point, right?)

Right.

> For me master_gtid_pos is a parameter which defines the replication position
> – same way master_log_pos and master_log_file did before, so it's quite
> natural to have it in CHANGE MASTER (actually I don't know why I should
> provide it – I don't have to set default values of master_log_file/pos, but
> maybe it's because I need to indicate I want to use GTID now).

Yes, it is to indicate using GTID.

Actually, you do have to set default values of master_log_file/pos in normal
replication, it is a mis-feature that one can omit it. Because if master has
purged any binlogs, you get to start from whatever random position is the
first non-purged file - which will certainly and silently corrupt your
replication.
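The following toy Python example illustrates that silent corruption. It makes two assumptions. First, purging removes the oldest files first, as PURGE BINARY LOGS does. Second, a slave given no position starts at the first file the master still has. The file names and events are made up for the example.

```python
def replay_from_first_remaining(history, purged):
    """history: binlog file name -> list of events, in binlog order.
    purged: names of files already removed from the master.
    Returns (events the slave applies, events it silently never sees)."""
    applied, missed = [], []
    for name, events in history.items():  # dicts keep insertion order
        (missed if name in purged else applied).extend(events)
    return applied, missed


history = {
    "master-bin.000001": ["CREATE TABLE t1 (pk INT PRIMARY KEY)"],
    "master-bin.000002": ["INSERT INTO t1 VALUES (1)"],
    "master-bin.000003": ["CREATE TABLE t2 (i INT)"],
}
applied, missed = replay_from_first_remaining(
    history, purged={"master-bin.000001", "master-bin.000002"})
print("applied:", applied)
print("missed: ", missed)
```

Nothing in the applied events refers to t1, so no error is raised even though the slave never received it.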

It is quite deep in the design that GTID state is a global property of the
server, not a per-slave-connection position. This is needed for example for
multi-source. It is possible with MASTER_GTID_POS=AUTO to switch, e.g., from
having two masters to having a single master that itself replicates from the
original two masters. Do you think it will be possible to explain this to
users, or is it hopelessly complicated and will need to be re-designed
completely?
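Here is a toy Python model of what "global, not per-connection" means. The server keeps one last-applied GTID per replication domain, and every slave connection updates the same structure. The class name and its methods are hypothetical. The domain-server-seq_no strings and the comma-separated list follow the GTID values used in this discussion.

```python
class GtidState:
    """Server-wide GTID position: one last-applied GTID per domain,
    shared by all slave connections (toy model)."""

    def __init__(self):
        self._last = {}  # domain_id -> (server_id, seq_no)

    def record(self, gtid):
        """Called by whichever connection applied the event."""
        domain, server, seq_no = (int(x) for x in gtid.split("-"))
        self._last[domain] = (server, seq_no)

    def __str__(self):
        return ",".join(f"{d}-{s}-{n}" for d, (s, n) in sorted(self._last.items()))


state = GtidState()
state.record("1-10-100")  # applied by connection 'm1'
state.record("2-11-200")  # applied by connection 'm2'
print(state)
```

The position is keyed by domain rather than by connection. So the same value is also a valid starting point for a single new master whose binlog contains both domains.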

Now with your analysis, I am thinking that I did this incorrectly with CHANGE
MASTER and GTID. Maybe it should instead be like this:

A new command CHANGE GTID TO "0-1-2". This requires all slaves to be
stopped. It replaces CHANGE MASTER TO MASTER_GTID_POS="0-1-2".

A new command SHOW GTID STATUS, replaces the Gtid_Pos field in SHOW ALL
SLAVES STATUS.

In CHANGE MASTER, one must now do MASTER_USE_GTID=1. This gives an error if
no GTID position is set (either manually with CHANGE GTID, or downloaded
automatically by connecting slave to master with old non-GTID position).

This makes it clear that GTID state is global on the server, separate from any
slave connection configuration. And clear that the individual slave connection
can be using GTID to connect (MASTER_USE_GTID=1) or old style position
(MASTER_USE_GTID=0).

What do you think? I now understand that this is how I meant things to work,
though I never formulated it explicitly like this before.
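Below is a Python sketch of the proposed rule. It assumes the server-wide state is either unset (None) or a GTID list string, where the empty string means "start every domain from the beginning". The function name is hypothetical. The old-style defaults mirror the master_log_file='', master_log_pos=0 form used later in this thread.

```python
def start_request(use_gtid, gtid_state, log_file="", log_pos=0):
    """Hypothetical model of the proposed MASTER_USE_GTID rule.

    gtid_state is the server-wide position: None means it was never set,
    "" means it was explicitly set to empty.
    """
    if not use_gtid:
        # MASTER_USE_GTID=0: old-style file/position; GTID state is ignored.
        return ("file/pos", log_file, log_pos)
    if gtid_state is None:
        raise RuntimeError(
            "MASTER_USE_GTID=1 but no GTID position is set: use CHANGE GTID, "
            "or connect once with an old-style position")
    return ("gtid", gtid_state)


print(start_request(True, "0-1-2"))
print(start_request(False, None))
```

Distinguishing "never set" from "set to empty" is what lets MASTER_USE_GTID=1 fail loudly instead of silently starting from the beginning.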

> RESET SLAVE is supposed to do just that, it is defined as
> "makes the slave forget its replication position in the master's binary log. This statement is meant to be used for a clean start",
> and it used to do just that; but it doesn't anymore.

I just read the documentation, indeed that is what it says. But it's rubbish,
isn't it? Except for toy setups where one keeps all binlogs on the master
forever, it doesn't work. Or am I missing something?

But there is clearly a bug here! RESET SLAVE should clear Using_Gtid, but it
does not; shame on me. I've fixed it and pushed.

Now, if user does RESET SLAVE and then START SLAVE, things will
"work". Replication will start from the first binlog file on the master,
without using Gtid.

> I stopped slave, dropped t1, ran RESET SLAVE, and started it again, as I'd always done before.
> If the next statement on master does something with t1, my replication will abort (the table doesn't exist), so I will at least know about the problem.

Right, this was a bug, fixed now.

Now, replication will start without using GTID, from the first binlog file on
the master. If some binlogs were purged, the same silent corruption may occur.
If all binlogs were kept on the master, things will be ok, but it will no
longer be using GTID.

I won't say this is good behaviour, but it at least seems consistent with how
it worked before. Or what do you think?

> But if master continues with a different table
> create table t2 (i int)
> and keeps working with it, I might never know that I don't have t1 on slave anymore – until it's too late (master died, binlogs are gone, etc.)

Yeah. I would prefer giving an error in case no position specified, but that
is probably out due to backwards compatibility?

At least, if we can educate the user that GTID state is set separately with
CHANGE GTID, it should be clearer that CHANGE MASTER MASTER_USE_GTID=1 starts
from whatever SHOW GTID STATUS displays, and that RESET MASTER reverts to
MASTER_USE_GTID=0.

> CHANGE MASTER TO master_gtid_pos = '';
>
> That's weird, RESET SLAVE [ALL] is perceived as a reverse command for CHANGE MASTER.

Yes, it is weird. Just as Gtid_Pos in SHOW ALL SLAVES STATUS is weird, because
it is not per-slave, it is global.

Let me hear your opinion on CHANGE GTID / SHOW GTID STATUS / MASTER_USE_GTID,
and if we agree then I will change implementation to that.

> Now, back to our side for a minute: we have clearly changed semantics of
> RESET SLAVE: earlier it would make slave forget the position, now (with
> GTID) it doesn't. But what does it do, then?

With the above bug fixed, now it sets also Using_Gtid=0.

> It's basically the same as the story for User1, only here I didn't do
> anything bad, I just at some point decided to move my master server to
> another host. Slave is fully synchronized, backups are in place, so I just
> stop replication, shut down master, move the data files (but not binlogs, I
> don't need them) to the new host, start master – effectively, it's the same
> as RESET MASTER. I do RESET SLAVE ALL since I need to forget both the
> position and connection parameters, set up replication again, start slave...

With the above bug fixed, things should work, but you will no longer be using
GTID.

If you add MASTER_GTID_POS=AUTO to the CHANGE MASTER command, you should get
an error that master is missing the GTID requested by the slave. But user
needs to be aware that RESET MASTER (or your above equivalent) is dangerous
with GTID. Because it starts GTID generation from scratch, so now you have
duplicate GTIDs in your system, unless you carefully remove the old ones
everywhere. At least you get an error message in most cases rather than silent
corruption.
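A small Python illustration of the duplicate-GTID hazard follows. It assumes a single domain 0, server_id 1, and a RESET MASTER that restarts numbering at 1. The helper name is hypothetical.

```python
def generate(n, domain=0, server=1, first_seq=1):
    """GTIDs a master assigns to n consecutive transactions."""
    return [f"{domain}-{server}-{seq}" for seq in range(first_seq, first_seq + n)]


before_reset = generate(5)  # already replicated to the slaves
after_reset = generate(3)   # after RESET MASTER: numbering starts over
duplicates = sorted(set(before_reset) & set(after_reset))
print(duplicates)
print("0-1-5" in after_reset)  # a slave at 0-1-5 asks for a GTID the master lacks
```

Sequence numbers 1 to 3 now each refer to two different transactions. Meanwhile, a slave positioned at 0-1-5 requests a GTID the master no longer has, which is the case where an error is reported.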

Once you see the error and issue CHANGE MASTER TO MASTER_GTID_POS='' (or
CHANGE GTID TO ''), things should work again.

The "recommended" way to do the above would be to copy the binlog files along
also (maybe purge all logs but the latest first). Then there would be no need
for RESET SLAVE, just CHANGE MASTER TO the new host and port, and GTID would
connect automatically at the correct position (that's the whole point of GTID,
to find position automatically on new master, right?). Of course this is
untested, but it should work, I will add a test case for this.

Does that sound ok? Any suggestions for improvement?

> As a User3, I want to create a multi-source setup.
> I configured m1 as
> CHANGE MASTER 'm1' master_host=.., ..., master_gtid_pos=''
> started the slave 'm1', it has been working for a while for now.
> Now I want to add another master. I do exactly the same: I run
> CHANGE MASTER 'm2' ... master_gtid_pos='';

You do not need to specify master_gtid_pos='' in the second CHANGE
MASTER. This will be clearer with the change to CHANGE GTID:

CHANGE GTID TO '';
CHANGE MASTER 'm1' ... master_use_gtid=1;
CHANGE MASTER 'm2' ... master_use_gtid=1;

> One has to learn the hard way, so I fix the data, restart m1, configure m2 with master_gtid_pos=auto (it should work, right?).

Yes.

> Then I become User1 or User2 in regard to one of my slaves. Let's say I want to make m2 start from the beginning. How do I do that?

First, to use multi-source with GTID, you have to set up the two different
masters with different domain ids. Let's say gtid_domain_id=1 for m1 and
gtid_domain_id=2 for m2.

Then you need to get the current GTID state, using SHOW ALL SLAVES STATUS
(SHOW GTID STATUS). Let's say it is "1-10-100,2-11-200".

Now you want to start from the beginning of domain 2 (the domain of m2). So
you need to remove that domain from the state:

CHANGE MASTER TO MASTER_GTID_POS="1-10-100"

(or CHANGE GTID TO "1-10-100").
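Editing the state by hand gets tedious as the number of domains grows. Here is a Python sketch of the same edit, with hypothetical helper names. It assumes one domain-server-seq_no entry per domain in a comma-separated list.

```python
def parse_gtid_list(text):
    """'1-10-100,2-11-200' -> {1: (10, 100), 2: (11, 200)}"""
    state = {}
    for item in text.split(","):
        item = item.strip()
        if item:
            domain, server, seq_no = (int(x) for x in item.split("-"))
            state[domain] = (server, seq_no)
    return state


def format_gtid_list(state):
    return ",".join(f"{d}-{s}-{n}" for d, (s, n) in sorted(state.items()))


def without_domains(text, *domains):
    """GTID state with the given domains dropped, so those domains start
    from the beginning while the others keep their position."""
    state = parse_gtid_list(text)
    for domain in domains:
        state.pop(domain, None)
    return format_gtid_list(state)


print(without_domains("1-10-100,2-11-200", 2))
```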

Alternatively, you can start m2 slave from the start of the m2 binlogs,
without using GTID:

CHANGE MASTER 'm2' TO master_log_file='', master_log_pos=0;

Then it will download the correct gtid position and update it
automatically. Then the next time you change master for m2 you can use
MASTER_GTID_POS=AUTO again. It would be nice if I could implement that one
could ask to connect the first time with old-style position, but then the next
time with GTID.
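One way the "connect old-style first, then with GTID" behaviour could look is sketched below in Python, with hypothetical names. The sketch makes two assumptions. The server-wide state is a dict shared by all connections and keyed by domain. Every applied event records its GTID there.

```python
class AutoConnection:
    """Hypothetical sketch: one slave connection that connects with an
    old-style position until its domain has a GTID, then switches to GTID."""

    def __init__(self, domain, gtid_state):
        self.domain = domain
        self.gtid_state = gtid_state  # shared: domain_id -> last applied GTID

    def start_request(self):
        if self.domain in self.gtid_state:
            return ("gtid", self.gtid_state[self.domain])
        # No position known for this master's domain yet: start from the
        # beginning of its binlogs, as with master_log_file='', master_log_pos=0.
        return ("file/pos", "", 0)

    def apply(self, gtid):
        """Applying an event records its GTID in the shared state."""
        domain = int(gtid.split("-")[0])
        self.gtid_state[domain] = gtid


state = {1: "1-10-100"}    # m1 already replicates with GTID
m2 = AutoConnection(2, state)
print(m2.start_request())  # first connect: old-style position
m2.apply("2-11-1")
print(m2.start_request())  # later connects: GTID
```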

> It's already sad, but will be even sadder if I have 10 sources, or 20...

Yes, perhaps a bit sad. I did at one point consider that MASTER_GTID_POS would
only change the domains mentioned, and leave all other domains intact. And one
would need to set seq_no to zero to remove a domain
(MASTER_GTID_POS="1-10-100,2-11-0"). But I thought that was too magic, and
users could always specify the full GTID state if they wanted to keep some domains.
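In the chosen design, the given value replaces the whole state. For comparison, here is a Python sketch of the rejected per-domain semantics, with hypothetical helper names.

```python
def parse_gtid_list(text):
    state = {}
    for item in text.split(","):
        item = item.strip()
        if item:
            domain, server, seq_no = (int(x) for x in item.split("-"))
            state[domain] = (server, seq_no)
    return state


def format_gtid_list(state):
    return ",".join(f"{d}-{s}-{n}" for d, (s, n) in sorted(state.items()))


def apply_partial(current, update):
    """Rejected alternative: only domains named in `update` change, and
    seq_no 0 removes that domain from the state."""
    state = parse_gtid_list(current)
    for domain, (server, seq_no) in parse_gtid_list(update).items():
        if seq_no == 0:
            state.pop(domain, None)
        else:
            state[domain] = (server, seq_no)
    return format_gtid_list(state)


print(apply_partial("1-10-100,2-11-200", "1-10-100,2-11-0"))
```

The sketch shows where the "magic" comes from. The same string means "replace everything" under one design and "patch some domains" under the other.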

Hm, a lot longer reply than I intended. But hopefully we are getting closer to
something that is at least workable, if not as perfect as I had hoped
initially ...


Kristian Nielsen added a comment - 2013-03-27 12:46

> Of course this is untested, but it should work, I will add a test case for
> this.

And of course this did not work. I'm fixing right now.

Kristian.


Kristian Nielsen added a comment - 2013-03-27 13:30

> > Of course this is untested, but it should work, I will add a test case for
> > this.

> And of course this did not work. I'm fixing right now.

I pushed a fix for this. Test case at the end of rpl_gtid_startpos.test.


Elena Stepanova added a comment - 2013-03-31 23:12

>> Do you think it will be possible to explain this to
>> users, or is it hopelessly complicated and will need to be re-designed
>> completely?

I have no doubt that it will be possible to explain everything to users who are planning to run complicated configurations or workflows (switching servers on a regular basis, etc.). I'm more concerned about the part of the user base who run simple, straightforward replication, and the most they might do is promote the slave to be the new master in case of a crash. I expect them to be the majority. I want to be sure that we don't make their life harder. Even more so, we should not put them in a situation where they are likely to make a critical mistake just because they do things as they used to, while we have changed the way things work. I expect this category of users won't read deeply into the GTID documentation, exactly because they don't need a complicated setup; they are likely to follow instructions similar to 'First steps' or 'Quick setup'. If we eventually manage to explain the important things in a few words, we should be fine. I know we are not quite there yet. Even though I'm trying to understand things, I keep making mistakes which could have been fatal in a production environment.

>> Now with your analysis, I am thinking that I did this incorrectly with CHANGE
>> MASTER and GTID. Maybe it should instead be like this:
>>
>> - A new command CHANGE GTID TO "0-1-2". This requires all slaves to be
>> stopped. It replaces CHANGE MASTER TO MASTER_GTID_POS="0-1-2".
>>
>> - A new command SHOW GTID STATUS, replaces the Gtid_Pos field in SHOW ALL
>> SLAVES STATUS.

Do we really need the new syntax? I'd think that, if the GTID position is a global value, we could just make it a global dynamic variable. Then SHOW GTID STATUS would not be needed either, since it would only return a single value; we could just as well do SHOW VARIABLES or SELECT @@gtid_position (or whatever it's called). Is there any reason why that wouldn't work?

>> replaces the Gtid_Pos field in SHOW ALL
>> SLAVES STATUS.

I think that showing the value in SHOW ALL SLAVES STATUS doesn't hurt, and may even be beneficial from the usability perspective, so if it comes at a low price, it could stay there as well.

>> This makes it clear that GTID state is global on the server, separate from any
>> slave connection configuration. And clear that the individual slave connection
>> can be using GTID to connect (MASTER_USE_GTID=1) or old style position
>> (MASTER_USE_GTID=0).
>>
>> What do you think? I now understand that this is how I meant things to work,
>> though I never formulated it explicitly like this before.

Yes, if my current understanding of how things are meant to work is anywhere close to the truth, the proposed changes sound quite logical.

>> > RESET SLAVE is supposed to do just that, it is defined as
>> > "makes the slave forget its replication position in the master's binary log. This statement is meant to be used for a clean start",
>> > and it used to do just that; but it doesn't anymore.

>> I just read the documentation, indeed that is what it says. But it's rubbish,
>> isn't it?

Possibly, but that's how things used to work. I'm pretty sure a number of people used it in their own, however tricky, ways: either in "toy" (in fact, just low-traffic) setups, or in conjunction with RESET MASTER (on the master), etc. It wouldn't be very kind to make radical changes in the way things are supposed to work, especially because it's not easy to explain at a high level why the algorithm has to be different with and without GTID.

>> I won't say this is good behaviour, but it at least seems consistent with how
>> it worked before. Or what do you think?

Yes, I think it's better to keep it consistent with the old behavior for the time being.
I might get back to you regarding this after I have tried it (I didn't check the new version yet).

>> I would prefer giving an error in case no position specified, but that
>> is probably out due to backwards compatibility?

Personally, I don't see a big tragedy in doing RESET SLAVE without providing a master position afterwards. It seems natural to consider it a shortcut for master_log_file/master_log_pos=<start from the beginning of whatever we have>. Slaves have started after RESET SLAVE from a non-zero position because master binlogs were previously purged, and there have been no massive complaints about it. That indirectly confirms that people don't have a problem with this. So I'd rather keep it as is.

>> At least, if we can educate the user that GTID state is set separately with
>> CHANGE GTID, it should be clearer that CHANGE MASTER MASTER_USE_GTID=1 starts
>> from whatever SHOW GTID STATUS displays, and that RESET MASTER reverts to
>> MASTER_USE_GTID=0.

Right. Although, as in my previous note about the old-style master position, I wouldn't find it wrong if we treated the "empty" GTID value as the default and used it when nothing else was previously set. But if you prefer insisting on always setting it, either manually or through automatic discovery, I don't have strong objections to that either.

>> Just as Gtid_Pos in SHOW ALL SLAVES STATUS is weird, because
>> it is not per-slave it is global.

Hm.. Actually, I don't see anything weird in showing GTID position in SHOW ALL SLAVES STATUS (as opposed to SHOW SLAVE STATUS), exactly because it's global for all slaves.
(Of course, it becomes somewhat strange that SHOW ALL SLAVES STATUS is not the same as SHOW SLAVE STATUS when we only have one slave, but that's another story).

>> > decided to move my master server to
>> > another host. Slave is fully synchronized, backups are in place, so I just
>> > stop replication, shut down master, move the data files (but not binlogs, I
>> > don't need them) to the new host, start master – effectively, it's the same
>> > as RESET MASTER. I do RESET SLAVE ALL since I need to forget both the
>> > position and connection parameters, set up replication again, start slave...

>> user
>> needs to be aware that RESET MASTER (or your above equivalent) is dangerous
>> with GTID. Because it starts GTID generation from scratch, so now you have
>> duplicate GTIDs in your system, unless you carefully remove the old ones
>> everywhere. At least you get an error message in most cases rather than silent
>> corruption.

>> Once you see the error and issue CHANGE MASTER TO MASTER_GTID_POS='' (or
>> CHANGE GTID TO ''), things should work again.

>> The "recommended" way to do the above would be to copy the binlog files along
>> also (maybe purge all logs but the latest first). Then there would be no need
>> for RESET SLAVE, just CHANGE MASTER TO the new host and port, and GTID would
>> connect automatically at the correct position

That's exactly the case when I'm concerned about owners of simple setups, and how things become somewhat more complicated for them, or at least different.
I don't know what real users do in a situation like that, but if I were one of them, I would do exactly as I described, because it's simpler. This way, I don't need to purge anything on the old master, I don't need to move an extra log (which might be quite big), and I don't need to remember which exact parameters I must modify in CHANGE MASTER. What if I forget to change the host? After RESET SLAVE ALL, I'll get a clear error. Without RESET SLAVE ALL, the slave will attempt to connect to the old master, and lucky me if there is nothing else running on that host/port at the moment; and so on.

>> > As a User3, I want to create a multi-source setup.
>> > I configured m1 as
>> > CHANGE MASTER 'm1' master_host=.., ..., master_gtid_pos=''
>> > started the slave 'm1', it has been working for a while for now.
>> > Now I want to add another master. I do exactly the same: I run
>> > CHANGE MASTER 'm2' ... master_gtid_pos='';

>> You do not need to specify master_gtid_pos='' in the second CHANGE
>> MASTER. This will be clearer with the change to CHANGE GTID:

>> CHANGE GTID TO '';
>> CHANGE MASTER 'm1' ... master_use_gtid=1;
>> CHANGE MASTER 'm2' ... master_use_gtid=1;

Yes, it's much clearer this way. My point was that I'd expect slaves to be symmetrical, while they were very much not before.

>> It would be nice if I could implement that one
>> could ask to connect the first time with old-style position, but then the next
>> time with GTID.

Is it difficult to implement? Frankly, I thought that auto meant pretty much that... This matters even more if we have CHANGE MASTER .. master_use_gtid=1|0, where 1 throws an error when the GTID position is not set. Then it would be logical to also have master_use_gtid=auto (or SET GLOBAL gtid = 'auto', whichever is more reasonable from the implementation perspective). It would mean that the slave connects with an old-style position, acquires the GTID position, sets it, and from then on connects using it.

>> hopefully we are getting closer to
>> something that is at least workable, if not as perfect as I had hoped
>> initially ...

You never know, maybe it turns out "perfect enough" in the end. Although, of course, nothing is ever as perfect as we initially hope.

Kristian Nielsen added a comment - 2013-06-07 15:41

I believe all of these issues should be resolved, as well as possible at least, with the new interface pushed recently (master_use_gtid=slave_pos|current_pos).
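Below is a minimal sketch of how the second round of the test case from the description might look with the new master_use_gtid interface. It uses the same mysqltest style. It is not the change that was actually pushed for this issue. It assumes the behaviour documented for later MariaDB releases:

- RESET SLAVE [ALL] leaves @@gtid_slave_pos untouched.
- @@gtid_slave_pos can only be set while all slave threads are stopped.

Under those assumptions, the stale position left over from the first round (0-1-3 in the description) has to be cleared explicitly after the master's binlog is reset.

```sql
# Second round of the test case, adapted to master_use_gtid (sketch only).
--connection slave
--source include/stop_slave.inc
RESET SLAVE ALL;
# Assumption: RESET SLAVE [ALL] does not clear @@gtid_slave_pos, so the old
# position would otherwise be requested from the freshly reset master.
# Setting it is only allowed while all slave threads are stopped.
SET GLOBAL gtid_slave_pos = '';

--connection master
RESET MASTER;
CREATE TABLE t1 (pk INT PRIMARY KEY);
INSERT INTO t1 VALUES (1);
INSERT INTO t1 VALUES (2);
--save_master_pos

--connection slave
# slave_pos: start from the GTIDs last applied by the slave threads.
# current_pos would additionally take into account transactions that
# originated on this server and were written to its own binary log.
eval CHANGE MASTER TO master_host='127.0.0.1', master_port=$MASTER_MYPORT, master_user='root', master_use_gtid=slave_pos;
--source include/start_slave.inc
--sync_with_master
```

In this design, the GTID position is global server state, separate from the per-connection settings that RESET SLAVE ALL discards. That separation is why the sketch clears it with its own statement.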
