MariaDB Server / MDEV-26 Global transaction ID / MDEV-4325

Relation between GTID_POS and RESET SLAVE [ALL] / CHANGE MASTER TO

Details

    • Type: Technical task
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed

    Description

      The provided test case

      • starts master=>slave replication from scratch, using gtid_pos=auto;
      • executes 3 events on master;
      • waits till slave synchronizes with master;
      • stops replication;
      • resets slave and master;
      • executes a few events on master;
      • starts master=>slave replication from scratch, using gtid_pos=auto.

      The slave attempts to start from the 4th event. Depending on the nature of the events and the exact number of "few" events in the second round, the result is either a replication failure, "fatal error 1236 from master when reading data from binary log: 'Error: connecting slave requested to start from GTID 0-1-3, which is not in the master's binlog'", or the first events in the new master binlog being silently ignored.

      --source include/master-slave.inc
      --source include/have_innodb.inc
      --source include/have_binlog_format_mixed.inc

      --echo ################
      --echo # Do it once...
      --echo ################

      # Forget the slave's replication position and connection settings.
      --connection slave
      --source include/stop_slave.inc
      RESET SLAVE ALL;

      # Start a fresh binlog on the master and write two events.
      --connection master
      RESET MASTER;
      CREATE TABLE t1 (pk INT PRIMARY KEY);
      DROP TABLE t1;
      --save_master_pos

      # Reconnect using GTID and wait until the slave has applied both events.
      --connection slave
      eval CHANGE MASTER TO master_host='127.0.0.1', master_port=$MASTER_MYPORT, master_user='root', master_gtid_pos=auto;
      --source include/start_slave.inc
      --sync_with_master

      --echo ################
      --echo # Do it twice...
      --echo ################

      # Still on the slave connection: forget position and settings again.
      --source include/stop_slave.inc
      RESET SLAVE ALL;

      # RESET MASTER starts a new binlog; write three new events.
      --connection master
      RESET MASTER;
      CREATE TABLE t1 (pk INT PRIMARY KEY);
      INSERT INTO t1 VALUES (1);
      INSERT INTO t1 VALUES (2);
      --save_master_pos

      # Expected: replication starts from the beginning of the new binlog.
      # Observed: the slave still uses its GTID position from the first round.
      --connection slave
      eval CHANGE MASTER TO master_host='127.0.0.1', master_port=$MASTER_MYPORT, master_user='root', master_gtid_pos=auto;
      --source include/start_slave.inc
      --sync_with_master

      revision-id: knielsen@knielsen-hq.org-20130322102628-hxohewmbfyd1wig6
      revno: 3538
      branch-nick: 10.0-mdev26
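      The failure modes described above can be reproduced with a simplified, single-domain model. This is a sketch, not the server's actual binlog lookup: `events_to_send` and `GtidNotInBinlog` are hypothetical names, and GTIDs are written as `domain-server_id-seq_no`. The slave keeps its last applied GTID across RESET SLAVE ALL, the master restarts numbering after RESET MASTER, and the master either cannot find the requested GTID or starts right after an unrelated transaction that happens to reuse it.

```python
class GtidNotInBinlog(Exception):
    """Stands in for the master's error 1236 in this model."""


def events_to_send(master_binlog, slave_gtid):
    """Return the GTIDs a master would send to a slave whose last applied
    GTID is slave_gtid (single domain, hypothetical model)."""
    if slave_gtid is None:
        # Empty slave position: start from the beginning of the binlog.
        return list(master_binlog)
    if slave_gtid not in master_binlog:
        raise GtidNotInBinlog(
            f"connecting slave requested to start from GTID {slave_gtid}, "
            "which is not in the master's binlog")
    # Start right after the matching GTID; everything before it is skipped.
    return master_binlog[master_binlog.index(slave_gtid) + 1:]


# Round 1: CREATE TABLE t1, DROP TABLE t1; the slave ends at 0-1-2.
slave_gtid = "0-1-2"

# Round 2, after RESET MASTER: CREATE TABLE t1, INSERT (1), INSERT (2).
round2 = ["0-1-1", "0-1-2", "0-1-3"]
print(events_to_send(round2, slave_gtid))  # ['0-1-3']

# Round 2 with a single event: the requested GTID does not exist yet.
try:
    events_to_send(["0-1-1"], slave_gtid)
except GtidNotInBinlog as err:
    print("error:", err)
```

      With the test case's events, the slave receives only INSERT INTO t1 VALUES (2) and fails, because the CREATE TABLE was skipped and t1 was dropped on the slave in the first round.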

          Activity

            > So, I experimented a bit, trying to abstract myself from implementation
            > details and imagine possible user expectations.

            Excellent analysis! It helped me a lot to get a better overview of where we
            are.

            > I start a fresh new pair and configure the slave
            > CHANGE MASTER TO master_host= ..., ..., master_gtid_pos='';
            > (or master_gtid_pos=auto, it shouldn't matter at this point, right?)

            Right.

            > For me master_gtid_pos is a parameter which defines the replication position
            > – same way master_log_pos and master_log_file did before, so it's quite
            > natural to have it in CHANGE MASTER (actually I don't know why I should
            > provide it – I don't have to set default values of master_log_file/pos, but
            > maybe it's because I need to indicate I want to use GTID now).

            Yes, it is to indicate using GTID.

            Actually, you do have to set default values of master_log_file/pos in normal
            replication; it is a mis-feature that one can omit them. If the master has
            purged any binlogs, you start from whatever random position is the
            first non-purged file, which will certainly and silently corrupt your
            replication.
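            A minimal sketch of that silent corruption, assuming a hypothetical master whose binlog files each hold one INSERT into a table t1 that already exists on both servers: with no explicit position, reading starts at the first file that still exists, so events in purged files are skipped without any error.

```python
# Hypothetical master: binlog file name -> statements it contains.
# Assumes t1 already exists on both master and slave.
master_binlogs = {
    "master-bin.000001": ["INSERT INTO t1 VALUES (1)"],
    "master-bin.000002": ["INSERT INTO t1 VALUES (2)"],
    "master-bin.000003": ["INSERT INTO t1 VALUES (3)"],
}


def purge_binary_logs_to(binlogs, log_name):
    """Delete every file before log_name, as PURGE BINARY LOGS TO does."""
    return {name: stmts for name, stmts in binlogs.items() if name >= log_name}


def events_from_default_position(binlogs):
    """No explicit master_log_file/master_log_pos: start at the first
    binlog file that still exists and read everything from there on."""
    events = []
    for name in sorted(binlogs):
        events.extend(binlogs[name])
    return events


kept = purge_binary_logs_to(master_binlogs, "master-bin.000002")
print(events_from_default_position(kept))
# ['INSERT INTO t1 VALUES (2)', 'INSERT INTO t1 VALUES (3)']: row 1 is silently lost
```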

            It is quite deep in the design that GTID state is a global property of the
            server, not a per-slave-connection position. This is needed for example for
            multi-source. It is possible with MASTER_GTID_POS=AUTO to switch, e.g., from
            having two masters to having a single master that itself replicates from the
            original two masters. Do you think it will be possible to explain this to
            users, or is it hopelessly complicated and will need to be re-designed
            completely?
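            To illustrate that the GTID position is one server-wide state rather than a per-connection setting, here is a small model. The class and method names are hypothetical, not server internals: the state keeps the last GTID per replication domain, and every slave connection updates the same object.

```python
class GtidState:
    """Server-wide GTID position: the last applied GTID for each domain."""

    def __init__(self, text=""):
        # domain_id -> (server_id, seq_no)
        self.domains = {}
        for gtid in filter(None, text.split(",")):
            self.apply(gtid.strip())

    def apply(self, gtid):
        """Record gtid ("domain-server_id-seq_no") as the last one in its domain."""
        domain, server, seq = (int(part) for part in gtid.split("-"))
        self.domains[domain] = (server, seq)

    def __str__(self):
        return ",".join(f"{d}-{s}-{n}" for d, (s, n) in sorted(self.domains.items()))


state = GtidState()          # one object for the whole slave server
state.apply("1-10-100")      # event received through connection 'm1'
state.apply("2-11-200")      # event received through connection 'm2'
print(state)                 # 1-10-100,2-11-200
```

            Because both connections write into the same state, a single master that itself replicates from the original two masters can later replace them: its binlog carries the same domains, so the combined position still makes sense.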

            Now with your analysis, I am thinking that I did this incorrectly with CHANGE
            MASTER and GTID. Maybe it should instead be like this:

            • A new command CHANGE GTID TO "0-1-2". This requires all slaves to be
              stopped. It replaces CHANGE MASTER TO MASTER_GTID_POS="0-1-2".
            • A new command SHOW GTID STATUS, replaces the Gtid_Pos field in SHOW ALL
              SLAVES STATUS.
            • In CHANGE MASTER, one must now do MASTER_USE_GTID=1. This gives an error if
              no GTID position is set (either manually with CHANGE GTID, or downloaded
              automatically by connecting slave to master with old non-GTID position).

            This makes it clear that GTID state is global on the server, separate from any
            slave connection configuration. And clear that the individual slave connection
            can be using GTID to connect (MASTER_USE_GTID=1) or old style position
            (MASTER_USE_GTID=0).

            What do you think? I now understand that this is how I meant things to work,
            though I never formulated it explicitly like this before.
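            A sketch of the proposed split, modeled in Python rather than SQL because the statements are only a proposal at this point. The method names `change_gtid`, `show_gtid_status` and `start_slave` are hypothetical stand-ins for CHANGE GTID, SHOW GTID STATUS and START SLAVE with MASTER_USE_GTID; the automatic download of the position through an old-style connection is not modeled.

```python
class SlaveServer:
    """Hypothetical model of the proposed CHANGE GTID / MASTER_USE_GTID split."""

    def __init__(self):
        # Global state; None means "no GTID position set", "" means "start from scratch".
        self.gtid_state = None
        self.connections = {}       # name -> {"use_gtid": bool, "running": bool}

    def change_master(self, name, use_gtid):
        self.connections[name] = {"use_gtid": use_gtid, "running": False}

    def change_gtid(self, pos):
        # Proposed rule: CHANGE GTID requires all slaves to be stopped.
        if any(c["running"] for c in self.connections.values()):
            raise RuntimeError("CHANGE GTID requires all slaves to be stopped")
        self.gtid_state = pos

    def show_gtid_status(self):
        return self.gtid_state

    def start_slave(self, name):
        conn = self.connections[name]
        if conn["use_gtid"] and self.gtid_state is None:
            raise RuntimeError("MASTER_USE_GTID=1 but no GTID position is set")
        conn["running"] = True


server = SlaveServer()
server.change_master("m1", use_gtid=True)
try:
    server.start_slave("m1")        # fails: no GTID position yet
except RuntimeError as err:
    print(err)
server.change_gtid("")              # CHANGE GTID TO '': start from scratch
server.start_slave("m1")
print(server.connections["m1"]["running"])  # True
```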

            > RESET SLAVE is supposed to do just that, it is defined as
            > "makes the slave forget its replication position in the master's binary log. This statement is meant to be used for a clean start",
            > and it used to do just that; but it doesn't anymore.

            I just read the documentation, indeed that is what it says. But it's rubbish,
            isn't it? Except for toy setups where one keeps all binlogs on the master
            forever, it doesn't work. Or am I missing something?

            But there is clearly a bug here! RESET SLAVE should clear Using_Gtid, but it does
            not; shame on me. I've fixed it and pushed the fix.

            Now, if the user does RESET SLAVE and then START SLAVE, things will
            "work". Replication will start from the first binlog file on the master,
            without using Gtid.

            > I stopped slave, dropped t1, ran RESET SLAVE, and started it again, as I'd always done before.
            > If the next statement on master does something with t1, my replication will abort (the table doesn't exist), so I will at least know about the problem.

            Right, this was a bug, fixed now.

            Now, replication will start without using GTID, from the first binlog file on
            the master. If some binlogs were purged, the same silent corruption may occur.
            If all binlogs were kept on the master, things will be ok, but it will no
            longer be using GTID.

            I won't say this is good behaviour, but it at least seems consistent with how
            it worked before. Or what do you think?
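            The behaviour after the fix can be summarised with a hypothetical per-connection record (field names are illustrative only): RESET SLAVE forgets the position and clears Using_Gtid, RESET SLAVE ALL also forgets the connection parameters, and neither touches the server-wide GTID state.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SlaveConnection:
    """Hypothetical per-connection settings (illustrative field names)."""
    master_host: Optional[str] = None
    master_port: Optional[int] = None
    master_log_file: Optional[str] = None
    master_log_pos: Optional[int] = None
    using_gtid: bool = False

    def reset_slave(self, all_=False):
        """Model of RESET SLAVE [ALL] after the fix described above."""
        # Forget the replication position and stop using GTID.
        self.master_log_file = None
        self.master_log_pos = None
        self.using_gtid = False
        if all_:
            # RESET SLAVE ALL also forgets the connection parameters.
            self.master_host = None
            self.master_port = None


# Server-wide GTID state: not stored on the connection, so RESET SLAVE keeps it.
gtid_state = "0-1-2"

conn = SlaveConnection("127.0.0.1", 3306, "master-bin.000001", 1234, using_gtid=True)
conn.reset_slave()
print(conn.master_host, conn.master_log_file, conn.using_gtid)  # 127.0.0.1 None False
print(gtid_state)                                               # 0-1-2
```

            With no file, no position and Using_Gtid=0, the next START SLAVE falls back to the first binlog file the master still has, as described above.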

            > But if master continues with a different table
            > create table t2 (i int)
            > and keeps working with it, I might never know that I don't have t1 on slave anymore – until it's too late (master died, binlogs are gone, etc.)

            Yeah. I would prefer giving an error in case no position specified, but that
            is probably out due to backwards compatibility?

            At least, if we can educate the user that GTID state is set separately with
            CHANGE GTID, it should be clearer that CHANGE MASTER MASTER_USE_GTID=1 starts
            from whatever SHOW GTID STATUS displays, and that RESET MASTER reverts to
            MASTER_USE_GTID=0.

            > CHANGE MASTER TO master_gtid_pos = '';
            >
            > That's weird, RESET SLAVE [ALL] is perceived as a reverse command for CHANGE MASTER.

            Yes, it is weird. Just as Gtid_Pos in SHOW ALL SLAVES STATUS is weird, because
            it is not per-slave, it is global.

            Let me hear your opinion on CHANGE GTID / SHOW GTID STATUS / MASTER_USE_GTID,
            and if we agree then I will change implementation to that.

            > Now, back to our side for a minute: we have clearly changed semantics of
            > RESET SLAVE: earlier it would make slave forget the position, now (with
            > GTID) it doesn't. But what does it do, then?

            With the above bug fixed, it now also sets Using_Gtid=0.

            > It's basically the same as the story for User1, only here I didn't do
            > anything bad, I just at some point decided to move my master server to
            > another host. Slave is fully synchronized, backups are in place, so I just
            > stop replication, shut down master, move the data files (but not binlogs, I
            > don't need them) to the new host, start master – effectively, it's the same
            > as RESET MASTER. I do RESET SLAVE ALL since I need to forget both the
            > position and connection parameters, set up replication again, start slave...

            With the above bug fixed, things should work, but you will no longer be using
            GTID.

            If you add MASTER_GTID_POS=AUTO to the CHANGE MASTER command, you should get
            an error that the master is missing the GTID requested by the slave. But the user
            needs to be aware that RESET MASTER (or your above equivalent) is dangerous
            with GTID: it starts GTID generation from scratch, so now you have
            duplicate GTIDs in your system, unless you carefully remove the old ones
            everywhere. At least you get an error message in most cases rather than silent
            corruption.

            Once you see the error and issue CHANGE MASTER TO MASTER_GTID_POS='' (or
            CHANGE GTID TO ''), things should work again.
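            Why RESET MASTER produces duplicates can be shown with a minimal sketch, assuming a hypothetical master that numbers GTIDs as `domain-server_id-seq_no` and restarts `seq_no` at 1 after RESET MASTER: the same GTID then names two different transactions, and a slave that remembers the old one cannot tell them apart.

```python
class Master:
    """Hypothetical master that numbers GTIDs as domain-server_id-seq_no."""

    def __init__(self, domain_id=0, server_id=1):
        self.domain_id = domain_id
        self.server_id = server_id
        self.seq_no = 0
        self.binlog = []                # list of (gtid, statement)

    def execute(self, statement):
        self.seq_no += 1
        gtid = f"{self.domain_id}-{self.server_id}-{self.seq_no}"
        self.binlog.append((gtid, statement))
        return gtid

    def reset_master(self):
        # RESET MASTER in this model: binlogs deleted, numbering restarts.
        self.binlog = []
        self.seq_no = 0


m = Master()
old = [m.execute("CREATE TABLE t1 (pk INT PRIMARY KEY)"),
       m.execute("DROP TABLE t1")]
m.reset_master()
new = [m.execute("CREATE TABLE t1 (pk INT PRIMARY KEY)"),
       m.execute("INSERT INTO t1 VALUES (1)"),
       m.execute("INSERT INTO t1 VALUES (2)")]

# 0-1-2 meant DROP TABLE before the reset and INSERT (1) after it.
print(sorted(set(old) & set(new)))  # ['0-1-1', '0-1-2']
```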

            The "recommended" way to do the above would be to copy the binlog files along
            also (maybe purge all logs but the latest first). Then there would be no need
            for RESET SLAVE, just CHANGE MASTER TO the new host and port, and GTID would
            connect automatically at the correct position (that's the whole point of GTID,
            to find position automatically on new master, right?). Of course this is
            untested, but it should work, I will add a test case for this.

            Does that sound ok? Any suggestions for improvement?

            > As a User3, I want to create a multi-source setup.
            > I configured m1 as
            > CHANGE MASTER 'm1' master_host=.., ..., master_gtid_pos=''
            > started the slave 'm1', it has been working for a while for now.
            > Now I want to add another master. I do exactly the same: I run
            > CHANGE MASTER 'm2' ... master_gtid_pos='';

            You do not need to specify master_gtid_pos='' in the second CHANGE
            MASTER. This will be clearer with the change to CHANGE GTID:

            CHANGE GTID TO '';
            CHANGE MASTER 'm1' ... master_using_gtid=1;
            CHANGE MASTER 'm2' ... master_using_gtid=1;

            > One has to learn the hard way, so I fix the data, restart m1, configure m2 with master_gtid_pos=auto (it should work, right?).

            Yes.

            > Then I become User1 or User2 in regard to one of my slaves. Let's say I want to make m2 start from the beginning. How do I do that?

            First, to use multi-source with GTID, you have to set up the two different
            masters with different domain ids. Let's say gtid_domain_id=1 for m1 and
            gtid_domain_id=2 for m2.

            Then you need to get the current GTID state, using SHOW ALL SLAVES STATUS
            (or SHOW GTID STATUS with the proposed syntax). Let's say it is "1-10-100,2-11-200".

            Now you want to start from the beginning of domain 2 (the domain of m2). So
            you need to remove that domain from the state:

            CHANGE MASTER TO MASTER_GTID_POS="1-10-100"

            (or CHANGE GTID TO "1-10-100").
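            Building the new position by hand amounts to dropping one domain from the comma-separated state. A small helper (hypothetical, only for illustration) makes this explicit for the example values:

```python
def remove_domain(gtid_pos, domain_id):
    """Return gtid_pos without the GTID that belongs to domain_id."""
    parts = [g.strip() for g in gtid_pos.split(",") if g.strip()]
    kept = [g for g in parts if int(g.split("-")[0]) != domain_id]
    return ",".join(kept)


state = "1-10-100,2-11-200"
print(remove_domain(state, 2))   # 1-10-100, the value passed to MASTER_GTID_POS
```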

            Alternatively, you can start m2 slave from the start of the m2 binlogs,
            without using GTID:

            CHANGE MASTER 'm2' TO master_log_file='', master_log_pos=0;

            Then it will download the correct gtid position and update it
            automatically. Then the next time you change master for m2 you can use
            MASTER_GTID_POS=AUTO again. It would be nice if I could implement that one
            could ask to connect the first time with old-style position, but then the next
            time with GTID.

            > It's already sad, but will be even sadder if I have 10 sources, or 20...

            Yes, perhaps a bit sad. I did at one point consider that MASTER_GTID_POS would
            only change the domains mentioned, and leave all other domains intact. And one
            would need to set seq_no to zero to remove a domain
            (MASTER_GTID_POS="1-10-100,2-11-0"). But I thought that was too magic, and
            users could always specify the full GTID state if they wanted to keep some domains.
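            For comparison, the alternative that was considered and rejected (only the listed domains change, and `seq_no` 0 removes a domain) would behave roughly like this sketch; `merge_gtid_pos` is a hypothetical name, not an implemented feature.

```python
def parse(gtid_pos):
    """Parse "d-s-n,d-s-n" into {domain_id: (server_id, seq_no)}."""
    state = {}
    for g in (p.strip() for p in gtid_pos.split(",")):
        if g:
            domain, server, seq = (int(x) for x in g.split("-"))
            state[domain] = (server, seq)
    return state


def merge_gtid_pos(current, update):
    """Rejected 'magic' semantics: update only the domains mentioned."""
    state = parse(current)
    for domain, (server, seq) in parse(update).items():
        if seq == 0:
            state.pop(domain, None)      # seq_no 0 removes the domain
        else:
            state[domain] = (server, seq)
    return ",".join(f"{d}-{s}-{n}" for d, (s, n) in sorted(state.items()))


print(merge_gtid_pos("1-10-100,2-11-200", "1-10-100,2-11-0"))  # 1-10-100
```

            With the semantics that were actually chosen, the same result requires passing the full remaining state, "1-10-100".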

            Hm, a lot longer reply than I intended. But hopefully we are getting closer to
            something that is at least workable, if not as perfect as I had hoped
            initially ...

            knielsen Kristian Nielsen added a comment

            > Of course this is untested, but it should work, I will add a test case for
            > this.

            And of course this did not work. I'm fixing right now.

            Kristian.
            knielsen Kristian Nielsen added a comment

            > > Of course this is untested, but it should work, I will add a test case for
            > > this.

            > And of course this did not work. I'm fixing right now.

            I pushed a fix for this. Test case at the end of rpl_gtid_startpos.test.

            knielsen Kristian Nielsen added a comment

            >> Do you think it will be possible to explain this to
            >> users, or is it hopelessly complicated and will need to be re-designed
            >> completely?

            I have no doubt that it will be possible to explain everything to users who are planning to run complicated configurations or workflows (switching servers on a regular basis, etc.). I'm more concerned about the part of the user base who run simple, straightforward replication, and the most they might do is promote the slave to a new master in case of a crash. I expect them to be the majority, and I want to be sure that we don't make their life harder, and even more so that we don't put them in a situation where they are likely to make a critical mistake just because they do things as they used to, while we changed the way things work. I expect this category of users won't read deep into the GTID documentation, exactly because they don't need the complicated setup; they are likely to follow instructions similar to 'First steps' or 'Quick setup'. If we eventually manage to explain the important things in a few words, then we should be fine. I know that so far we are not quite there yet, because even though I'm trying to understand things, I keep making mistakes which could have been fatal in a production environment.

            >> Now with your analysis, I am thinking that I did this incorrectly with CHANGE
            >> MASTER and GTID. Maybe it should instead be like this:
            >>
            >> - A new command CHANGE GTID TO "0-1-2". This requires all slaves to be
            >> stopped. It replaces CHANGE MASTER TO MASTER_GTID_POS="0-1-2".
            >>
            >> - A new command SHOW GTID STATUS, replaces the Gtid_Pos field in SHOW ALL
            >> SLAVES STATUS.

            Do we really need the new syntax? I'd think, if GTID position is a global value, we could just make it a global dynamic variable. Then, SHOW GTID STATUS would also be not needed, since it would only return a single value – we can just as well do SHOW VARIABLES or SELECT @@gtid_position (or whatever it's called). Is there any reason why it wouldn't work?

            >> replaces the Gtid_Pos field in SHOW ALL
            >> SLAVES STATUS.

            I think that showing the value in SHOW ALL SLAVES STATUS doesn't hurt, and may even be beneficial from the usability perspective, so, if it comes at a low price, it could stay there as well.

            >> This makes it clear that GTID state is global on the server, separate from any
            >> slave connection configuration. And clear that the individual slave connection
            >> can be using GTID to connect (MASTER_USE_GTID=1) or old style position
            >> (MASTER_USE_GTID=0).
            >>
            >> What do you think? I now understand that this is how I meant things to work,
            >> though I never formulated it explicitly like this before.

            Yes, if my current understanding of how things are meant to work is anywhere close to the truth, the proposed changes sound quite logical.

            >> > RESET SLAVE is supposed to do just that, it is defined as
            >> > "makes the slave forget its replication position in the master's binary log. This statement is meant to be used for a clean start",
            >> > and it used to do just that; but it doesn't anymore.

            >> I just read the documentation, indeed that is what it says. But it's rubbish,
            >> isn't it?

            Possibly, but that's how things used to work, and I'm pretty sure a number of people used it in their own, however tricky, ways: either in "toy" (in fact, just low-traffic) setups, or in conjunction with RESET MASTER (on master), etc. It wouldn't be very kind to make radical changes in the way things are supposed to work, especially because it's not easy to explain at a high level why the algorithm has to be different with and without GTID.

            >> I won't say this is good behaviour, but it at least seems consistent with how
            >> it worked before. Or what do you think?

            Yes, I think it's better to keep it consistent with the old behavior for the time being.
            I might get back to you regarding this after I have tried it (I didn't check the new version yet).

            >> I would prefer giving an error in case no position specified, but that
            >> is probably out due to backwards compatibility?

            Personally, I don't see a big tragedy in doing RESET SLAVE without providing a master position afterwards. I mean, it seems natural to consider it a shortcut for master_log_file/master_log_pos=<start from the beginning of whatever we have>. The absence of massive complaints about the slave starting after RESET from a non-zero position due to previously purged master binlogs indirectly confirms that people don't have a problem with this. So, I'd rather keep it as is.

            >> At least, if we can educate the user that GTID state is set separately with
            >> CHANGE GTID, it should be clearer that CHANGE MASTER MASTER_USE_GTID=1 starts
            >> from whatever SHOW GTID STATUS displays, and that RESET MASTER reverts to
            >> MASTER_USE_GTID=0.

            Right. Although, in the same way as in the previous note about the old-style master position, I wouldn't find it wrong if we considered the "empty" GTID value the default and used it if nothing else was previously set; but if you prefer insisting on always setting it, either manually or through automatic discovery, I don't have strong objections against that either.

            >> Just as Gtid_Pos in SHOW ALL SLAVES STATUS is weird, because
            >> it is not per-slave it is global.

            Hm.. Actually, I don't see anything weird in showing GTID position in SHOW ALL SLAVES STATUS (as opposed to SHOW SLAVE STATUS), exactly because it's global for all slaves.
            (Of course, it becomes somewhat strange that SHOW ALL SLAVES STATUS is not the same as SHOW SLAVE STATUS when we only have one slave, but that's another story).

            >> > decided to move my master server to
            >> > another host. Slave is fully synchronized, backups are in place, so I just
            >> > stop replication, shut down master, move the data files (but not binlogs, I
            >> > don't need them) to the new host, start master – effectively, it's the same
            >> > as RESET MASTER. I do RESET SLAVE ALL since I need to forget both the
            >> > position and connection parameters, set up replication again, start slave...

            >> user
            >> needs to be aware that RESET MASTER (or your above equivalent) is dangerous
            >> with GTID. Because it starts GTID generation from scratch, so now you have
            >> duplicate GTIDs in your system, unless you carefully remove the old ones
            >> everywhere. At least you get an error message in most cases rather than silent
            >> corruption.

            >> Once you see the error and issue CHANGE MASTER TO MASTER_GTID_POS='' (or
            >> CHANGE GTID TO ''), things should work again.

            >> The "recommended" way to do the above would be to copy the binlog files along
            >> also (maybe purge all logs but the latest first). Then there would be no need
            >> for RESET SLAVE, just CHANGE MASTER TO the new host and port, and GTID would
            >> connect automatically at the correct position

            That's exactly the case where I'm concerned about owners of simple setups, and how things become somewhat more complicated for them, or at least different.
            I don't know what real users do in a situation like that, but if I were one of them, I would do exactly as I described, because it's simpler. This way, I don't need to do any purge on old master, I don't need to move an extra log (which might be quite big), I don't need to remember which exact parameters I must modify in CHANGE MASTER (what if I forget to change host? After RESET SLAVE ALL, I'll get a clear error, without RESET SLAVE ALL the slave will attempt to connect to the old master, and lucky me if there is nothing else running on that host/port at the moment; etc.).

            >> > As a User3, I want to create a multi-source setup.
            >> > I configured m1 as
            >> > CHANGE MASTER 'm1' master_host=.., ..., master_gtid_pos=''
            >> > started the slave 'm1', and it has been working for a while now.
            >> > Now I want to add another master. I do exactly the same: I run
            >> > CHANGE MASTER 'm2' ... master_gtid_pos='';

            >> You do not need to specify master_gtid_pos='' in the second CHANGE
            >> MASTER. This will be clearer with the change to CHANGE GTID:

            >> CHANGE GTID TO '';
            >> CHANGE MASTER 'm1' ... master_using_gtid=1;
            >> CHANGE MASTER 'm2' ... master_using_gtid=1;

            Yes, it's much clearer this way. My point was that I'd expect the slave connections to be symmetrical, which was very much not the case before.
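The symmetry argued for above can be shown with a minimal sketch. It assumes, as is usual for multi-source GTID replication, that each master writes in its own replication domain. `ToySlaveServer` and its methods are hypothetical. The only point is that the GTID position is stored once per server, so every named connection starts from the same place and neither needs its own master_gtid_pos.

```python
from typing import Dict


def domain_of(gtid: str) -> int:
    """'1-10-5' -> 1 (the replication domain id)."""
    return int(gtid.split("-")[0])


class ToySlaveServer:
    """Hypothetical model: one global GTID position shared by all connections.

    Loosely mirrors the idea of a server-wide slave position: the last applied
    GTID per replication domain, kept once per server rather than per master.
    """

    def __init__(self) -> None:
        self.gtid_pos: Dict[int, str] = {}  # domain_id -> "D-S-N"
        self.masters: Dict[str, str] = {}   # connection name -> host

    def change_gtid(self, pos: str) -> None:
        # Analogue of the proposed CHANGE GTID TO '...': set once, globally.
        parts = (p.strip() for p in pos.split(","))
        self.gtid_pos = {domain_of(g): g for g in parts if g}

    def change_master(self, name: str, host: str) -> None:
        # Per-connection settings say where to connect, not where to start.
        self.masters[name] = host

    def start_position(self, name: str) -> str:
        # Every connection sends the same global position to its master.
        if name not in self.masters:
            raise KeyError(f"unknown master connection {name!r}")
        return ",".join(self.gtid_pos[d] for d in sorted(self.gtid_pos))

    def apply(self, name: str, gtid: str) -> None:
        # Whichever connection applied the event, the global state advances.
        if name not in self.masters:
            raise KeyError(f"unknown master connection {name!r}")
        self.gtid_pos[domain_of(gtid)] = gtid


slave = ToySlaveServer()
slave.change_gtid("")               # CHANGE GTID TO '';
slave.change_master("m1", "host1")  # CHANGE MASTER 'm1' ... master_using_gtid=1;
slave.change_master("m2", "host2")  # CHANGE MASTER 'm2' ... master_using_gtid=1;
slave.apply("m1", "1-10-5")         # m1 writes in domain 1
slave.apply("m2", "2-20-7")         # m2 writes in domain 2
print(slave.start_position("m1"))   # 1-10-5,2-20-7
print(slave.start_position("m2"))   # 1-10-5,2-20-7 (symmetrical)
```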

            >> It would be nice if I could implement that one
            >> could ask to connect the first time with old-style position, but then the next
            >> time with GTID.

            Is it difficult to implement? Frankly, I thought that "auto" meant pretty much that... Even more so if we have CHANGE MASTER .. master_using_gtid=1|0, where 1 throws an error when the GTID position is not set. Then it would be logical to also have master_using_gtid=auto (or SET GLOBAL gtid = 'auto', whichever is more reasonable from the implementation perspective). It would mean that the slave connects with an old-style position, acquires the GTID position, sets it, and from then on connects using it.
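The "auto" behaviour suggested above could look roughly like the toy state machine below. It is a Python sketch built on assumptions, not an existing MariaDB option. `AutoGtidSlave`, the binlog tuples and the "offset + 1" bookkeeping are all hypothetical. The first connect uses the old-style file/offset. Because every applied event carries a GTID, the slave can switch to GTID positioning from the next reconnect onward.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Hypothetical binlog entries: (file, offset, gtid, event).
Entry = Tuple[str, int, str, str]


@dataclass
class AutoGtidSlave:
    """Toy model of the suggested master_using_gtid=auto (not a real option).

    First connect: old-style file/offset. Every applied event also records its
    GTID, so later connects can switch to GTID positioning.
    """

    log_file: str
    log_pos: int
    gtid_pos: Optional[str] = None
    applied: List[str] = field(default_factory=list)

    def connect(self, binlog: List[Entry]) -> str:
        """Read all new events and return the positioning mode that was used."""
        if self.gtid_pos is None:
            mode = "file/pos"
            start = next((i for i, (f, p, _, _) in enumerate(binlog)
                          if (f, p) >= (self.log_file, self.log_pos)),
                         len(binlog))
        else:
            mode = "gtid"
            gtids = [g for _, _, g, _ in binlog]
            if self.gtid_pos not in gtids:
                raise LookupError(
                    f"GTID {self.gtid_pos} is not in the master's binlog")
            start = gtids.index(self.gtid_pos) + 1
        for f, p, gtid, event in binlog[start:]:
            self.applied.append(event)
            self.gtid_pos = gtid                    # GTID acquired from the stream
            self.log_file, self.log_pos = f, p + 1  # just past this event
        return mode


binlog: List[Entry] = [("master-bin.000001", 100, "0-1-1", "e1"),
                       ("master-bin.000001", 200, "0-1-2", "e2")]
slave = AutoGtidSlave("master-bin.000001", 4)  # 4: first event offset in a binlog
print(slave.connect(binlog), slave.gtid_pos)   # file/pos 0-1-2
binlog.append(("master-bin.000001", 300, "0-1-3", "e3"))
print(slave.connect(binlog), slave.applied)    # gtid ['e1', 'e2', 'e3']
```

According to the MariaDB GTID documentation (worth double-checking for your version), released servers reach a similar result in a different way. The slave records applied GTIDs in @@gtid_slave_pos even while it connects with file/offset. Switching later is then a matter of running `CHANGE MASTER TO master_use_gtid=slave_pos` on a stopped slave.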

            >> hopefully we are getting closer to
            >> something that is at least workable, if not as perfect as I had hoped
            >> initially ...

            You never know, maybe it turns out "perfect enough" in the end. Although, of course, nothing is ever as perfect as we initially hope.

            elenst Elena Stepanova added a comment

            I believe all of these issues should now be resolved, at least as well as
            possible, with the new interface pushed recently
            (master_use_gtid=slave_pos|current_pos).

            knielsen Kristian Nielsen added a comment
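The final interface offers two starting points:

- master_use_gtid=slave_pos starts from @@gtid_slave_pos, the last GTIDs applied by replication.
- master_use_gtid=current_pos starts from @@gtid_current_pos. This also accounts for transactions the server wrote to its own binlog, for example while it was a master.

The sketch below models the per-domain merge as the MariaDB documentation describes it for @@gtid_current_pos. The function names are hypothetical, and edge cases in the real server may differ.

```python
from typing import Dict, Tuple

Gtid = Tuple[int, int, int]  # (domain_id, server_id, seq_no)


def parse_pos(pos: str) -> Dict[int, Gtid]:
    """Parse a position such as '0-1-5,1-2-7' into {domain_id: gtid}."""
    result: Dict[int, Gtid] = {}
    for part in (p.strip() for p in pos.split(",")):
        if part:
            d, s, n = (int(x) for x in part.split("-"))
            result[d] = (d, s, n)
    return result


def format_pos(pos: Dict[int, Gtid]) -> str:
    return ",".join("-".join(map(str, pos[d])) for d in sorted(pos))


def current_pos(binlog_pos: str, slave_pos: str, server_id: int) -> str:
    """Per-domain merge, following the documented @@gtid_current_pos rule."""
    binlog, slave = parse_pos(binlog_pos), parse_pos(slave_pos)
    merged = dict(slave)  # start from the last GTIDs applied by replication
    for domain, gtid in binlog.items():
        written_here = gtid[1] == server_id
        newer = domain not in slave or gtid[2] > slave[domain][2]
        if written_here and newer:
            merged[domain] = gtid  # this server's own, more recent transaction
    return format_pos(merged)


# Server 2 was a slave of server 1 up to 0-1-10, then was promoted and wrote
# 0-2-11 and 0-2-12 itself; now it is to be pointed at a master again.
print(current_pos("0-2-12", "0-1-10", server_id=2))  # 0-2-12 (current_pos)
print(format_pos(parse_pos("0-1-10")))               # 0-1-10 (slave_pos)
```

With slave_pos, the demoted server would ask its new master for everything after 0-1-10 and ignore its own writes. With current_pos, it asks from 0-2-12 instead.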

            People

              knielsen Kristian Nielsen
              elenst Elena Stepanova