Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-15393

gtid_slave_pos duplicate key errors after mysqldump restore

Details

    Description

      *** This is a refined description, the original one follows it ***

      When mysqldump run on mysql system database it generates inserts sql commands
      into mysql.gtid_slave_pos.
      There're a couple of issues with that fact when the script is going to provision
      a new slave instance. That is when mysqldump runs with --dump-slave

      1. without --gtid option the script misses out
      SET @@global.gtid_slave_pos = <value>
      In effect the script, that is to run after the new slave server has been initialized, can not update internal state of slave server objects incl.
      rpl_global_gtid_slave_state.last_sub_id
      When replication begins this value may start off 1 or be derived from a
      pre-existing record of the table to eventually reach
      a possibly greater inserted value of
      mysql.gtid_slave_pos.sub_id
      column which is a reason of a duplicate key error described specifically in a scenario of the original description below.

      2. even with --gtid the SET statement is placed in the script before the mysql.gtid_slave_pos inserts block, so an accumulative effect of replying
      the two on the new slave is similar to p.1 (SET would indeed act as to create a pre-existing record).

      While both effects can also be observed if a "genuine" strongly unrecommended manual insert is done into the table, to my view that merits a separate work in MDEV-34564.

      Clearly a mere relocating of
      SET @@global.gtid_slave_pos = <value>
      onto a position after the insert block sorts out the 2nd mysqldump issue. To tackle the 1st one apparently requires more efforts that it deemed and thus is deferred to MDEV-34564 Specifically It is not sufficient to chose <value> to be @@global.gtid_slave_pos itself as such measure would purge the old content of table that is to risk losing a pre-existing state.

      The aimed output therefore should be like in the following block:

      LOCK TABLES `gtid_slave_pos` WRITE;
      /*!40000 ALTER TABLE `gtid_slave_pos` DISABLE KEYS */;
      INSERT INTO `gtid_slave_pos` VALUES
      (0,66,1,1),
      ...
      (0,85,1,20)
      /*!40000 ALTER TABLE `gtid_slave_pos` ENABLE KEYS */;
      UNLOCK TABLES;
       
      -- without ---gtid
      -- SET GLOBAL gtid_slave_pos = @@global.gtid_slave_pos;
      SET GLOBAL gtid_slave_pos='0-1-20';
      

      The last SET statement ensures that the mysqldump script sets
      rpl_global_gtid_slave_state.last_sub_id
      to the max of mysql.gtid_slave_pos.sub_id in the table and by that rules out duplicate key error possibility.

      *** The original description ***

      When restoring a master backup where, for whatever reason, gtid_slave_pos contains entries (master<->master setup, intermediate slave, former slave promoted to master), and using classic master_log_file/pos to set up replication, a newly set up slave will run for a while, counting up its on "sub_id" from one, until reaching the masters "sub_id" values still present in the restored gtid_slave_pos. At that point replication will stop with a duplicate key error.

      How to reproduce:

      • set up a master, insert a "fake" entry into gtid_slave_pos:

        INSERT INTO mysql.gtid_slave_pos VALUES(0,5,1,23);
      

      • take a master backup

        mysqldump --all-databases --single-transaction --master-data > dump.sql
      

      • set up slave, set up replication basics:

        CHANGE MASTER TO MASTER_HOST='...'
                      , MASTER_USER='...'
                      , MASTER_PASSWORD='...';
      

      • restore dump on slave
      • start slave, check slave status to see that things are running, check mysql.gtid_slave_pos contents. Note the first line inserted by the slave, and the 2nd one originating from the master still being there:

      +-----------+--------+-----------+--------+
      | domain_id | sub_id | server_id | seq_no |
      +-----------+--------+-----------+--------+
      |         0 |      1 |        2 |      5 |
      |         0 |      5 |         1 |     23 |
      +-----------+--------+-----------+--------+
      

      • perform some DDL or DML statements on the master
      • check slave status:

                         Last_Error: Error during XID COMMIT: failed to update GTID state in mysql.gtid_slave_pos: 1062: Duplicate entry '0-5' for key 'PRIMARY'
      

      The gtid_slave_pos table is truncated when setting the gtid_slave_pos variable with SET GLOBAL, but is left as is on

      CHANGE MASTER TO MASTER_LOG_FILE='...', MASTER_LOG_POS=...;
      

      Proposed fixes:

      TRUNCATE the table on explicit CHANGE MASTER TO MASTER_LOG_FILE, and move the CHANGE MASTER TO command from beginning to end of mysqldump output

      Attachments

        Issue Links

          Activity

            Elkin Andrei Elkin added a comment -

            bnestere, knielsen - fyi the description is edited with a detailed analysis and how-to-fix suggestion. Could you please review that before the coding starts.

            Elkin Andrei Elkin added a comment - bnestere , knielsen - fyi the description is edited with a detailed analysis and how-to-fix suggestion. Could you please review that before the coding starts.

            It sounds fine to put the SET GLOBAL gtid_slave_pos to the end of the mysqldump --gtid output.

            It doesn't seem correct to output a SET GLOBAL gtid_slave_pos statement if --gtid is not specified by the user, as that will destroy the contents of @@GLOBAL.gtid_slave_pos, which would be very bad. It would be like including a table/database into the dump that the user did not ask for.

            One could think about including the SET GLOBAL gtid_slave_pos if mysql.gtid_slave_pos is included in the dump. Note that there can be multiple different mysql.gtid_slave_pos* tables in the server, so a match would be needed. Maybe then it's as simple, and more robust, to handle the issue in the MDEV-34564.

            I guess at the root of the problem is that the internal in-memory copy of the table data in the rpl_global_gtid_slave_state hash become out-of-sync with the sql table contents. Maybe at slave start, the code could check the table contents against the hash contents, and resync (with a warning in the error log) in case of differences? But I guess that's the idea in MDEV-34564.

            knielsen Kristian Nielsen added a comment - It sounds fine to put the SET GLOBAL gtid_slave_pos to the end of the mysqldump --gtid output. It doesn't seem correct to output a SET GLOBAL gtid_slave_pos statement if --gtid is not specified by the user, as that will destroy the contents of @@GLOBAL.gtid_slave_pos, which would be very bad. It would be like including a table/database into the dump that the user did not ask for. One could think about including the SET GLOBAL gtid_slave_pos if mysql.gtid_slave_pos is included in the dump. Note that there can be multiple different mysql.gtid_slave_pos* tables in the server, so a match would be needed. Maybe then it's as simple, and more robust, to handle the issue in the MDEV-34564 . I guess at the root of the problem is that the internal in-memory copy of the table data in the rpl_global_gtid_slave_state hash become out-of-sync with the sql table contents. Maybe at slave start, the code could check the table contents against the hash contents, and resync (with a warning in the error log) in case of differences? But I guess that's the idea in MDEV-34564 .
            Elkin Andrei Elkin added a comment - - edited

            Thanks, knielsen!

            I have to agree with that when --gtid not specified, but --dump-slave is and mysql.gtid_slave_pos is in the dump a chance still exists that the pre-existing on the new slave @@global.gtid_slave_pos should remain from some user's point of view. So I am adjusting the description in this part.

            To make it clear the multiple mysql.gtid_slave_pos* is not a concern now. The SET is going to be placed by mysqldump --dump-slave --gtid after all databases/tables are put out. So it and ways to discover desynchronization is left to MDEV-34564.

            Elkin Andrei Elkin added a comment - - edited Thanks, knielsen ! I have to agree with that when --gtid not specified, but --dump-slave is and mysql.gtid_slave_pos is in the dump a chance still exists that the pre-existing on the new slave @@global.gtid_slave_pos should remain from some user's point of view. So I am adjusting the description in this part. To make it clear the multiple mysql.gtid_slave_pos* is not a concern now. The SET is going to be placed by mysqldump --dump-slave --gtid after all databases/tables are put out. So it and ways to discover desynchronization is left to MDEV-34564 .
            Elkin Andrei Elkin added a comment -

            Howdy Brandon!

            Could you please check bundled fixes to this and MDEV-34615, currently in bb-10.5-andrei.
            The patch only attends --gtid mode mysqldump as discussed with knielsen.

            Elkin Andrei Elkin added a comment - Howdy Brandon! Could you please check bundled fixes to this and MDEV-34615 , currently in bb-10.5-andrei. The patch only attends --gtid mode mysqldump as discussed with knielsen .
            Elkin Andrei Elkin added a comment - - edited

            vigneswara.bandi, this bug is fixed for the case of mysqlbinlog --gtid ... more options .... No --gtid coverage can a part of more general MDEV-34564 Saying that with acknowledgement of the requirement of --gtid is somewhat unnatural for a master_use_gtid = no replication setup.

            Elkin Andrei Elkin added a comment - - edited vigneswara.bandi , this bug is fixed for the case of mysqlbinlog --gtid ... more options ... . No --gtid coverage can a part of more general MDEV-34564 Saying that with acknowledgement of the requirement of --gtid is somewhat unnatural for a master_use_gtid = no replication setup .

            People

              Elkin Andrei Elkin
              hholzgra Hartmut Holzgraefe
              Votes:
              5 Vote for this issue
              Watchers:
              14 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.