Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-15393

gtid_slave_pos duplicate key errors after mysqldump restore

    XMLWordPrintable

Details

    Description

      *** This is a refined description, the original one follows it ***

      When mysqldump run on mysql system database it generates inserts sql commands
      into mysql.gtid_slave_pos.
      There're a couple of issues with that fact when the script is going to provision
      a new slave instance. That is when mysqldump runs with --dump-slave

      1. without --gtid option the script misses out
      SET @@global.gtid_slave_pos = <value>
      In effect the script, that is to run after the new slave server has been initialized, can not update internal state of slave server objects incl.
      rpl_global_gtid_slave_state.last_sub_id
      When replication begins this value may start off 1 or be derived from a
      pre-existing record of the table to eventually reach
      a possibly greater inserted value of
      mysql.gtid_slave_pos.sub_id
      column which is a reason of a duplicate key error described specifically in a scenario of the original description below.

      2. even with --gtid the SET statement is placed in the script before the mysql.gtid_slave_pos inserts block, so an accumulative effect of replying
      the two on the new slave is similar to p.1 (SET would indeed act as to create a pre-existing record).

      While both effects can also be observed if a "genuine" strongly unrecommended manual insert is done into the table, to my view that merits a separate work in MDEV-34564.

      Clearly a mere relocating of
      SET @@global.gtid_slave_pos = <value>
      onto a position after the insert block sorts out the 2nd mysqldump issue. To tackle the 1st one apparently requires more efforts that it deemed and thus is deferred to MDEV-34564 Specifically It is not sufficient to chose <value> to be @@global.gtid_slave_pos itself as such measure would purge the old content of table that is to risk losing a pre-existing state.

      The aimed output therefore should be like in the following block:

      LOCK TABLES `gtid_slave_pos` WRITE;
      /*!40000 ALTER TABLE `gtid_slave_pos` DISABLE KEYS */;
      INSERT INTO `gtid_slave_pos` VALUES
      (0,66,1,1),
      ...
      (0,85,1,20)
      /*!40000 ALTER TABLE `gtid_slave_pos` ENABLE KEYS */;
      UNLOCK TABLES;
       
      -- without ---gtid
      -- SET GLOBAL gtid_slave_pos = @@global.gtid_slave_pos;
      SET GLOBAL gtid_slave_pos='0-1-20';
      

      The last SET statement ensures that the mysqldump script sets
      rpl_global_gtid_slave_state.last_sub_id
      to the max of mysql.gtid_slave_pos.sub_id in the table and by that rules out duplicate key error possibility.

      *** The original description ***

      When restoring a master backup where, for whatever reason, gtid_slave_pos contains entries (master<->master setup, intermediate slave, former slave promoted to master), and using classic master_log_file/pos to set up replication, a newly set up slave will run for a while, counting up its on "sub_id" from one, until reaching the masters "sub_id" values still present in the restored gtid_slave_pos. At that point replication will stop with a duplicate key error.

      How to reproduce:

      • set up a master, insert a "fake" entry into gtid_slave_pos:

        INSERT INTO mysql.gtid_slave_pos VALUES(0,5,1,23);
      

      • take a master backup

        mysqldump --all-databases --single-transaction --master-data > dump.sql
      

      • set up slave, set up replication basics:

        CHANGE MASTER TO MASTER_HOST='...'
                      , MASTER_USER='...'
                      , MASTER_PASSWORD='...';
      

      • restore dump on slave
      • start slave, check slave status to see that things are running, check mysql.gtid_slave_pos contents. Note the first line inserted by the slave, and the 2nd one originating from the master still being there:

      +-----------+--------+-----------+--------+
      | domain_id | sub_id | server_id | seq_no |
      +-----------+--------+-----------+--------+
      |         0 |      1 |        2 |      5 |
      |         0 |      5 |         1 |     23 |
      +-----------+--------+-----------+--------+
      

      • perform some DDL or DML statements on the master
      • check slave status:

                         Last_Error: Error during XID COMMIT: failed to update GTID state in mysql.gtid_slave_pos: 1062: Duplicate entry '0-5' for key 'PRIMARY'
      

      The gtid_slave_pos table is truncated when setting the gtid_slave_pos variable with SET GLOBAL, but is left as is on

      CHANGE MASTER TO MASTER_LOG_FILE='...', MASTER_LOG_POS=...;
      

      Proposed fixes:

      TRUNCATE the table on explicit CHANGE MASTER TO MASTER_LOG_FILE, and move the CHANGE MASTER TO command from beginning to end of mysqldump output

      Attachments

        Issue Links

          Activity

            People

              Elkin Andrei Elkin
              hholzgra Hartmut Holzgraefe
              Votes:
              5 Vote for this issue
              Watchers:
              14 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.