Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-4906

stop slave; start slave; increments the GTId on a slave

Details

    • Bug
    • Status: Closed (View Workflow)
    • Major
    • Resolution: Fixed
    • 10.0.4
    • 10.0.5
    • None

    Description

      With a simple test table, shown here on the slave:

      mysql 15:39:00 [PA DEV 002] [eis] [3202]> select * from mfp;
      +------+
      | c1   |
      +------+
      |   11 |
      |   22 |
      |   33 |
      |   44 |
      |   55 |
      +------+
      5 rows in set (0.00 sec)
       
      mysql 15:39:02 [PA DEV 002] [eis] [3202]> delete from mfp;
      Query OK, 5 rows affected (0.00 sec)

      I then also deleted all these rows on the master simply to explore the options for repairing replication using

      SET GLOBAL gtid_slave_pos = 'XXX'

      However, simply stopping and starting the slave ( sometimes twice ) skipped the transactions with no other actions required:

      mysql 15:39:09 [PA DEV 002] [eis] [3202]> show variables like "%gtid%";
      +------------------+------------------+
      | Variable_name    | Value            |
      +------------------+------------------+
      | gtid_binlog_pos  | 3102-223202-1412 |
      | gtid_current_pos | 3102-223202-1412 |
      | gtid_domain_id   | 3102             |
      | gtid_seq_no      | 0                |
      | gtid_slave_pos   | 3102-223102-409  |
      | gtid_strict_mode | OFF              |
      +------------------+------------------+
      6 rows in set (0.00 sec)
       
      mysql 15:40:24 [PA DEV 002] [eis] [3202]> stop slave; start slave;
      Query OK, 0 rows affected (0.00 sec)
       
      Query OK, 0 rows affected (0.01 sec)
       
      mysql 15:45:21 [PA DEV 002] [eis] [3202]> show variables like "%gtid%";
      +------------------+------------------+
      | Variable_name    | Value            |
      +------------------+------------------+
      | gtid_binlog_pos  | 3102-223202-1413 |
      | gtid_current_pos | 3102-223202-1413 |
      | gtid_domain_id   | 3102             |
      | gtid_seq_no      | 0                |
      | gtid_slave_pos   | 3102-223102-410  |
      | gtid_strict_mode | OFF              |
      +------------------+------------------+
      6 rows in set (0.00 sec)
       
      mysql 15:45:24 [PA DEV 002] [eis] [3202]> stop slave; start slave;
      Query OK, 0 rows affected (0.00 sec)
       
      Query OK, 0 rows affected (0.00 sec)
       
      mysql 15:45:28 [PA DEV 002] [eis] [3202]> show variables like "%gtid%";
      +------------------+------------------+
      | Variable_name    | Value            |
      +------------------+------------------+
      | gtid_binlog_pos  | 3102-223202-1413 |
      | gtid_current_pos | 3102-223202-1413 |
      | gtid_domain_id   | 3102             |
      | gtid_seq_no      | 0                |
      | gtid_slave_pos   | 3102-223102-410  |
      | gtid_strict_mode | OFF              |
      +------------------+------------------+
      6 rows in set (0.00 sec)
       
      mysql 15:45:30 [PA DEV 002] [eis] [3202]> stop slave; start slave;
      Query OK, 0 rows affected (0.00 sec)
       
      Query OK, 0 rows affected (0.01 sec)
       
      mysql 15:45:32 [PA DEV 002] [eis] [3202]> show variables like "%gtid%";
      +------------------+------------------+
      | Variable_name    | Value            |
      +------------------+------------------+
      | gtid_binlog_pos  | 3102-223202-1413 |
      | gtid_current_pos | 3102-223202-1413 |
      | gtid_domain_id   | 3102             |
      | gtid_seq_no      | 0                |
      | gtid_slave_pos   | 3102-223102-411  |
      | gtid_strict_mode | OFF              |
      +------------------+------------------+
      6 rows in set (0.00 sec)
       
      mysql 15:45:34 [PA DEV 002] [eis] [3202]> stop slave; start slave;
      Query OK, 0 rows affected (0.00 sec)
       
      Query OK, 0 rows affected (0.01 sec)
       
      mysql 15:45:36 [PA DEV 002] [eis] [3202]> show variables like "%gtid%";
      +------------------+------------------+
      | Variable_name    | Value            |
      +------------------+------------------+
      | gtid_binlog_pos  | 3102-223202-1413 |
      | gtid_current_pos | 3102-223202-1413 |
      | gtid_domain_id   | 3102             |
      | gtid_seq_no      | 0                |
      | gtid_slave_pos   | 3102-223102-411  |
      | gtid_strict_mode | OFF              |
      +------------------+------------------+
      6 rows in set (0.00 sec)
       
      mysql 15:45:37 [PA DEV 002] [eis] [3202]> stop slave; start slave;
      Query OK, 0 rows affected (0.00 sec)
       
      Query OK, 0 rows affected (0.01 sec)
       
      mysql 15:45:39 [PA DEV 002] [eis] [3202]> show variables like "%gtid%";
      +------------------+------------------+
      | Variable_name    | Value            |
      +------------------+------------------+
      | gtid_binlog_pos  | 3102-223202-1413 |
      | gtid_current_pos | 3102-223202-1413 |
      | gtid_domain_id   | 3102             |
      | gtid_seq_no      | 0                |
      | gtid_slave_pos   | 3102-223102-412  |
      | gtid_strict_mode | OFF              |
      +------------------+------------------+
      6 rows in set (0.00 sec)
       
      mysql 15:45:41 [PA DEV 002] [eis] [3202]> stop slave; start slave;
      Query OK, 0 rows affected (0.00 sec)
       
      Query OK, 0 rows affected (0.00 sec)
       
      mysql 15:45:43 [PA DEV 002] [eis] [3202]> show variables like "%gtid%";
      +------------------+------------------+
      | Variable_name    | Value            |
      +------------------+------------------+
      | gtid_binlog_pos  | 3102-223202-1413 |
      | gtid_current_pos | 3102-223202-1413 |
      | gtid_domain_id   | 3102             |
      | gtid_seq_no      | 0                |
      | gtid_slave_pos   | 3102-223102-412  |
      | gtid_strict_mode | OFF              |
      +------------------+------------------+
      6 rows in set (0.00 sec)
       
      mysql 15:45:44 [PA DEV 002] [eis] [3202]> stop slave; start slave;
      Query OK, 0 rows affected (0.00 sec)
       
      Query OK, 0 rows affected (0.00 sec)
       
      mysql 15:45:47 [PA DEV 002] [eis] [3202]> show variables like "%gtid%";
      +------------------+------------------+
      | Variable_name    | Value            |
      +------------------+------------------+
      | gtid_binlog_pos  | 3102-223202-1413 |
      | gtid_current_pos | 3102-223202-1413 |
      | gtid_domain_id   | 3102             |
      | gtid_seq_no      | 0                |
      | gtid_slave_pos   | 3102-223102-413  |
      | gtid_strict_mode | OFF              |
      +------------------+------------------+
      6 rows in set (0.00 sec)
       
      mysql 15:46:23 [PA DEV 002] [eis] [3202]> stop slave; start slave;
      Query OK, 0 rows affected (0.00 sec)
       
      Query OK, 0 rows affected (0.00 sec)
       
      mysql 15:46:26 [PA DEV 002] [eis] [3202]> show variables like "%gtid%";
      +------------------+------------------+
      | Variable_name    | Value            |
      +------------------+------------------+
      | gtid_binlog_pos  | 3102-223202-1413 |
      | gtid_current_pos | 3102-223202-1413 |
      | gtid_domain_id   | 3102             |
      | gtid_seq_no      | 0                |
      | gtid_slave_pos   | 3102-223102-413  |
      | gtid_strict_mode | OFF              |
      +------------------+------------------+
      6 rows in set (0.00 sec)
       
      mysql 15:46:28 [PA DEV 002] [eis] [3202]> stop slave; start slave;
      Query OK, 0 rows affected (0.00 sec)
       
      Query OK, 0 rows affected (0.01 sec)
       
      mysql 15:46:29 [PA DEV 002] [eis] [3202]> show variables like "%gtid%";
      +------------------+------------------+
      | Variable_name    | Value            |
      +------------------+------------------+
      | gtid_binlog_pos  | 3102-223202-1413 |
      | gtid_current_pos | 3102-223202-1413 |
      | gtid_domain_id   | 3102             |
      | gtid_seq_no      | 0                |
      | gtid_slave_pos   | 3102-223102-414  |
      | gtid_strict_mode | OFF              |
      +------------------+------------------+
      6 rows in set (0.00 sec)
       
      mysql 15:46:31 [PA DEV 002] [eis] [3202]> stop slave; start slave;
      Query OK, 0 rows affected (0.00 sec)
       
      Query OK, 0 rows affected (0.01 sec)
       
      mysql 15:46:39 [PA DEV 002] [eis] [3202]> show variables like "%gtid%";
      +------------------+------------------+
      | Variable_name    | Value            |
      +------------------+------------------+
      | gtid_binlog_pos  | 3102-223202-1413 |
      | gtid_current_pos | 3102-223202-1413 |
      | gtid_domain_id   | 3102             |
      | gtid_seq_no      | 0                |
      | gtid_slave_pos   | 3102-223102-416  |
      | gtid_strict_mode | OFF              |
      +------------------+------------------+
      6 rows in set (0.00 sec)
       
      mysql 15:46:41 [PA DEV 002] [eis] [3202]>

      Attachments

        Activity

          Master

          MariaDB [test]> show variables like '%gtid%';
          +------------------+----------------+
          | Variable_name    | Value          |
          +------------------+----------------+
          | gtid_binlog_pos  | 0-1376396033-1 |
          | gtid_current_pos | 0-1376396033-1 |
          | gtid_domain_id   | 0              |
          | gtid_seq_no      | 0              |
          | gtid_slave_pos   |                |
          | gtid_strict_mode | OFF            |
          +------------------+----------------+
          6 rows in set (0.00 sec)
           
          MariaDB [test]> create table t (a int auto_increment primary key);
          Query OK, 0 rows affected (0.32 sec)
           
          rbensley@rbensley:~$ for i in {1..10}; do mysql -P3000 test -e "insert into t values ();"; done

          Slave

          MariaDB [test]> select count(*) from t;
          +----------+
          | count(*) |
          +----------+
          |       10 |
          +----------+
          1 row in set (0.00 sec)
           
          MariaDB [test]> delete from t where a < 4;
          Query OK, 3 rows affected (0.09 sec)
           
          MariaDB [test]> select count(*) from t;
          +----------+
          | count(*) |
          +----------+
          |        7 |
          +----------+
          1 row in set (0.00 sec)

          Master

          MariaDB [test]> delete from t;
          Query OK, 10 rows affected (0.10 sec)

          The Slave will now show a replication failure:

          Last_SQL_Error: Could not execute Delete_rows event on table test.t; Can't find record in 't', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log mysql-log-bin.000003, end_log_pos 2070
          Replicate_Ignore_Server_Ids: 
          Master_Server_Id: 1376396033
          Master_SSL_Crl: 
          Master_SSL_Crlpath: 
          Using_Gtid: Slave_Pos

          From the slave, I run "stop slave; start slave;"
          And the GTID replication rolls forward!

          MariaDB [test]> show slave status\G
          *************************** 1. row ***************************
                         Slave_IO_State: Waiting for master to send event
                            Master_Host: 127.0.0.1
                            Master_User: root
                            Master_Port: 3000
                          Connect_Retry: 60
                        Master_Log_File: mysql-log-bin.000003
                    Read_Master_Log_Pos: 8641
                         Relay_Log_File: mysql-relay-bin.000002
                          Relay_Log_Pos: 781
                  Relay_Master_Log_File: mysql-log-bin.000003
                       Slave_IO_Running: Yes
                      Slave_SQL_Running: Yes
                        Replicate_Do_DB: 
                    Replicate_Ignore_DB: 
                     Replicate_Do_Table: 
                 Replicate_Ignore_Table: 
                Replicate_Wild_Do_Table: 
            Replicate_Wild_Ignore_Table: 
                             Last_Errno: 0
                             Last_Error: 
                           Skip_Counter: 0
                    Exec_Master_Log_Pos: 8641
                        Relay_Log_Space: 1078
                        Until_Condition: None
                         Until_Log_File: 
                          Until_Log_Pos: 0
                     Master_SSL_Allowed: No
                     Master_SSL_CA_File: 
                     Master_SSL_CA_Path: 
                        Master_SSL_Cert: 
                      Master_SSL_Cipher: 
                         Master_SSL_Key: 
                  Seconds_Behind_Master: 0
          Master_SSL_Verify_Server_Cert: No
                          Last_IO_Errno: 0
                          Last_IO_Error: 
                         Last_SQL_Errno: 0
                         Last_SQL_Error: 
            Replicate_Ignore_Server_Ids: 
                       Master_Server_Id: 1376396033
                         Master_SSL_Crl: 
                     Master_SSL_Crlpath: 
                             Using_Gtid: Slave_Pos
           

          mathnode Richard Bensley added a comment - Master MariaDB [test]> show variables like '%gtid%'; +------------------+----------------+ | Variable_name | Value | +------------------+----------------+ | gtid_binlog_pos | 0-1376396033-1 | | gtid_current_pos | 0-1376396033-1 | | gtid_domain_id | 0 | | gtid_seq_no | 0 | | gtid_slave_pos | | | gtid_strict_mode | OFF | +------------------+----------------+ 6 rows in set (0.00 sec)   MariaDB [test]> create table t (a int auto_increment primary key); Query OK, 0 rows affected (0.32 sec)   rbensley@rbensley:~$ for i in {1..10}; do mysql -P3000 test -e "insert into t values ();"; done Slave MariaDB [test]> select count(*) from t; +----------+ | count(*) | +----------+ | 10 | +----------+ 1 row in set (0.00 sec)   MariaDB [test]> delete from t where a < 4; Query OK, 3 rows affected (0.09 sec)   MariaDB [test]> select count(*) from t; +----------+ | count(*) | +----------+ | 7 | +----------+ 1 row in set (0.00 sec) Master MariaDB [test]> delete from t; Query OK, 10 rows affected (0.10 sec) The Slave will now show a replication failure: Last_SQL_Error: Could not execute Delete_rows event on table test.t; Can't find record in 't', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log mysql-log-bin.000003, end_log_pos 2070 Replicate_Ignore_Server_Ids: Master_Server_Id: 1376396033 Master_SSL_Crl: Master_SSL_Crlpath: Using_Gtid: Slave_Pos From the slave, I run "stop slave; start slave;" And the GTID replication rolls forward! MariaDB [test]> show slave status\G *************************** 1. row *************************** Slave_IO_State: Waiting for master to send event Master_Host: 127.0.0.1 Master_User: root Master_Port: 3000 Connect_Retry: 60 Master_Log_File: mysql-log-bin.000003 Read_Master_Log_Pos: 8641 Relay_Log_File: mysql-relay-bin.000002 Relay_Log_Pos: 781 Relay_Master_Log_File: mysql-log-bin.000003 Slave_IO_Running: Yes Slave_SQL_Running: Yes Replicate_Do_DB: Replicate_Ignore_DB: Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table: Last_Errno: 0 Last_Error: Skip_Counter: 0 Exec_Master_Log_Pos: 8641 Relay_Log_Space: 1078 Until_Condition: None Until_Log_File: Until_Log_Pos: 0 Master_SSL_Allowed: No Master_SSL_CA_File: Master_SSL_CA_Path: Master_SSL_Cert: Master_SSL_Cipher: Master_SSL_Key: Seconds_Behind_Master: 0 Master_SSL_Verify_Server_Cert: No Last_IO_Errno: 0 Last_IO_Error: Last_SQL_Errno: 0 Last_SQL_Error: Replicate_Ignore_Server_Ids: Master_Server_Id: 1376396033 Master_SSL_Crl: Master_SSL_Crlpath: Using_Gtid: Slave_Pos  

          Test case:

          --source include/have_innodb.inc
          --source include/master-slave.inc
          --source include/have_binlog_format_row.inc

          --connection slave
          --source include/stop_slave.inc
          set sql_log_bin = 0;
          alter table mysql.gtid_slave_pos engine = InnoDB;
          change master to master_use_gtid = current_pos;
          set sql_log_bin = 1;
          --source include/start_slave.inc

          --connection master
          --echo # [master]
          create table t1 (i int) engine=innodb;
          insert into t1 values (1),(2);

          --sync_slave_with_master
          --echo # [slave]
          -echo #-----------------------------
          --echo # Removing a row manually
          -echo #-----------------------------
          delete from t1 where i = 1;

          --connection master
          -echo #-----------------------------
          --echo # [master]
          --echo # Inserting something else, to raise the GTID
          --echo # and avoid the error on the next slave connection, caused
          --echo # by the non-existing seqno
          -echo #-----------------------------
          insert into t1 values (3),(4);
          insert into t1 values (5),(6);
          delete from t1;
          select @@gtid_binlog_pos;

          --connection slave
          --echo # [slave]
          -echo #-----------------------------
          --echo # Waiting for the last INSERT (it has GTID 0-1-4)
          --echo # and checking that replication aborted due to the missing row
          -echo #-----------------------------
          --let $wait_condition = SELECT @@gtid_slave_pos = '0-1-4'
          --source include/wait_condition.inc

          show variables like 'gtid%';
          query_vertical show slave status;

          -echo #-----------------------------
          --echo # Stopping and starting the slave,
          --echo # waiting till it advanced to the next gtid_slave_pos
          --echo # and checking that it aborted again due to the missing row
          -echo #-----------------------------

          --source include/stop_slave.inc
          start slave;
          --let $wait_condition = SELECT @@gtid_slave_pos = '0-1-5'
          --source include/wait_condition.inc
          show variables like 'gtid%';
          query_vertical show slave status;

          -echo #-----------------------------
          --echo # Stopping and starting the slave again,
          --echo # and checking that now it did NOT abort
          -echo #-----------------------------

          --source include/stop_slave.inc
          --source include/start_slave.inc

          --let $wait_condition = SELECT @@gtid_slave_pos = '0-1-5'
          --source include/wait_condition.inc

          show variables like 'gtid%';
          query_vertical show slave status;

          --connection master
          --echo # [master]
          drop table t1;

          --sync_slave_with_master
          --echo # [slave]
          -echo #-----------------------------
          --echo # And now the slave is willing to replicate further
          -echo #-----------------------------
          --let $wait_condition = SELECT @@gtid_slave_pos = '0-1-6'
          --source include/wait_condition.inc
          show variables like 'gtid%';
          query_vertical show slave status;

          --connection master
          --source include/rpl_end.inc

          elenst Elena Stepanova added a comment - Test case: --source include/have_innodb.inc --source include/master-slave.inc --source include/have_binlog_format_row.inc --connection slave --source include/stop_slave.inc set sql_log_bin = 0; alter table mysql.gtid_slave_pos engine = InnoDB; change master to master_use_gtid = current_pos; set sql_log_bin = 1; --source include/start_slave.inc --connection master --echo # [master] create table t1 (i int) engine=innodb; insert into t1 values (1),(2); --sync_slave_with_master --echo # [slave] - echo # ----------------------------- --echo # Removing a row manually - echo # ----------------------------- delete from t1 where i = 1; --connection master - echo # ----------------------------- --echo # [master] --echo # Inserting something else, to raise the GTID --echo # and avoid the error on the next slave connection, caused --echo # by the non-existing seqno - echo # ----------------------------- insert into t1 values (3),(4); insert into t1 values (5),(6); delete from t1; select @@gtid_binlog_pos; --connection slave --echo # [slave] - echo # ----------------------------- --echo # Waiting for the last INSERT (it has GTID 0-1-4) --echo # and checking that replication aborted due to the missing row - echo # ----------------------------- --let $wait_condition = SELECT @@gtid_slave_pos = '0-1-4' --source include/wait_condition.inc show variables like 'gtid%'; query_vertical show slave status; - echo # ----------------------------- --echo # Stopping and starting the slave, --echo # waiting till it advanced to the next gtid_slave_pos --echo # and checking that it aborted again due to the missing row - echo # ----------------------------- --source include/stop_slave.inc start slave; --let $wait_condition = SELECT @@gtid_slave_pos = '0-1-5' --source include/wait_condition.inc show variables like 'gtid%'; query_vertical show slave status; - echo # ----------------------------- --echo # Stopping and starting the slave again, --echo # and checking that now it did NOT abort - echo # ----------------------------- --source include/stop_slave.inc --source include/start_slave.inc --let $wait_condition = SELECT @@gtid_slave_pos = '0-1-5' --source include/wait_condition.inc show variables like 'gtid%'; query_vertical show slave status; --connection master --echo # [master] drop table t1; --sync_slave_with_master --echo # [slave] - echo # ----------------------------- --echo # And now the slave is willing to replicate further - echo # ----------------------------- --let $wait_condition = SELECT @@gtid_slave_pos = '0-1-6' --source include/wait_condition.inc show variables like 'gtid%'; query_vertical show slave status; --connection master --source include/rpl_end.inc

          Pushed to 10.0-base:

          MDEV-4906: When event apply fails, next SQL thread start errorneously commits the failing GTID to gtid_slave_pos

          When a GTID event is executed, we remember the contained GTID position so that
          when we have applied the entire event group we can commit it to
          gtid_slave_pos.

          However, if the event group fails to apply due to some error and the SQL
          thread aborts, the code did not correctly clear the remembered GTID. Thus,
          when SQL thread was restarted, the old GTID of the failing event group was
          incorrectly updated to gtid_slave_pos when the initial rotate event was
          executed, corrupting the GTID position.

          knielsen Kristian Nielsen added a comment - Pushed to 10.0-base: MDEV-4906 : When event apply fails, next SQL thread start errorneously commits the failing GTID to gtid_slave_pos When a GTID event is executed, we remember the contained GTID position so that when we have applied the entire event group we can commit it to gtid_slave_pos. However, if the event group fails to apply due to some error and the SQL thread aborts, the code did not correctly clear the remembered GTID. Thus, when SQL thread was restarted, the old GTID of the failing event group was incorrectly updated to gtid_slave_pos when the initial rotate event was executed, corrupting the GTID position.

          People

            knielsen Kristian Nielsen
            mpattiso Michael Pattison
            Votes:
            1 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.