Uploaded image for project: 'MariaDB MaxScale'
  1. MariaDB MaxScale
  2. MXS-1668

Switchover under load condition causes out-of-order replicated GTID

    XMLWordPrintable

    Details

    • Type: Bug
    • Status: Closed (View Workflow)
    • Priority: Major
    • Resolution: Cannot Reproduce
    • Affects Version/s: 2.2.1
    • Fix Version/s: 2.2.4
    • Component/s: mariadbmon
    • Labels:
      None
    • Sprint:
      MXS-SPRINT-54, MXS-SPRINT-55, MXS-SPRINT-56

      Description

      When a switchover was done under load, the new master appeared to create GTID events that broke replication on the old master due to out-of-order replicated GTID. A hypothesis is that this is possible caused by disabling read-only on the candidate master too early by MaxScale during switchover command processing

      [root@ip-172-31-36-235 centos]# maxctrl list servers
      ┌─────────┬───────────────┬──────┬─────────────┬─────────────────┐
      │ Server  │ Address       │ Port │ Connections │ State           │
      ├─────────┼───────────────┼──────┼─────────────┼─────────────────┤
      │ dbserv1 │ 172.31.41.18533060           │ Slave, Running  │
      ├─────────┼───────────────┼──────┼─────────────┼─────────────────┤
      │ dbserv2 │ 172.31.33.9033060           │ Master, Running │
      ├─────────┼───────────────┼──────┼─────────────┼─────────────────┤
      │ dbserv3 │ 172.31.34.3633060           │ Slave, Running  │
      ├─────────┼───────────────┼──────┼─────────────┼─────────────────┤
      │ dbserv4 │ 172.31.41.7633060           │ Slave, Running  │
      └─────────┴───────────────┴──────┴─────────────┴─────────────────┘
      [root@ip-172-31-36-235 centos]# maxctrl call command mariadbmon switchover Rep_Monitor dbserv1 dbserv2
      OK
      [root@ip-172-31-36-235 centos]# maxctrl list servers
      ┌─────────┬───────────────┬──────┬─────────────┬─────────────────┐
      │ Server  │ Address       │ Port │ Connections │ State           │
      ├─────────┼───────────────┼──────┼─────────────┼─────────────────┤
      │ dbserv1 │ 172.31.41.18533060           │ Master, Running │
      ├─────────┼───────────────┼──────┼─────────────┼─────────────────┤
      │ dbserv2 │ 172.31.33.9033060           │ Running         │
      ├─────────┼───────────────┼──────┼─────────────┼─────────────────┤
      │ dbserv3 │ 172.31.34.3633060           │ Slave, Running  │
      ├─────────┼───────────────┼──────┼─────────────┼─────────────────┤
      │ dbserv4 │ 172.31.41.7633060           │ Slave, Running  │
      └─────────┴───────────────┴──────┴─────────────┴─────────────────┘
      2018-02-13 19:43:03   notice : [mariadbmon] Stopped the monitor Rep_Monitor for the duration of switchover.
      2018-02-13 19:43:03   notice : [mariadbmon] Demoting server 'dbserv1'.
      2018-02-13 19:43:04   notice : [mariadbmon] Promoting server 'dbserv2' to master.
      2018-02-13 19:43:04   notice : [mariadbmon] Old master 'dbserv1' starting replication from 'dbserv2'.
      2018-02-13 19:43:04   notice : [mariadbmon] Redirecting slaves to new master.
      2018-02-13 19:43:04   notice : [mariadbmon] Slave 'dbserv3' redirected to new master.
      2018-02-13 19:43:04   notice : [mariadbmon] Slave 'dbserv4' redirected to new master.
      2018-02-13 19:43:04   notice : [mariadbmon] Switchover dbserv1 -> dbserv2 performed.
      2018-02-13 19:43:04   notice : Loaded server states from journal file: /var/lib/maxscale/Rep_Monitor/monitor.dat
      2018-02-13 19:43:04   notice : Server changed state: dbserv1[172.31.41.185:3306]: new_slave. [Master, Running] -> [Slave, Running]
      2018-02-13 19:43:04   notice : Server changed state: dbserv2[172.31.33.90:3306]: new_master. [Slave, Running] -> [Master, Running]
      2018-02-13 19:43:04   notice : [mariadbmon] A Master Server is now available: 172.31.33.90:3306
      2018-02-13 19:44:15   notice : [mariadbmon] Stopped the monitor Rep_Monitor for the duration of switchover.
      2018-02-13 19:44:15   notice : [mariadbmon] Demoting server 'dbserv2'.
      2018-02-13 19:44:15   notice : [mariadbmon] Promoting server 'dbserv1' to master.
      2018-02-13 19:44:15   notice : [mariadbmon] Old master 'dbserv2' starting replication from 'dbserv1'.
      2018-02-13 19:44:15   notice : [mariadbmon] Redirecting slaves to new master.
      2018-02-13 19:44:15   notice : [mariadbmon] Slave 'dbserv3' redirected to new master.
      2018-02-13 19:44:15   notice : [mariadbmon] Slave 'dbserv4' redirected to new master.
      2018-02-13 19:44:15   notice : [mariadbmon] Switchover dbserv2 -> dbserv1 performed.
      2018-02-13 19:44:15   notice : Loaded server states from journal file: /var/lib/maxscale/Rep_Monitor/monitor.dat
      2018-02-13 19:44:15   notice : Server changed state: dbserv1[172.31.41.185:3306]: new_master. [Slave, Running] -> [Master, Running]
      2018-02-13 19:44:15   notice : Server changed state: dbserv2[172.31.33.90:3306]: new_slave. [Master, Running] -> [Slave, Running]
      2018-02-13 19:44:16   notice : Server changed state: dbserv2[172.31.33.90:3306]: lost_slave. [Slave, Running] -> [Running]
      [root@ip-172-31-36-235 centos]# 
      172.31.41.185:
      MariaDB [(none)]> show variables like '%gtid%';
      +------------------------+-----------------------+
      | Variable_name          | Value                 |
      +------------------------+-----------------------+
      | gtid_binlog_pos        | 0-1-717942            |
      | gtid_binlog_state      | 0-3-717926,0-1-717942 |
      | gtid_current_pos       | 0-1-717942            |
      | gtid_domain_id         | 0                     |
      | gtid_ignore_duplicates | OFF                   |
      | gtid_seq_no            | 0                     |
      | gtid_slave_pos         | 0-3-717926            |
      | gtid_strict_mode       | ON                    |
      | last_gtid              |                       |
      | wsrep_gtid_domain_id   | 0                     |
      | wsrep_gtid_mode        | OFF                   |
      +------------------------+-----------------------+
      11 rows in set (0.00 sec)
      172.31.33.90:
      MariaDB [(none)]> show variables like '%gtid%';
      +------------------------+-----------------------+
      | Variable_name          | Value                 |
      +------------------------+-----------------------+
      | gtid_binlog_pos        | 0-3-717928            |
      | gtid_binlog_state      | 0-1-717927,0-3-717928 |
      | gtid_current_pos       | 0-3-717928            |
      | gtid_domain_id         | 0                     |
      | gtid_ignore_duplicates | OFF                   |
      | gtid_seq_no            | 0                     |
      | gtid_slave_pos         | 0-1-717927            |
      | gtid_strict_mode       | ON                    |
      | last_gtid              |                       |
      | wsrep_gtid_domain_id   | 0                     |
      | wsrep_gtid_mode        | OFF                   |
      +------------------------+-----------------------+
      11 rows in set (0.00 sec)
      MariaDB [(none)]> 
      

      Would you still have the output of the SHOW SLAVE STATUS on old master?
      Last_SQL_Error: An attempt was made to binlog GTID 0-1-717928 which would create an out-of-order sequence number with existing GTID 0-3-717928, and gtid strict mode is enabled

        Attachments

          Activity

            People

            Assignee:
            esa.korhonen Esa Korhonen
            Reporter:
            dshjoshi Dipti Joshi (Inactive)
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

              Dates

              Created:
              Updated:
              Resolved:

                Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.