Details
-
Bug
-
Status: Closed (View Workflow)
-
Major
-
Resolution: Cannot Reproduce
-
2.2.1
-
None
-
MXS-SPRINT-54, MXS-SPRINT-55, MXS-SPRINT-56
Description
When a switchover was done under load, the new master appeared to create GTID events that broke replication on the old master due to out-of-order replicated GTID. A hypothesis is that this is possible caused by disabling read-only on the candidate master too early by MaxScale during switchover command processing
[root@ip-172-31-36-235 centos]# maxctrl list servers |
┌─────────┬───────────────┬──────┬─────────────┬─────────────────┐
|
│ Server │ Address │ Port │ Connections │ State │
|
├─────────┼───────────────┼──────┼─────────────┼─────────────────┤
|
│ dbserv1 │ 172.31.41.185 │ 3306 │ 0 │ Slave, Running │ |
├─────────┼───────────────┼──────┼─────────────┼─────────────────┤
|
│ dbserv2 │ 172.31.33.90 │ 3306 │ 0 │ Master, Running │ |
├─────────┼───────────────┼──────┼─────────────┼─────────────────┤
|
│ dbserv3 │ 172.31.34.36 │ 3306 │ 0 │ Slave, Running │ |
├─────────┼───────────────┼──────┼─────────────┼─────────────────┤
|
│ dbserv4 │ 172.31.41.76 │ 3306 │ 0 │ Slave, Running │ |
└─────────┴───────────────┴──────┴─────────────┴─────────────────┘
|
[root@ip-172-31-36-235 centos]# maxctrl call command mariadbmon switchover Rep_Monitor dbserv1 dbserv2 |
OK
|
[root@ip-172-31-36-235 centos]# maxctrl list servers |
┌─────────┬───────────────┬──────┬─────────────┬─────────────────┐
|
│ Server │ Address │ Port │ Connections │ State │
|
├─────────┼───────────────┼──────┼─────────────┼─────────────────┤
|
│ dbserv1 │ 172.31.41.185 │ 3306 │ 0 │ Master, Running │ |
├─────────┼───────────────┼──────┼─────────────┼─────────────────┤
|
│ dbserv2 │ 172.31.33.90 │ 3306 │ 0 │ Running │ |
├─────────┼───────────────┼──────┼─────────────┼─────────────────┤
|
│ dbserv3 │ 172.31.34.36 │ 3306 │ 0 │ Slave, Running │ |
├─────────┼───────────────┼──────┼─────────────┼─────────────────┤
|
│ dbserv4 │ 172.31.41.76 │ 3306 │ 0 │ Slave, Running │ |
└─────────┴───────────────┴──────┴─────────────┴─────────────────┘
|
2018-02-13 19:43:03 notice : [mariadbmon] Stopped the monitor Rep_Monitor for the duration of switchover. |
2018-02-13 19:43:03 notice : [mariadbmon] Demoting server 'dbserv1'. |
2018-02-13 19:43:04 notice : [mariadbmon] Promoting server 'dbserv2' to master. |
2018-02-13 19:43:04 notice : [mariadbmon] Old master 'dbserv1' starting replication from 'dbserv2'. |
2018-02-13 19:43:04 notice : [mariadbmon] Redirecting slaves to new master. |
2018-02-13 19:43:04 notice : [mariadbmon] Slave 'dbserv3' redirected to new master. |
2018-02-13 19:43:04 notice : [mariadbmon] Slave 'dbserv4' redirected to new master. |
2018-02-13 19:43:04 notice : [mariadbmon] Switchover dbserv1 -> dbserv2 performed. |
2018-02-13 19:43:04 notice : Loaded server states from journal file: /var/lib/maxscale/Rep_Monitor/monitor.dat |
2018-02-13 19:43:04 notice : Server changed state: dbserv1[172.31.41.185:3306]: new_slave. [Master, Running] -> [Slave, Running] |
2018-02-13 19:43:04 notice : Server changed state: dbserv2[172.31.33.90:3306]: new_master. [Slave, Running] -> [Master, Running] |
2018-02-13 19:43:04 notice : [mariadbmon] A Master Server is now available: 172.31.33.90:3306 |
2018-02-13 19:44:15 notice : [mariadbmon] Stopped the monitor Rep_Monitor for the duration of switchover. |
2018-02-13 19:44:15 notice : [mariadbmon] Demoting server 'dbserv2'. |
2018-02-13 19:44:15 notice : [mariadbmon] Promoting server 'dbserv1' to master. |
2018-02-13 19:44:15 notice : [mariadbmon] Old master 'dbserv2' starting replication from 'dbserv1'. |
2018-02-13 19:44:15 notice : [mariadbmon] Redirecting slaves to new master. |
2018-02-13 19:44:15 notice : [mariadbmon] Slave 'dbserv3' redirected to new master. |
2018-02-13 19:44:15 notice : [mariadbmon] Slave 'dbserv4' redirected to new master. |
2018-02-13 19:44:15 notice : [mariadbmon] Switchover dbserv2 -> dbserv1 performed. |
2018-02-13 19:44:15 notice : Loaded server states from journal file: /var/lib/maxscale/Rep_Monitor/monitor.dat |
2018-02-13 19:44:15 notice : Server changed state: dbserv1[172.31.41.185:3306]: new_master. [Slave, Running] -> [Master, Running] |
2018-02-13 19:44:15 notice : Server changed state: dbserv2[172.31.33.90:3306]: new_slave. [Master, Running] -> [Slave, Running] |
2018-02-13 19:44:16 notice : Server changed state: dbserv2[172.31.33.90:3306]: lost_slave. [Slave, Running] -> [Running] |
[root@ip-172-31-36-235 centos]# |
172.31.41.185: |
MariaDB [(none)]> show variables like '%gtid%'; |
+------------------------+-----------------------+
|
| Variable_name | Value |
|
+------------------------+-----------------------+
|
| gtid_binlog_pos | 0-1-717942 | |
| gtid_binlog_state | 0-3-717926,0-1-717942 | |
| gtid_current_pos | 0-1-717942 | |
| gtid_domain_id | 0 | |
| gtid_ignore_duplicates | OFF |
|
| gtid_seq_no | 0 | |
| gtid_slave_pos | 0-3-717926 | |
| gtid_strict_mode | ON |
|
| last_gtid | |
|
| wsrep_gtid_domain_id | 0 | |
| wsrep_gtid_mode | OFF |
|
+------------------------+-----------------------+
|
11 rows in set (0.00 sec) |
172.31.33.90: |
MariaDB [(none)]> show variables like '%gtid%'; |
+------------------------+-----------------------+
|
| Variable_name | Value |
|
+------------------------+-----------------------+
|
| gtid_binlog_pos | 0-3-717928 | |
| gtid_binlog_state | 0-1-717927,0-3-717928 | |
| gtid_current_pos | 0-3-717928 | |
| gtid_domain_id | 0 | |
| gtid_ignore_duplicates | OFF |
|
| gtid_seq_no | 0 | |
| gtid_slave_pos | 0-1-717927 | |
| gtid_strict_mode | ON |
|
| last_gtid | |
|
| wsrep_gtid_domain_id | 0 | |
| wsrep_gtid_mode | OFF |
|
+------------------------+-----------------------+
|
11 rows in set (0.00 sec) |
MariaDB [(none)]>
|
Would you still have the output of the SHOW SLAVE STATUS on old master?
Last_SQL_Error: An attempt was made to binlog GTID 0-1-717928 which would create an out-of-order sequence number with existing GTID 0-3-717928, and gtid strict mode is enabled