Details
Bug
Status: Closed
Major
Resolution: Fixed
2.2.2
None
MXS-SPRINT-54
Description
A switchover attempt can fail with the following error message, which gives no clue about the underlying problem:
> maxctrl call command mariadbmon switchover MariaMonitor host2 host1
Error: Server at localhost:8989 responded with status code 403 to `POST maxscale/modules/mariadbmon/switchover?MariaMonitor&host2&host1`:{
    "errors": [
        {
            "detail": "Demotion failed due to an error in updating gtid:s."
        },
        {
            "detail": "Switchover host1 -> host2 failed."
        }
    ]
}
The error log doesn't contain any helpful clues either:
2018-03-19 14:17:08 notice : [mariadbmon] Stopped the monitor MariaMonitor for the duration of switchover.
2018-03-19 14:17:08 notice : [mariadbmon] Demoting server 'host1'.
2018-03-19 14:17:08 error : [mariadbmon] Demotion failed due to an error in updating gtid:s.
2018-03-19 14:17:08 error : [mariadbmon] Switchover host1 -> host2 failed.
Is this a bug, or some kind of legitimate failure? If it is a legitimate failure, then it seems like we could improve the error log message to provide more clues about what may have gone wrong.
It looks like the error message is printed here:
But I suspect that the actual failure is occurring somewhere in here:
It looks like all of the GTID-related variables have sane values.
host1:
gtid_binlog_pos         1-20-8672,2-2-14751
gtid_binlog_state       1-20-8672,2-2-14751
gtid_current_pos        1-20-8672,2-2-14751,3-3-4
gtid_domain_id          1
gtid_ignore_duplicates  OFF
gtid_pos_auto_engines
gtid_slave_pos          1-1-8575,2-2-14751,3-3-4
gtid_strict_mode        ON
host2:
gtid_binlog_pos         1-20-8676,2-2-14751
gtid_binlog_state       1-21-132,1-20-8676,2-2-14751
gtid_current_pos        1-20-8676,2-2-14751,3-3-4
gtid_domain_id          1
gtid_ignore_duplicates  OFF
gtid_pos_auto_engines
gtid_slave_pos          1-20-8676,2-2-14751,3-3-4
gtid_strict_mode        ON
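To make the comparison of the two hosts' GTID positions less error-prone than reading them by eye, here is a minimal Python sketch. It assumes the MariaDB `domain-server_id-sequence` GTID format and that each position list has at most one entry per domain (true for the `*_pos` variables above, but not for `gtid_binlog_state`). The helper names `parse_gtid_pos` and `sequence_gaps` are hypothetical and are not part of MaxScale or mariadbmon.

```python
def parse_gtid_pos(value):
    """Parse a GTID position list into {domain_id: (server_id, sequence)}.

    Assumes at most one GTID per domain, as in gtid_current_pos.
    """
    result = {}
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        domain, server_id, seq = (int(part) for part in item.split("-"))
        result[domain] = (server_id, seq)
    return result


def sequence_gaps(pos_a, pos_b):
    """Return {domain_id: seq_b - seq_a} for domains whose GTIDs differ.

    A domain present on only one side maps to None.
    """
    a, b = parse_gtid_pos(pos_a), parse_gtid_pos(pos_b)
    gaps = {}
    for domain in sorted(set(a) | set(b)):
        if domain not in a or domain not in b:
            gaps[domain] = None
        elif a[domain] != b[domain]:
            gaps[domain] = b[domain][1] - a[domain][1]
    return gaps


# gtid_current_pos values copied from the report above.
host1_current = "1-20-8672,2-2-14751,3-3-4"
host2_current = "1-20-8676,2-2-14751,3-3-4"
print(sequence_gaps(host1_current, host2_current))  # {1: 4}
```

With the values from this report, only domain 1 differs, by four sequence numbers. The two snapshots may simply have been taken at different moments, so this gap alone does not explain the failure.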
And SHOW SLAVE STATUS also looks fine:
Slave_IO_State: Waiting for master to send event
Master_Host: host1
Master_User: repl_login
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mariadb_bin.000009
Read_Master_Log_Pos: 837
Relay_Log_File: mariadb_relay.000003
Relay_Log_Pos: 1138
Relay_Master_Log_File: mariadb_bin.000009
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 837
Relay_Log_Space: 1850
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 20
Master_SSL_Crl:
Master_SSL_Crlpath:
Using_Gtid: Slave_Pos
Gtid_IO_Pos: 1-20-8672,2-2-14751,3-3-4
Replicate_Do_Domain_Ids:
Replicate_Ignore_Domain_Ids:
Parallel_Mode: conservative
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it
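The "looks fine" judgement on the SHOW SLAVE STATUS output can be made explicit with a small Python sketch that parses the `Key: Value` lines and flags the fields most obviously tied to replication health. The choice of fields and the helper names `parse_status` and `replication_problems` are assumptions made for illustration; this is not the set of checks mariadbmon itself performs before a switchover.

```python
def parse_status(text):
    """Parse 'Key: Value' lines (as printed by SHOW SLAVE STATUS\\G) into a dict."""
    status = {}
    for line in text.splitlines():
        # Split on the first colon only, since values may contain colons.
        key, sep, value = line.partition(":")
        if sep and key.strip():
            status[key.strip()] = value.strip()
    return status


def replication_problems(status):
    """Return a list of 'field=value' strings for fields that look unhealthy."""
    # Illustrative expectations, not an exhaustive health check.
    expected = {
        "Slave_IO_Running": "Yes",
        "Slave_SQL_Running": "Yes",
        "Last_IO_Errno": "0",
        "Last_SQL_Errno": "0",
    }
    return [
        f"{field}={status.get(field)!r}"
        for field, wanted in expected.items()
        if status.get(field) != wanted
    ]


# A few lines copied from the output in this report.
sample = """Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Last_IO_Errno: 0
Last_SQL_Errno: 0
Using_Gtid: Slave_Pos"""
print(replication_problems(parse_status(sample)))  # []
```

An empty list for the output in this report matches the observation that both replication threads are running and no I/O or SQL error is recorded.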