[MXS-1722] Switchover leads to error: "Demotion failed due to an error in updating gtid:s." Created: 2018-03-19  Updated: 2020-08-25  Resolved: 2018-03-21

Status: Closed
Project: MariaDB MaxScale
Component/s: mariadbmon
Affects Version/s: 2.2.2
Fix Version/s: 2.2.4

Type: Bug
Priority: Major
Reporter: Geoff Montee (Inactive)
Assignee: Esa Korhonen
Resolution: Fixed
Votes: 1
Labels: None

Sprint: MXS-SPRINT-54

Description

A switchover attempt can fail with the following error message, which gives no indication of the underlying cause:

> maxctrl call command mariadbmon switchover MariaMonitor host2 host1
Error: Server at localhost:8989 responded with status code 403 to `POST maxscale/modules/mariadbmon/switchover?MariaMonitor&host2&host1`:{
"errors": [
{
"detail": "Demotion failed due to an error in updating gtid:s."
},
{
"detail": "Switchover host1 -> host2 failed."
}
]
}
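
For anyone scripting around this call, the response body above is plain JSON with an "errors" array, so the individual messages can be pulled out programmatically. The following is a minimal sketch, not part of MaxScale or maxctrl; the helper name error_details is hypothetical, and the body is simply the text pasted above.

```python
import json

# Response body copied verbatim from the failed switchover call above.
body = """
{
"errors": [
{
"detail": "Demotion failed due to an error in updating gtid:s."
},
{
"detail": "Switchover host1 -> host2 failed."
}
]
}
"""


def error_details(raw):
    """Return the 'detail' string of every entry in the 'errors' array.

    Hypothetical helper for illustration; it only assumes the shape of
    the body shown in this report.
    """
    doc = json.loads(raw)
    return [entry.get("detail", "") for entry in doc.get("errors", [])]


details = error_details(body)
for detail in details:
    print(detail)
```

The first entry is the lower-level failure (the demotion step), and the second is the summary for the whole switchover.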

The error log doesn't contain any helpful clues either:

2018-03-19 14:17:08 notice : [mariadbmon] Stopped the monitor MariaMonitor for the duration of switchover.
2018-03-19 14:17:08 notice : [mariadbmon] Demoting server 'host1'.
2018-03-19 14:17:08 error : [mariadbmon] Demotion failed due to an error in updating gtid:s.
2018-03-19 14:17:08 error : [mariadbmon] Switchover host1 -> host2 failed.

Is this a bug or a legitimate failure? If it is a legitimate failure, it seems like the error log message could be improved to provide more clues about what went wrong.

It looks like the error message is printed here:

https://github.com/mariadb-corporation/MaxScale/blob/2178667245d05802a0a5946b485891bfeff01da0/server/modules/monitor/mariadbmon/mariadbmon.cc#L4119

But I suspect that the actual failure is occurring somewhere in here:

https://github.com/mariadb-corporation/MaxScale/blob/2178667245d05802a0a5946b485891bfeff01da0/server/modules/monitor/mariadbmon/mariadbmon.cc#L4016

It looks like all of the GTID-related variables have sane values.

host1:

gtid_binlog_pos	1-20-8672,2-2-14751
gtid_binlog_state	1-20-8672,2-2-14751
gtid_current_pos	1-20-8672,2-2-14751,3-3-4
gtid_domain_id	1
gtid_ignore_duplicates	OFF
gtid_pos_auto_engines	
gtid_slave_pos	1-1-8575,2-2-14751,3-3-4
gtid_strict_mode	ON

host2:

gtid_binlog_pos	1-20-8676,2-2-14751
gtid_binlog_state	1-21-132,1-20-8676,2-2-14751
gtid_current_pos	1-20-8676,2-2-14751,3-3-4
gtid_domain_id	1
gtid_ignore_duplicates	OFF
gtid_pos_auto_engines	
gtid_slave_pos	1-20-8676,2-2-14751,3-3-4
gtid_strict_mode	ON
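
The positions above can also be compared mechanically instead of by eye. The sketch below parses a comma-separated GTID list of domain-server_id-sequence triplets into a per-domain map and reports the sequence difference per domain. The function names parse_gtid_list and seq_lag are hypothetical, and comparing only sequence numbers is a simplification. It also assumes at most one triplet per domain, which holds for the *_pos variables shown here but not for gtid_binlog_state on host2.

```python
def parse_gtid_list(text):
    """Parse 'domain-server_id-seq,...' into {domain: (server_id, seq)}.

    Assumes at most one triplet per domain (true for the *_pos values
    in this report); a later triplet for the same domain would
    overwrite an earlier one.
    """
    result = {}
    for triplet in text.split(","):
        triplet = triplet.strip()
        if not triplet:
            continue
        domain, server_id, seq = (int(part) for part in triplet.split("-"))
        result[domain] = (server_id, seq)
    return result


def seq_lag(master_pos, slave_pos):
    """For each domain in master_pos, return master seq minus slave seq.

    A domain missing from slave_pos is treated as sequence 0.
    """
    master = parse_gtid_list(master_pos)
    slave = parse_gtid_list(slave_pos)
    return {
        domain: seq - slave.get(domain, (None, 0))[1]
        for domain, (_, seq) in master.items()
    }


# Values copied from the report above.
host1_binlog_pos = "1-20-8672,2-2-14751"
host2_slave_pos = "1-20-8676,2-2-14751,3-3-4"

lag = seq_lag(host1_binlog_pos, host2_slave_pos)
print(lag)  # {1: -4, 2: 0}
```

A negative value means the second position is ahead of the first in the pasted values. Here domain 1 differs by four sequence numbers, which may simply mean the two sets of variables were captured at different moments.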

And SHOW SLAVE STATUS also looks fine:

               Slave_IO_State: Waiting for master to send event
                  Master_Host: host1
                  Master_User: repl_login
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mariadb_bin.000009
          Read_Master_Log_Pos: 837
               Relay_Log_File: mariadb_relay.000003
                Relay_Log_Pos: 1138
        Relay_Master_Log_File: mariadb_bin.000009
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 837
              Relay_Log_Space: 1850
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 20
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
                   Using_Gtid: Slave_Pos
                  Gtid_IO_Pos: 1-20-8672,2-2-14751,3-3-4
      Replicate_Do_Domain_Ids: 
  Replicate_Ignore_Domain_Ids: 
                Parallel_Mode: conservative
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it
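
The "looks fine" judgement about the SHOW SLAVE STATUS output can be expressed as a small check. This is a rough sketch under the assumption that the output is available as "Key: value" lines like the ones above; parse_vertical and replication_problems are hypothetical names, and the set of fields checked is an illustrative choice, not what mariadbmon itself verifies.

```python
def parse_vertical(text):
    """Parse 'Key: value' lines into a dict, splitting at the first colon."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def replication_problems(fields):
    """Return human-readable problems found in parsed slave status fields."""
    problems = []
    # Both replication threads are expected to be running.
    for key in ("Slave_IO_Running", "Slave_SQL_Running"):
        if fields.get(key) != "Yes":
            problems.append("%s is %r" % (key, fields.get(key)))
    # Error counters are expected to be zero.
    for key in ("Last_Errno", "Last_IO_Errno", "Last_SQL_Errno"):
        if fields.get(key) != "0":
            problems.append("%s is %r" % (key, fields.get(key)))
    # GTID-based replication is expected to be in use.
    if fields.get("Using_Gtid") in (None, "No"):
        problems.append("Using_Gtid is %r" % fields.get("Using_Gtid"))
    return problems


# A subset of the fields from the report above.
sample = """\
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
                   Last_Errno: 0
                Last_IO_Errno: 0
               Last_SQL_Errno: 0
                   Using_Gtid: Slave_Pos
"""

problems = replication_problems(parse_vertical(sample))
print(problems)  # []
```

For the values in this report the list is empty, which matches the observation that replication itself appears healthy.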

