MariaDB MaxScale / MXS-1722

Switchover leads to error: "Demotion failed due to an error in updating gtid:s."

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.2.2
    • Fix Version/s: 2.2.4
    • Component/s: mariadbmon
    • Labels: None
    • Sprint: MXS-SPRINT-54

      Description

      A switchover attempt can fail with the following error message, which gives no indication of the underlying cause:

      > maxctrl call command mariadbmon switchover MariaMonitor host2 host1
      Error: Server at localhost:8989 responded with status code 403 to `POST maxscale/modules/mariadbmon/switchover?MariaMonitor&host2&host1`:{
        "errors": [
          {
            "detail": "Demotion failed due to an error in updating gtid:s."
          },
          {
            "detail": "Switchover host1 -> host2 failed."
          }
        ]
      }
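
For anyone scripting around this, here is a minimal Python sketch that pulls the `detail` strings out of an error body shaped like the one above. The function name `error_details` is hypothetical, and the body is copied from this report rather than fetched from a live MaxScale instance.

```python
import json


def error_details(body: str) -> list[str]:
    """Return the 'detail' strings from an error body shaped like the one above."""
    doc = json.loads(body)
    # Each entry in "errors" is expected to be an object with a "detail" key;
    # entries without one are skipped rather than treated as fatal.
    return [entry["detail"] for entry in doc.get("errors", []) if "detail" in entry]


# Error body as reported in this issue.
BODY = """
{
  "errors": [
    {"detail": "Demotion failed due to an error in updating gtid:s."},
    {"detail": "Switchover host1 -> host2 failed."}
  ]
}
"""

if __name__ == "__main__":
    for detail in error_details(BODY):
        print(detail)
```

As the output shows, the two details carry the same text as the log lines, so the response adds nothing beyond what the log already says.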
      

      The error log doesn't contain any helpful clues either:

      2018-03-19 14:17:08 notice : [mariadbmon] Stopped the monitor MariaMonitor for the duration of switchover.
      2018-03-19 14:17:08 notice : [mariadbmon] Demoting server 'host1'.
      2018-03-19 14:17:08 error : [mariadbmon] Demotion failed due to an error in updating gtid:s.
      2018-03-19 14:17:08 error : [mariadbmon] Switchover host1 -> host2 failed.
      

      Is this a bug or a legitimate failure? If it is a legitimate failure, the error log message should be improved to give more clues about what went wrong.

      It looks like the error message is printed here:

      https://github.com/mariadb-corporation/MaxScale/blob/2178667245d05802a0a5946b485891bfeff01da0/server/modules/monitor/mariadbmon/mariadbmon.cc#L4119

      But I suspect that the actual failure is occurring somewhere in here:

      https://github.com/mariadb-corporation/MaxScale/blob/2178667245d05802a0a5946b485891bfeff01da0/server/modules/monitor/mariadbmon/mariadbmon.cc#L4016

      It looks like all of the GTID-related variables have sane values.

      host1:

      gtid_binlog_pos	1-20-8672,2-2-14751
      gtid_binlog_state	1-20-8672,2-2-14751
      gtid_current_pos	1-20-8672,2-2-14751,3-3-4
      gtid_domain_id	1
      gtid_ignore_duplicates	OFF
      gtid_pos_auto_engines	
      gtid_slave_pos	1-1-8575,2-2-14751,3-3-4
      gtid_strict_mode	ON
      

      host2:

      gtid_binlog_pos	1-20-8676,2-2-14751
      gtid_binlog_state	1-21-132,1-20-8676,2-2-14751
      gtid_current_pos	1-20-8676,2-2-14751,3-3-4
      gtid_domain_id	1
      gtid_ignore_duplicates	OFF
      gtid_pos_auto_engines	
      gtid_slave_pos	1-20-8676,2-2-14751,3-3-4
      gtid_strict_mode	ON
      

      And SHOW SLAVE STATUS also looks fine:

                     Slave_IO_State: Waiting for master to send event
                        Master_Host: host1
                        Master_User: repl_login
                        Master_Port: 3306
                      Connect_Retry: 60
                    Master_Log_File: mariadb_bin.000009
                Read_Master_Log_Pos: 837
                     Relay_Log_File: mariadb_relay.000003
                      Relay_Log_Pos: 1138
              Relay_Master_Log_File: mariadb_bin.000009
                   Slave_IO_Running: Yes
                  Slave_SQL_Running: Yes
                    Replicate_Do_DB: 
                Replicate_Ignore_DB: 
                 Replicate_Do_Table: 
             Replicate_Ignore_Table: 
            Replicate_Wild_Do_Table: 
        Replicate_Wild_Ignore_Table: 
                         Last_Errno: 0
                         Last_Error: 
                       Skip_Counter: 0
                Exec_Master_Log_Pos: 837
                    Relay_Log_Space: 1850
                    Until_Condition: None
                     Until_Log_File: 
                      Until_Log_Pos: 0
                 Master_SSL_Allowed: No
                 Master_SSL_CA_File: 
                 Master_SSL_CA_Path: 
                    Master_SSL_Cert: 
                  Master_SSL_Cipher: 
                     Master_SSL_Key: 
              Seconds_Behind_Master: 0
      Master_SSL_Verify_Server_Cert: No
                      Last_IO_Errno: 0
                      Last_IO_Error: 
                     Last_SQL_Errno: 0
                     Last_SQL_Error: 
        Replicate_Ignore_Server_Ids: 
                   Master_Server_Id: 20
                     Master_SSL_Crl: 
                 Master_SSL_Crlpath: 
                         Using_Gtid: Slave_Pos
                        Gtid_IO_Pos: 1-20-8672,2-2-14751,3-3-4
            Replicate_Do_Domain_Ids: 
        Replicate_Ignore_Domain_Ids: 
                      Parallel_Mode: conservative
                          SQL_Delay: 0
                SQL_Remaining_Delay: NULL
            Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it
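
The "looks fine" judgement above can be expressed as a few explicit checks on the vertical `SHOW SLAVE STATUS` output. This is a sketch with hypothetical helper names; the set of fields checked is my own selection and is not taken from the mariadbmon source.

```python
def parse_vertical(text: str) -> dict[str, str]:
    """Parse 'Key: value' lines (vertical SHOW SLAVE STATUS output) into a dict."""
    fields: dict[str, str] = {}
    for line in text.splitlines():
        # Split on the first colon only, since values may contain colons.
        key, sep, value = line.partition(":")
        if sep and key.strip():
            fields[key.strip()] = value.strip()
    return fields


def replication_problems(fields: dict[str, str]) -> list[str]:
    """Return human-readable descriptions of anything that looks unhealthy."""
    problems = []
    for thread in ("Slave_IO_Running", "Slave_SQL_Running"):
        if fields.get(thread) != "Yes":
            problems.append(f"{thread} is {fields.get(thread)!r}")
    for errno in ("Last_IO_Errno", "Last_SQL_Errno"):
        if fields.get(errno, "0") != "0":
            problems.append(f"{errno} is {fields[errno]}")
    if fields.get("Using_Gtid", "No") == "No":
        problems.append("replication is not using GTID")
    return problems


# A subset of the output shown in this report.
SAMPLE = """
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
                Last_IO_Errno: 0
               Last_SQL_Errno: 0
                   Using_Gtid: Slave_Pos
                  Gtid_IO_Pos: 1-20-8672,2-2-14751,3-3-4
"""

if __name__ == "__main__":
    print(replication_problems(parse_vertical(SAMPLE)))
```

For the output in this report the list of problems is empty, which matches the observation that the slave status gives no hint about why the demotion failed.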
      

            People

            Assignee: Esa Korhonen (esa.korhonen)
            Reporter: Geoff Montee (GeoffMontee)