MariaDB MaxScale: MXS-2158

Node rejoin fails if the node was never a slave (but was master before going down)

Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 2.3.0
    • Fix Version/s: 2.3.2
    • Component/s: mariadbmon
    • Labels: None
    • Sprint: MXS-SPRINT-70

    Description

      (1) Start a fresh cluster: server1 = master; server2, server3, and server4 = slaves.
      (2) Bring the master down (without having run any transactions).
      (3) Server 2 gets promoted to master.
      (4) Perform a couple of transactions.
      (5) Bring server 1 back up.
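
The steps above can be sketched as a small shell script. This is only an illustration under stated assumptions: the host and ports are the ones that appear in the log and MaxCtrl output in this report, while the passwordless `root` login, the table `test.t1`, and the helper names are hypothetical. How server 1 is started again in step (5) depends on the test setup, so it is left out. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Reproduction sketch. Host and ports come from the report; the root login
# without a password and the table test.t1 are assumptions for illustration.
HOST=127.0.0.1
OLD_MASTER_PORT=33061   # server1 in the report
NEW_MASTER_PORT=33062   # server2 in the report
DRY_RUN=${DRY_RUN:-1}   # 1 = only print the commands, anything else = run them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Step (2): stop the master before it has executed any transactions.
stop_old_master() {
    run mysqladmin -h "$HOST" -P "$OLD_MASTER_PORT" -u root shutdown
}

# Step (4): write a few rows on the promoted master.
write_on_new_master() {
    run mysql -h "$HOST" -P "$NEW_MASTER_PORT" -u root -e \
        "CREATE TABLE IF NOT EXISTS test.t1 (id INT PRIMARY KEY); INSERT IGNORE INTO test.t1 VALUES (1), (2);"
}

# Steps (3) and (5): look at the server states as the monitor sees them.
show_cluster() {
    run maxctrl list servers
}
```

Calling `stop_old_master`, `show_cluster`, `write_on_new_master`, then restarting server 1 by whatever means the setup provides and calling `show_cluster` again follows the order of the steps. Set `DRY_RUN=0` to actually execute the commands.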

      Server 1 does not rejoin the cluster as a slave, and the following messages appear in the log:

      2018-11-12 08:48:42   notice : Server changed state: server1[127.0.0.1:33061]: server_up. [Down] -> [Running]
      2018-11-12 08:48:42   warning: Automatic rejoin was not attempted on server 'server1' even though it is a valid candidate. Will keep retrying with this message suppressed for all servers. Errors: 
      Server 'server1' could not be queried.
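
The issue title points at server1 never having been a slave. One way to check that by hand is to count the rows returned by `SHOW ALL SLAVES STATUS` on the server: a server with no slave connections configured returns an empty result. A minimal sketch, assuming the `mysql` client is available and the monitor user from this report is allowed to run the statement. The function names are hypothetical, and this is not necessarily the query that failed inside the monitor.

```shell
# Count result rows on stdin. With "mysql -N -B" each row is one line,
# and an empty result set prints nothing.
count_slave_connections() {
    awk 'NF > 0 { n++ } END { print n + 0 }'
}

# Turn the row count into a short human-readable verdict.
describe_slave_status() {
    n=$(count_slave_connections)
    if [ "$n" -eq 0 ]; then
        echo "no slave connections configured"
    else
        echo "$n slave connection(s) configured"
    fi
}

# Usage (credentials assumed to match the monitor configuration):
#   mysql -h 127.0.0.1 -P 33061 -u maxuser -pmaxpwd -N -B \
#       -e "SHOW ALL SLAVES STATUS" | describe_slave_status
```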
      

      MaxCtrl shows the following:

       maxctrl list servers                                           Mon Nov 12 08:53:40 2018

          Server      Address       Port      Connections     State               GTID
          server1     127.0.0.1     33061     0               Running
          server2     127.0.0.1     33062     1               Master, Running     0-2-4
          server3     127.0.0.1     33063     1               Slave, Running      0-2-4
          server4     127.0.0.1     33064     1               Slave, Running      0-2-4
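
The listing shows the symptom: server1 is Running but has neither the Master nor the Slave status. A small filter can pick such servers out of the output. This sketch assumes tab-separated input with the columns in the order shown above (for example from `maxctrl --tsv list servers`, if the installed MaxCtrl supports that option). The function name is hypothetical.

```shell
# Print the names of servers whose State column is exactly "Running",
# i.e. servers that are up but are neither Master nor Slave.
# Expects tab-separated columns: Server, Address, Port, Connections, State, GTID.
running_but_unjoined() {
    awk -F '\t' '$5 == "Running" { print $1 }'
}

# Usage (assumes the --tsv option is available):
#   maxctrl --tsv list servers | running_but_unjoined
```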
      

      The monitor configuration is as follows:

      [TheMonitor]
      type=monitor
      module=mariadbmon
      servers=server1,server2,server3,server4
      user=maxuser
      password=maxpwd
      auto_failover=true
      auto_rejoin=true
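
Both `auto_failover` and `auto_rejoin` are enabled above, so the monitor is expected to attempt the rejoin. To confirm what a configuration file actually sets, a value can be read from a section with a short helper. A sketch only: the function name is hypothetical, and it assumes `key=value` lines without spaces around `=`, as in the snippet above.

```shell
# Print the value of a key inside one section of an ini-style file.
# $1 = config file, $2 = section name (without brackets), $3 = key.
# Assumes "key=value" lines with no spaces around "=".
monitor_setting() {
    awk -F '=' -v section="[$2]" -v key="$3" '
        /^\[/ { in_section = ($0 == section) }      # entering a new section
        in_section && $1 == key { print $2; exit }  # first match wins
    ' "$1"
}

# Usage (the path is the usual default and may differ on your system):
#   monitor_setting /etc/maxscale.cnf TheMonitor auto_rejoin
```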
      
      

      These are the semi-sync replication settings on the server:

      MariaDB [test]> SHOW VARIABLES LIKE "rp%sync%";
      +---------------------------------------+--------------+
      | Variable_name                         | Value        |
      +---------------------------------------+--------------+
      | rpl_semi_sync_master_enabled          | OFF          |
      | rpl_semi_sync_master_timeout          | 10000        |
      | rpl_semi_sync_master_trace_level      | 32           |
      | rpl_semi_sync_master_wait_no_slave    | ON           |
      | rpl_semi_sync_master_wait_point       | AFTER_COMMIT |
      | rpl_semi_sync_slave_delay_master      | OFF          |
      | rpl_semi_sync_slave_enabled           | OFF          |
      | rpl_semi_sync_slave_kill_conn_timeout | 5            |
      | rpl_semi_sync_slave_trace_level       | 32           |
      +---------------------------------------+--------------+
      9 rows in set (0.001 sec)
      
      

      There are a couple of issues here:
      (1) The error message is not descriptive enough: it does not say which query the rejoining server failed to respond to.
      (2) server1 should have been allowed to rejoin.

          People

            esa.korhonen Esa Korhonen
            dshjoshi Dipti Joshi (Inactive)