[MXS-2158] Node rejoin fails, if the node was never a slave (but was master before going down) Created: 2018-11-12  Updated: 2018-11-20  Resolved: 2018-11-19

Status: Closed
Project: MariaDB MaxScale
Component/s: mariadbmon
Affects Version/s: 2.3.0
Fix Version/s: 2.3.2

Type: Bug Priority: Critical
Reporter: Dipti Joshi (Inactive) Assignee: Esa Korhonen
Resolution: Fixed Votes: 0
Labels: None

Sprint: MXS-SPRINT-70

 Description   

(1) Start a fresh cluster: Server 1 = Master; Servers 2, 3 and 4 = Slaves
(2) Bring the Master down (without having run any transactions)
(3) Server 2 gets promoted to Master
(4) Perform a couple of transactions
(5) Bring Server 1 back up

Server 1 does not rejoin the cluster as a Slave, and the following messages appear in the log:

2018-11-12 08:48:42   notice : Server changed state: server1[127.0.0.1:33061]: server_up. [Down] -> [Running]
2018-11-12 08:48:42   warning: Automatic rejoin was not attempted on server 'server1' even though it is a valid candidate. Will keep retrying with this message suppressed for all servers. Errors: 
Server 'server1' could not be queried.
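
To spot this condition in a longer log, the warning and its trailing error lines can be pulled out with a small script. This is only a sketch: it assumes the log layout matches the excerpt above (timestamp, level, colon, message, with error details on the following unstamped lines), and the names `ENTRY`, `REJOIN` and `failed_rejoins` are made up for illustration.

```python
import re

# Log excerpt copied from the report above.
LOG = """\
2018-11-12 08:48:42   notice : Server changed state: server1[127.0.0.1:33061]: server_up. [Down] -> [Running]
2018-11-12 08:48:42   warning: Automatic rejoin was not attempted on server 'server1' even though it is a valid candidate. Will keep retrying with this message suppressed for all servers. Errors: 
Server 'server1' could not be queried.
"""

# Assumed layout of a stamped log line: "<date> <time>   <level>: <message>".
ENTRY = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+(\w+)\s*:\s?(.*)$")
REJOIN = re.compile(r"Automatic rejoin was not attempted on server '([^']+)'")


def failed_rejoins(log_text):
    """Return one dict per rejoin warning, with the unstamped lines that follow it."""
    results = []
    current = None
    for line in log_text.splitlines():
        entry = ENTRY.match(line)
        if entry:
            # A new stamped line ends any error list being collected.
            current = None
            hit = REJOIN.search(entry.group(3))
            if hit:
                current = {"time": entry.group(1), "server": hit.group(1), "errors": []}
                results.append(current)
        elif current is not None and line.strip():
            # Unstamped continuation line: treat it as an error detail.
            current["errors"].append(line.strip())
    return results


for item in failed_rejoins(LOG):
    print(item["time"], item["server"], item["errors"])
```

Because the monitor suppresses repeats of this warning, only the first occurrence per outage is likely to show up in such a scan.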

MaxCtrl shows the following:

 maxctrl list servers                                           Mon Nov 12 08:53:40 2018

    Server      Address       Port      Connections     State               GTID
    server1     127.0.0.1     33061     0               Running
    server2     127.0.0.1     33062     1               Master, Running     0-2-4
    server3     127.0.0.1     33063     1               Slave, Running      0-2-4
    server4     127.0.0.1     33064     1               Slave, Running      0-2-4
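
The symptom in the listing above is a server that is Running but carries neither the Master nor the Slave state and has no GTID. A rough way to flag such rows automatically is sketched below. It assumes whitespace-separated rows exactly as pasted in this report (real MaxCtrl output may be formatted differently), and `parse_rows` and `unjoined` are hypothetical helper names.

```python
import re

# Rows copied from the listing above: Server, Address, Port, Connections, State, GTID.
SAMPLE = """\
server1     127.0.0.1     33061     0               Running
server2     127.0.0.1     33062     1               Master, Running     0-2-4
server3     127.0.0.1     33063     1               Slave, Running      0-2-4
server4     127.0.0.1     33064     1               Slave, Running      0-2-4
"""


def parse_rows(text):
    """Split each row on tabs or runs of two or more spaces; skip headers and blanks."""
    rows = []
    for line in text.splitlines():
        fields = re.split(r"\s{2,}|\t+", line.strip())
        # A data row has at least five columns and a numeric port in column three.
        if len(fields) < 5 or not fields[2].isdigit():
            continue
        rows.append({
            "name": fields[0],
            "address": fields[1],
            "port": int(fields[2]),
            "connections": int(fields[3]),
            "states": {s.strip() for s in fields[4].split(",")},
            "gtid": fields[5] if len(fields) > 5 else "",
        })
    return rows


def unjoined(rows):
    """Names of servers that are Running but are neither Master nor Slave."""
    return [
        r["name"] for r in rows
        if "Running" in r["states"] and not ({"Master", "Slave"} & r["states"])
    ]


print(unjoined(parse_rows(SAMPLE)))
```

For the rows in this report, only server1 matches the check.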

The monitor configuration is as follows:

[TheMonitor]
type=monitor
module=mariadbmon
servers=server1,server2,server3,server4
user=maxuser
password=maxpwd
auto_failover=true
auto_rejoin=true
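
When reproducing this, it can help to confirm that both automatic features are really switched on in the monitor section. The sketch below reads the section with Python's `configparser`. It assumes the section parses as plain INI (true for this fragment, not necessarily for every MaxScale configuration), treats a missing option as off purely as a convention of this sketch, and uses a made-up helper name, `monitor_summary`.

```python
import configparser

# Monitor section copied from the report above.
MONITOR_CNF = """\
[TheMonitor]
type=monitor
module=mariadbmon
servers=server1,server2,server3,server4
user=maxuser
password=maxpwd
auto_failover=true
auto_rejoin=true
"""


def monitor_summary(text, section="TheMonitor"):
    """Return the monitored servers and the two automatic-operation switches."""
    # interpolation=None keeps '%' characters (e.g. in passwords) literal.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    mon = parser[section]
    return {
        "servers": [s.strip() for s in mon["servers"].split(",")],
        # Sketch convention: an absent option is reported as False.
        "auto_failover": mon.getboolean("auto_failover", fallback=False),
        "auto_rejoin": mon.getboolean("auto_rejoin", fallback=False),
    }


print(monitor_summary(MONITOR_CNF))
```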

These are the server's semi-sync replication settings:

MariaDB [test]> SHOW VARIABLES LIKE "rp%sync%";
+---------------------------------------+--------------+
| Variable_name                         | Value        |
+---------------------------------------+--------------+
| rpl_semi_sync_master_enabled          | OFF          |
| rpl_semi_sync_master_timeout          | 10000        |
| rpl_semi_sync_master_trace_level      | 32           |
| rpl_semi_sync_master_wait_no_slave    | ON           |
| rpl_semi_sync_master_wait_point       | AFTER_COMMIT |
| rpl_semi_sync_slave_delay_master      | OFF          |
| rpl_semi_sync_slave_enabled           | OFF          |
| rpl_semi_sync_slave_kill_conn_timeout | 5            |
| rpl_semi_sync_slave_trace_level       | 32           |
+---------------------------------------+--------------+
9 rows in set (0.001 sec)

A couple of issues here:
(1) The error message is not descriptive enough: it does not say which query the rejoining server failed to answer
(2) server1 should have been allowed to rejoin

