Details
Bug
Status: Closed
Critical
Resolution: Fixed
2.3.0
None
MXS-SPRINT-70
Description
(1) Start a fresh cluster: server1 = master; server2, server3, server4 = slaves
(2) Bring the master down (without having performed any transactions)
(3) server2 gets promoted to master
(4) Perform a couple of transactions
(5) Bring server1 back up
server1 is not rejoined to the cluster as a slave, and the following messages appear in the log:
2018-11-12 08:48:42 notice : Server changed state: server1[127.0.0.1:33061]: server_up. [Down] -> [Running]
2018-11-12 08:48:42 warning: Automatic rejoin was not attempted on server 'server1' even though it is a valid candidate. Will keep retrying with this message suppressed for all servers. Errors:
Server 'server1' could not be queried.
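The two log lines above can be correlated mechanically when scanning a longer log. The following is a minimal sketch in Python, assuming the messages keep exactly the wording quoted in this report; the function name `servers_up_but_not_rejoined` is hypothetical and not part of MaxScale.

```python
import re

# Matches the "Server changed state" notice as quoted in this report.
STATE_CHANGE = re.compile(
    r"Server changed state: (?P<server>\w+)\[(?P<addr>[^\]]+)\]: "
    r"(?P<event>\w+)\. \[(?P<old>[^\]]*)\] -> \[(?P<new>[^\]]*)\]"
)
# Matches the rejoin warning as quoted in this report.
REJOIN_SKIPPED = re.compile(
    r"Automatic rejoin was not attempted on server '(?P<server>[^']+)'"
)


def servers_up_but_not_rejoined(lines):
    """Return servers that came up and also triggered the rejoin warning."""
    came_up, skipped = set(), set()
    for line in lines:
        match = STATE_CHANGE.search(line)
        if match and match.group("event") == "server_up":
            came_up.add(match.group("server"))
        match = REJOIN_SKIPPED.search(line)
        if match:
            skipped.add(match.group("server"))
    return sorted(came_up & skipped)


# Log lines copied from this report.
LOG = [
    "2018-11-12 08:48:42 notice : Server changed state: "
    "server1[127.0.0.1:33061]: server_up. [Down] -> [Running]",
    "2018-11-12 08:48:42 warning: Automatic rejoin was not attempted on "
    "server 'server1' even though it is a valid candidate.",
]

print(servers_up_but_not_rejoined(LOG))  # ['server1']
```

Because the warning is suppressed after its first occurrence, a check like this has to look at the whole log rather than only the most recent lines.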
MaxCtrl shows the following:
maxctrl list servers    Mon Nov 12 08:53:40 2018
Server   Address    Port   Connections  State            GTID
server1  127.0.0.1  33061  0            Running
server2  127.0.0.1  33062  1            Master, Running  0-2-4
server3  127.0.0.1  33063  1            Slave, Running   0-2-4
server4  127.0.0.1  33064  1            Slave, Running   0-2-4
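The symptom in the listing above is a server that is Running but has neither the Master nor the Slave role. The following is a minimal sketch in Python of how that condition could be checked, assuming the rows have already been split into name, state, and GTID fields; `ServerRow` and `unjoined_servers` are hypothetical names and not part of MaxCtrl.

```python
from typing import List, NamedTuple


class ServerRow(NamedTuple):
    name: str
    state: str  # e.g. "Master, Running"
    gtid: str   # empty string when the listing shows no GTID


def unjoined_servers(rows: List[ServerRow]) -> List[str]:
    """Return names of servers that are Running but hold no replication role."""
    stuck = []
    for row in rows:
        flags = {flag.strip() for flag in row.state.split(",")}
        if "Running" in flags and not flags & {"Master", "Slave"}:
            stuck.append(row.name)
    return stuck


# Rows transcribed from the listing in this report.
ROWS = [
    ServerRow("server1", "Running", ""),
    ServerRow("server2", "Master, Running", "0-2-4"),
    ServerRow("server3", "Slave, Running", "0-2-4"),
    ServerRow("server4", "Slave, Running", "0-2-4"),
]

print(unjoined_servers(ROWS))  # ['server1']
```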
The monitor configuration is as follows:
[TheMonitor]
type=monitor
module=mariadbmon
servers=server1,server2,server3,server4
user=maxuser
password=maxpwd
auto_failover=true
auto_rejoin=true
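Since both auto_failover and auto_rejoin are enabled in this section, a rejoin of server1 is the expected behavior. The following is a minimal sketch in Python that reads those flags from the fragment, assuming it parses as plain INI (true for this excerpt, not necessarily for every MaxScale configuration file); `monitor_flags` is a hypothetical helper name.

```python
import configparser

# Monitor section copied from this report.
MONITOR_CONFIG = """\
[TheMonitor]
type=monitor
module=mariadbmon
servers=server1,server2,server3,server4
user=maxuser
password=maxpwd
auto_failover=true
auto_rejoin=true
"""


def monitor_flags(config_text, section="TheMonitor"):
    """Extract the failover/rejoin switches and the monitored server list."""
    # Interpolation is disabled so a '%' in a password is read literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    monitor = parser[section]
    return {
        "auto_failover": monitor.getboolean("auto_failover", fallback=False),
        "auto_rejoin": monitor.getboolean("auto_rejoin", fallback=False),
        "servers": [name.strip() for name in monitor["servers"].split(",")],
    }


flags = monitor_flags(MONITOR_CONFIG)
print(flags["auto_failover"], flags["auto_rejoin"], flags["servers"])
```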
These are the server settings:
MariaDB [test]> SHOW VARIABLES LIKE "rp%sync%";
+---------------------------------------+--------------+
| Variable_name                         | Value        |
+---------------------------------------+--------------+
| rpl_semi_sync_master_enabled          | OFF          |
| rpl_semi_sync_master_timeout          | 10000        |
| rpl_semi_sync_master_trace_level      | 32           |
| rpl_semi_sync_master_wait_no_slave    | ON           |
| rpl_semi_sync_master_wait_point       | AFTER_COMMIT |
| rpl_semi_sync_slave_delay_master      | OFF          |
| rpl_semi_sync_slave_enabled           | OFF          |
| rpl_semi_sync_slave_kill_conn_timeout | 5            |
| rpl_semi_sync_slave_trace_level       | 32           |
+---------------------------------------+--------------+
9 rows in set (0.001 sec)
There are a couple of issues here:
(1) The error message is not descriptive enough: it does not say which query the rejoining server failed to respond to.
(2) server1 should have been allowed to rejoin.