Details
-
Bug
-
Status: Closed (View Workflow)
-
Major
-
Resolution: Won't Fix
-
2.2.15
-
None
-
MXS-SPRINT-84
Description
MaxScale's MariaDB Monitor was monitoring a pair of MySQL 5.6 servers replicating between each other in master-master mode. This worked fine for quite some time, until MariaDB Monitor's journal reached its maximum age. At that point, MariaDB monitor seemed to get confused, and it removed the master status from the server:
2019-05-28 13:03:09 warning: (1486424) Journal file was created 3013372 seconds ago. Maximum journal age is 28800 seconds.
|
2019-05-28 13:03:09 warning: (1486424) Removing stale journal file for monitor 'MySQL-Monitor'.
|
2019-05-28 13:03:09 notice : Server changed state: server1[server1.domain.com:3307]: new_slave. [Master, Relay Master, Running] -> [Relay Master, Slave, Running]
|
2019-05-28 13:03:09 error : [mariadbmon] No Master can be determined
|
Replication is working fine between the servers.
server1:
Slave_IO_State: Waiting for master to send event
|
Master_Host: server2
|
Master_User: repl
|
Master_Port: 3307
|
Connect_Retry: 60
|
Master_Log_File: mysql-bin.001923
|
Read_Master_Log_Pos: 940
|
Relay_Log_File: relaylog.000292
|
Relay_Log_Pos: 283
|
Relay_Master_Log_File: mysql-bin.001923
|
Slave_IO_Running: Yes
|
Slave_SQL_Running: Yes
|
Replicate_Do_DB:
|
Replicate_Ignore_DB:
|
Replicate_Do_Table:
|
Replicate_Ignore_Table:
|
Replicate_Wild_Do_Table:
|
Replicate_Wild_Ignore_Table:
|
Last_Errno: 0
|
Last_Error:
|
Skip_Counter: 0
|
Exec_Master_Log_Pos: 940
|
Relay_Log_Space: 449
|
Until_Condition: None
|
Until_Log_File:
|
Until_Log_Pos: 0
|
Master_SSL_Allowed: No
|
Master_SSL_CA_File:
|
Master_SSL_CA_Path:
|
Master_SSL_Cert:
|
Master_SSL_Cipher:
|
Master_SSL_Key:
|
Seconds_Behind_Master: 0
|
Master_SSL_Verify_Server_Cert: No
|
Last_IO_Errno: 0
|
Last_IO_Error:
|
Last_SQL_Errno: 0
|
Last_SQL_Error:
|
Replicate_Ignore_Server_Ids:
|
Master_Server_Id: 2717
|
Master_UUID: 38b830b4-c981-11e7-ba2a-1402ec80c464
|
Master_Info_File: /store/database/master.info
|
SQL_Delay: 0
|
SQL_Remaining_Delay: NULL
|
Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it
|
Master_Retry_Count: 86400
|
Master_Bind:
|
Last_IO_Error_Timestamp:
|
Last_SQL_Error_Timestamp:
|
Master_SSL_Crl:
|
Master_SSL_Crlpath:
|
Retrieved_Gtid_Set:
|
Executed_Gtid_Set:
|
Auto_Position: 0
|
server2:
Slave_IO_State: Waiting for master to send event
|
Master_Host: server1
|
Master_User: repl
|
Master_Port: 3307
|
Connect_Retry: 60
|
Master_Log_File: mysql-bin.004141
|
Read_Master_Log_Pos: 120
|
Relay_Log_File: relaylog.000465
|
Relay_Log_Pos: 283
|
Relay_Master_Log_File: mysql-bin.004141
|
Slave_IO_Running: Yes
|
Slave_SQL_Running: Yes
|
Replicate_Do_DB:
|
Replicate_Ignore_DB:
|
Replicate_Do_Table:
|
Replicate_Ignore_Table:
|
Replicate_Wild_Do_Table:
|
Replicate_Wild_Ignore_Table:
|
Last_Errno: 0
|
Last_Error:
|
Skip_Counter: 0
|
Exec_Master_Log_Pos: 120
|
Relay_Log_Space: 612
|
Until_Condition: None
|
Until_Log_File:
|
Until_Log_Pos: 0
|
Master_SSL_Allowed: No
|
Master_SSL_CA_File:
|
Master_SSL_CA_Path:
|
Master_SSL_Cert:
|
Master_SSL_Cipher:
|
Master_SSL_Key:
|
Seconds_Behind_Master: 0
|
Master_SSL_Verify_Server_Cert: No
|
Last_IO_Errno: 0
|
Last_IO_Error:
|
Last_SQL_Errno: 0
|
Last_SQL_Error:
|
Replicate_Ignore_Server_Ids:
|
Master_Server_Id: 2517
|
Master_UUID: 2da77369-c981-11e7-ba2a-1402ec7e3cac
|
Master_Info_File: /store/database/master.info
|
SQL_Delay: 0
|
SQL_Remaining_Delay: NULL
|
Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it
|
Master_Retry_Count: 86400
|
Master_Bind:
|
Last_IO_Error_Timestamp:
|
Last_SQL_Error_Timestamp:
|
Master_SSL_Crl:
|
Master_SSL_Crlpath:
|
Retrieved_Gtid_Set:
|
Executed_Gtid_Set:
|
Auto_Position: 0
|
And MaxScale seems to be monitoring these servers just fine:
MaxScale> list servers
|
Servers.
|
-------------------+-----------------+-------+-------------+--------------------
|
Server | Address | Port | Connections | Status
|
-------------------+-----------------+-------+-------------+--------------------
|
server1 | server1.domain.com | 3307 | -21749 | Relay Master, Slave, Running
|
server2 | server2.domain.com | 3307 | -21781 | Relay Master, Slave, Running
|
-------------------+-----------------+-------+-------------+--------------------
|
MaxScale> show servers
|
Server 0x1e0fc90 (server1)
|
Server: server1.domain.com
|
Status: Relay Master, Slave, Running
|
Protocol: MySQLBackend
|
Port: 3307
|
Server Version: 5.6.42-84.2-log
|
Node Id: 2517
|
Master Id: 2717
|
Last event: new_slave
|
Triggered at: Tue, 28 May 2019 15:15:59 GMT
|
Slave Ids:
|
Repl Depth: 3
|
Number of connections: 21876
|
Current no. of conns: -21749
|
Current no. of operations: 0
|
Number of routed packets: 67897325
|
Persistent pool size: 1
|
Persistent measured pool size: 0
|
Persistent actual size max: 3
|
Persistent pool size limit: 500
|
Persistent max time (secs): 601
|
Connections taken from pool: 1458162
|
Pool availability: 98.52%
|
Server 0x1e0e890 (server2)
|
Server: server2.domain.com
|
Status: Relay Master, Slave, Running
|
Protocol: MySQLBackend
|
Port: 3307
|
Server Version: 5.6.42-84.2-log
|
Node Id: 2717
|
Master Id: 2517
|
Last event: new_slave
|
Triggered at: Tue, 28 May 2019 15:15:46 GMT
|
Slave Ids:
|
Repl Depth: 3
|
Number of connections: 22741
|
Current no. of conns: -21781
|
Current no. of operations: 0
|
Number of routed packets: 3065679
|
Persistent pool size: 52
|
Persistent measured pool size: 0
|
Persistent actual size max: 4
|
Persistent pool size limit: 500
|
Persistent max time (secs): 601
|
Connections taken from pool: 1463353
|
Pool availability: 98.47%
|
MaxScale> list monitors
|
---------------------+---------------------
|
Monitor | Status
|
---------------------+---------------------
|
MySQL-Monitor | Running
|
---------------------+---------------------
|
MaxScale> show monitors
|
Monitor: 0x1e122a0
|
Name: MySQL-Monitor
|
State: Running
|
Times monitored: 3104938
|
Sampling interval: 1000 milliseconds
|
Connect Timeout: 3 seconds
|
Read Timeout: 3 seconds
|
Write Timeout: 3 seconds
|
Connect attempts: 1
|
Monitored servers: [server1.domain.com]:3307, [server2.domain.com]:3307
|
Automatic failover: Enabled
|
Failcount: 5
|
Failover timeout: 90
|
Switchover timeout: 90
|
Automatic rejoin: Disabled
|
Enforce read-only: Disabled
|
MaxScale monitor ID: 0
|
Detect replication lag: Disabled
|
Detect stale master: Enabled
|
|
Server information:
|
-------------------
|
|
Server: server1
|
Server ID: 2517
|
Read only: NO
|
Slave configured: YES
|
Slave IO running: YES
|
Slave SQL running: YES
|
Master ID: 2717
|
Master binlog file: 940
|
Master binlog position: 0
|
|
Server: server2
|
Server ID: 2717
|
Read only: NO
|
Slave configured: YES
|
Slave IO running: YES
|
Slave SQL running: YES
|
Master ID: 2517
|
Master binlog file: 120
|
MaxScale has the following configuration:
[maxscale]
|
threads = auto
|
|
[server1]
|
type = server
|
address = server1.domain.com
|
port = 3307
|
protocol = MySQLBackend
|
persistpoolmax = 500
|
persistmaxtime = 601
|
|
[server2]
|
type = server
|
address = server2.domain.com
|
port = 3307
|
protocol = MySQLBackend
|
persistpoolmax = 500
|
persistmaxtime = 601
|
|
[Read-Write Service]
|
type = service
|
router = readwritesplit
|
servers = server1,server2
|
user = maxscale
|
passwd = password
|
max_slave_connections = 100%
|
|
[Read-Write Listener]
|
type = listener
|
service = Read-Write Service
|
protocol = MySQLClient
|
port = 4307
|
|
[MySQL Monitor]
|
type=monitor
|
module=mysqlmon
|
auto_failover=true
|
servers=server1,server2
|
user=maxscale
|
passwd=password
|
monitor_interval=1000
|
|
[MaxAdmin Service]
|
type = service
|
router = cli
|
|
[MaxAdmin Listener]
|
type = listener
|
service = MaxAdmin Service
|
protocol = maxscaled
|
socket = default
|
It looks like the MariaDB Monitor should have the "multimaster" parameter set for a configuration like this:
https://mariadb.com/kb/en/mariadb-maxscale-22-mariadb-monitor/#multimaster
But why did the server status only break when the journal file reached its maximum age?