[MXS-2523] MariaDB Monitor gets confused about server state when stale journal file is removed Created: 2019-05-28  Updated: 2020-08-25  Resolved: 2019-06-14

Status: Closed
Project: MariaDB MaxScale
Component/s: mariadbmon
Affects Version/s: 2.2.15
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Geoff Montee (Inactive) Assignee: Esa Korhonen
Resolution: Won't Fix Votes: 0
Labels: None

Sprint: MXS-SPRINT-84

 Description   

MaxScale's MariaDB Monitor was monitoring a pair of MySQL 5.6 servers replicating between each other in master-master mode. This worked fine for quite some time, until MariaDB Monitor's journal reached its maximum age. At that point, MariaDB monitor seemed to get confused, and it removed the master status from the server:

2019-05-28 13:03:09   warning: (1486424) Journal file was created 3013372 seconds ago. Maximum journal age is 28800 seconds.
2019-05-28 13:03:09   warning: (1486424) Removing stale journal file for monitor 'MySQL-Monitor'.
2019-05-28 13:03:09   notice : Server changed state: server1[server1.domain.com:3307]: new_slave. [Master, Relay Master, Running] -> [Relay Master, Slave, Running]
2019-05-28 13:03:09   error  : [mariadbmon] No Master can be determined

Replication is working fine between the servers.

server1:

               Slave_IO_State: Waiting for master to send event
                  Master_Host: server2
                  Master_User: repl
                  Master_Port: 3307
                Connect_Retry: 60
              Master_Log_File: mysql-bin.001923
          Read_Master_Log_Pos: 940
               Relay_Log_File: relaylog.000292
                Relay_Log_Pos: 283
        Relay_Master_Log_File: mysql-bin.001923
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 940
              Relay_Log_Space: 449
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 2717
                  Master_UUID: 38b830b4-c981-11e7-ba2a-1402ec80c464
             Master_Info_File: /store/database/master.info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it
           Master_Retry_Count: 86400
                  Master_Bind: 
      Last_IO_Error_Timestamp: 
     Last_SQL_Error_Timestamp: 
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
           Retrieved_Gtid_Set: 
            Executed_Gtid_Set: 
                Auto_Position: 0

server2:

               Slave_IO_State: Waiting for master to send event
                  Master_Host: server1
                  Master_User: repl
                  Master_Port: 3307
                Connect_Retry: 60
              Master_Log_File: mysql-bin.004141
          Read_Master_Log_Pos: 120
               Relay_Log_File: relaylog.000465
                Relay_Log_Pos: 283
        Relay_Master_Log_File: mysql-bin.004141
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 120
              Relay_Log_Space: 612
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 2517
                  Master_UUID: 2da77369-c981-11e7-ba2a-1402ec7e3cac
             Master_Info_File: /store/database/master.info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it
           Master_Retry_Count: 86400
                  Master_Bind: 
      Last_IO_Error_Timestamp: 
     Last_SQL_Error_Timestamp: 
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
           Retrieved_Gtid_Set: 
            Executed_Gtid_Set: 
                Auto_Position: 0

And MaxScale seems to be monitoring these servers just fine:

MaxScale> list servers
Servers.
-------------------+-----------------+-------+-------------+--------------------
Server             | Address         | Port  | Connections | Status
-------------------+-----------------+-------+-------------+--------------------
server1            | server1.domain.com |  3307 |      -21749 | Relay Master, Slave, Running
server2            | server2.domain.com |  3307 |      -21781 | Relay Master, Slave, Running
-------------------+-----------------+-------+-------------+--------------------
MaxScale> show servers
Server 0x1e0fc90 (server1)
        Server:                              server1.domain.com
        Status:                              Relay Master, Slave, Running
        Protocol:                            MySQLBackend
        Port:                                3307
        Server Version:                      5.6.42-84.2-log
        Node Id:                             2517
        Master Id:                           2717
        Last event:                          new_slave
        Triggered at:                        Tue, 28 May 2019 15:15:59 GMT
        Slave Ids:
        Repl Depth:                          3
        Number of connections:               21876
        Current no. of conns:                -21749
        Current no. of operations:           0
        Number of routed packets:            67897325
        Persistent pool size:                1
        Persistent measured pool size:       0
        Persistent actual size max:          3
        Persistent pool size limit:          500
        Persistent max time (secs):          601
        Connections taken from pool:         1458162
        Pool availability:                   98.52%
Server 0x1e0e890 (server2)
        Server:                              server2.domain.com
        Status:                              Relay Master, Slave, Running
        Protocol:                            MySQLBackend
        Port:                                3307
        Server Version:                      5.6.42-84.2-log
        Node Id:                             2717
        Master Id:                           2517
        Last event:                          new_slave
        Triggered at:                        Tue, 28 May 2019 15:15:46 GMT
        Slave Ids:
        Repl Depth:                          3
        Number of connections:               22741
        Current no. of conns:                -21781
        Current no. of operations:           0
        Number of routed packets:            3065679
        Persistent pool size:                52
        Persistent measured pool size:       0
        Persistent actual size max:          4
        Persistent pool size limit:          500
        Persistent max time (secs):          601
        Connections taken from pool:         1463353
        Pool availability:                   98.47%
MaxScale> list monitors
---------------------+---------------------
Monitor              | Status
---------------------+---------------------
MySQL-Monitor        | Running
---------------------+---------------------
MaxScale> show monitors
Monitor:                0x1e122a0
Name:                   MySQL-Monitor
State:                  Running
Times monitored:        3104938
Sampling interval:      1000 milliseconds
Connect Timeout:        3 seconds
Read Timeout:           3 seconds
Write Timeout:          3 seconds
Connect attempts:       1
Monitored servers:      [server1.domain.com]:3307, [server2.domain.com]:3307
Automatic failover:     Enabled
Failcount:              5
Failover timeout:       90
Switchover timeout:     90
Automatic rejoin:       Disabled
Enforce read-only:      Disabled
MaxScale monitor ID:    0
Detect replication lag: Disabled
Detect stale master:    Enabled
 
Server information:
-------------------
 
Server:                 server1
Server ID:              2517
Read only:              NO
Slave configured:       YES
Slave IO running:       YES
Slave SQL running:      YES
Master ID:              2717
Master binlog file:     940
Master binlog position: 0
 
Server:                 server2
Server ID:              2717
Read only:              NO
Slave configured:       YES
Slave IO running:       YES
Slave SQL running:      YES
Master ID:              2517
Master binlog file:     120

MaxScale has the following configuration:

[maxscale]
threads = auto
 
[server1]
type = server
address = server1.domain.com
port = 3307
protocol = MySQLBackend
persistpoolmax = 500
persistmaxtime = 601
 
[server2]
type = server
address = server2.domain.com
port = 3307
protocol = MySQLBackend
persistpoolmax = 500
persistmaxtime = 601
 
[Read-Write Service]
type = service
router = readwritesplit
servers = server1,server2
user = maxscale
passwd = password
max_slave_connections = 100%
 
[Read-Write Listener]
type = listener
service = Read-Write Service
protocol = MySQLClient
port = 4307
 
[MySQL Monitor]
type=monitor
module=mysqlmon
auto_failover=true
servers=server1,server2
user=maxscale
passwd=password
monitor_interval=1000
 
[MaxAdmin Service]
type = service
router = cli
 
[MaxAdmin Listener]
type = listener
service = MaxAdmin Service
protocol = maxscaled
socket = default

It looks like the MariaDB Monitor should have the "multimaster" parameter set for a configuration like this:

https://mariadb.com/kb/en/mariadb-maxscale-22-mariadb-monitor/#multimaster

But why did the server status only break when the journal file reached its maximum age?



 Comments   
Comment by Geoff Montee (Inactive) [ 2019-06-14 ]

esa.korhonen said that MaxScale 2.2 is known to have issues with replication topologies like this. multimaster=true should be used in this scenario with MaxScale 2.2.

MaxScale 2.3 is better able with replication topologies like this than MaxScale 2.2. The multimaster option was removed in MaxScale 2.3, because it is not needed in that version.

Generated at Thu Feb 08 04:14:45 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.