Uploaded image for project: 'MariaDB MaxScale'
  1. MariaDB MaxScale
  2. MXS-2523

MariaDB Monitor gets confused about server state when stale journal file is removed

    XMLWordPrintable

Details

    • Bug
    • Status: Closed (View Workflow)
    • Major
    • Resolution: Won't Fix
    • 2.2.15
    • N/A
    • mariadbmon
    • None
    • MXS-SPRINT-84

    Description

      MaxScale's MariaDB Monitor was monitoring a pair of MySQL 5.6 servers replicating between each other in master-master mode. This worked fine for quite some time, until MariaDB Monitor's journal reached its maximum age. At that point, MariaDB monitor seemed to get confused, and it removed the master status from the server:

      2019-05-28 13:03:09   warning: (1486424) Journal file was created 3013372 seconds ago. Maximum journal age is 28800 seconds.
      2019-05-28 13:03:09   warning: (1486424) Removing stale journal file for monitor 'MySQL-Monitor'.
      2019-05-28 13:03:09   notice : Server changed state: server1[server1.domain.com:3307]: new_slave. [Master, Relay Master, Running] -> [Relay Master, Slave, Running]
      2019-05-28 13:03:09   error  : [mariadbmon] No Master can be determined
      

      Replication is working fine between the servers.

      server1:

                     Slave_IO_State: Waiting for master to send event
                        Master_Host: server2
                        Master_User: repl
                        Master_Port: 3307
                      Connect_Retry: 60
                    Master_Log_File: mysql-bin.001923
                Read_Master_Log_Pos: 940
                     Relay_Log_File: relaylog.000292
                      Relay_Log_Pos: 283
              Relay_Master_Log_File: mysql-bin.001923
                   Slave_IO_Running: Yes
                  Slave_SQL_Running: Yes
                    Replicate_Do_DB: 
                Replicate_Ignore_DB: 
                 Replicate_Do_Table: 
             Replicate_Ignore_Table: 
            Replicate_Wild_Do_Table: 
        Replicate_Wild_Ignore_Table: 
                         Last_Errno: 0
                         Last_Error: 
                       Skip_Counter: 0
                Exec_Master_Log_Pos: 940
                    Relay_Log_Space: 449
                    Until_Condition: None
                     Until_Log_File: 
                      Until_Log_Pos: 0
                 Master_SSL_Allowed: No
                 Master_SSL_CA_File: 
                 Master_SSL_CA_Path: 
                    Master_SSL_Cert: 
                  Master_SSL_Cipher: 
                     Master_SSL_Key: 
              Seconds_Behind_Master: 0
      Master_SSL_Verify_Server_Cert: No
                      Last_IO_Errno: 0
                      Last_IO_Error: 
                     Last_SQL_Errno: 0
                     Last_SQL_Error: 
        Replicate_Ignore_Server_Ids: 
                   Master_Server_Id: 2717
                        Master_UUID: 38b830b4-c981-11e7-ba2a-1402ec80c464
                   Master_Info_File: /store/database/master.info
                          SQL_Delay: 0
                SQL_Remaining_Delay: NULL
            Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it
                 Master_Retry_Count: 86400
                        Master_Bind: 
            Last_IO_Error_Timestamp: 
           Last_SQL_Error_Timestamp: 
                     Master_SSL_Crl: 
                 Master_SSL_Crlpath: 
                 Retrieved_Gtid_Set: 
                  Executed_Gtid_Set: 
                      Auto_Position: 0
      

      server2:

                     Slave_IO_State: Waiting for master to send event
                        Master_Host: server1
                        Master_User: repl
                        Master_Port: 3307
                      Connect_Retry: 60
                    Master_Log_File: mysql-bin.004141
                Read_Master_Log_Pos: 120
                     Relay_Log_File: relaylog.000465
                      Relay_Log_Pos: 283
              Relay_Master_Log_File: mysql-bin.004141
                   Slave_IO_Running: Yes
                  Slave_SQL_Running: Yes
                    Replicate_Do_DB: 
                Replicate_Ignore_DB: 
                 Replicate_Do_Table: 
             Replicate_Ignore_Table: 
            Replicate_Wild_Do_Table: 
        Replicate_Wild_Ignore_Table: 
                         Last_Errno: 0
                         Last_Error: 
                       Skip_Counter: 0
                Exec_Master_Log_Pos: 120
                    Relay_Log_Space: 612
                    Until_Condition: None
                     Until_Log_File: 
                      Until_Log_Pos: 0
                 Master_SSL_Allowed: No
                 Master_SSL_CA_File: 
                 Master_SSL_CA_Path: 
                    Master_SSL_Cert: 
                  Master_SSL_Cipher: 
                     Master_SSL_Key: 
              Seconds_Behind_Master: 0
      Master_SSL_Verify_Server_Cert: No
                      Last_IO_Errno: 0
                      Last_IO_Error: 
                     Last_SQL_Errno: 0
                     Last_SQL_Error: 
        Replicate_Ignore_Server_Ids: 
                   Master_Server_Id: 2517
                        Master_UUID: 2da77369-c981-11e7-ba2a-1402ec7e3cac
                   Master_Info_File: /store/database/master.info
                          SQL_Delay: 0
                SQL_Remaining_Delay: NULL
            Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it
                 Master_Retry_Count: 86400
                        Master_Bind: 
            Last_IO_Error_Timestamp: 
           Last_SQL_Error_Timestamp: 
                     Master_SSL_Crl: 
                 Master_SSL_Crlpath: 
                 Retrieved_Gtid_Set: 
                  Executed_Gtid_Set: 
                      Auto_Position: 0
      

      And MaxScale seems to be monitoring these servers just fine:

      MaxScale> list servers
      Servers.
      -------------------+-----------------+-------+-------------+--------------------
      Server             | Address         | Port  | Connections | Status
      -------------------+-----------------+-------+-------------+--------------------
      server1            | server1.domain.com |  3307 |      -21749 | Relay Master, Slave, Running
      server2            | server2.domain.com |  3307 |      -21781 | Relay Master, Slave, Running
      -------------------+-----------------+-------+-------------+--------------------
      MaxScale> show servers
      Server 0x1e0fc90 (server1)
              Server:                              server1.domain.com
              Status:                              Relay Master, Slave, Running
              Protocol:                            MySQLBackend
              Port:                                3307
              Server Version:                      5.6.42-84.2-log
              Node Id:                             2517
              Master Id:                           2717
              Last event:                          new_slave
              Triggered at:                        Tue, 28 May 2019 15:15:59 GMT
              Slave Ids:
              Repl Depth:                          3
              Number of connections:               21876
              Current no. of conns:                -21749
              Current no. of operations:           0
              Number of routed packets:            67897325
              Persistent pool size:                1
              Persistent measured pool size:       0
              Persistent actual size max:          3
              Persistent pool size limit:          500
              Persistent max time (secs):          601
              Connections taken from pool:         1458162
              Pool availability:                   98.52%
      Server 0x1e0e890 (server2)
              Server:                              server2.domain.com
              Status:                              Relay Master, Slave, Running
              Protocol:                            MySQLBackend
              Port:                                3307
              Server Version:                      5.6.42-84.2-log
              Node Id:                             2717
              Master Id:                           2517
              Last event:                          new_slave
              Triggered at:                        Tue, 28 May 2019 15:15:46 GMT
              Slave Ids:
              Repl Depth:                          3
              Number of connections:               22741
              Current no. of conns:                -21781
              Current no. of operations:           0
              Number of routed packets:            3065679
              Persistent pool size:                52
              Persistent measured pool size:       0
              Persistent actual size max:          4
              Persistent pool size limit:          500
              Persistent max time (secs):          601
              Connections taken from pool:         1463353
              Pool availability:                   98.47%
      MaxScale> list monitors
      ---------------------+---------------------
      Monitor              | Status
      ---------------------+---------------------
      MySQL-Monitor        | Running
      ---------------------+---------------------
      MaxScale> show monitors
      Monitor:                0x1e122a0
      Name:                   MySQL-Monitor
      State:                  Running
      Times monitored:        3104938
      Sampling interval:      1000 milliseconds
      Connect Timeout:        3 seconds
      Read Timeout:           3 seconds
      Write Timeout:          3 seconds
      Connect attempts:       1
      Monitored servers:      [server1.domain.com]:3307, [server2.domain.com]:3307
      Automatic failover:     Enabled
      Failcount:              5
      Failover timeout:       90
      Switchover timeout:     90
      Automatic rejoin:       Disabled
      Enforce read-only:      Disabled
      MaxScale monitor ID:    0
      Detect replication lag: Disabled
      Detect stale master:    Enabled
       
      Server information:
      -------------------
       
      Server:                 server1
      Server ID:              2517
      Read only:              NO
      Slave configured:       YES
      Slave IO running:       YES
      Slave SQL running:      YES
      Master ID:              2717
      Master binlog file:     940
      Master binlog position: 0
       
      Server:                 server2
      Server ID:              2717
      Read only:              NO
      Slave configured:       YES
      Slave IO running:       YES
      Slave SQL running:      YES
      Master ID:              2517
      Master binlog file:     120
      

      MaxScale has the following configuration:

      [maxscale]
      threads = auto
       
      [server1]
      type = server
      address = server1.domain.com
      port = 3307
      protocol = MySQLBackend
      persistpoolmax = 500
      persistmaxtime = 601
       
      [server2]
      type = server
      address = server2.domain.com
      port = 3307
      protocol = MySQLBackend
      persistpoolmax = 500
      persistmaxtime = 601
       
      [Read-Write Service]
      type = service
      router = readwritesplit
      servers = server1,server2
      user = maxscale
      passwd = password
      max_slave_connections = 100%
       
      [Read-Write Listener]
      type = listener
      service = Read-Write Service
      protocol = MySQLClient
      port = 4307
       
      [MySQL Monitor]
      type=monitor
      module=mysqlmon
      auto_failover=true
      servers=server1,server2
      user=maxscale
      passwd=password
      monitor_interval=1000
       
      [MaxAdmin Service]
      type = service
      router = cli
       
      [MaxAdmin Listener]
      type = listener
      service = MaxAdmin Service
      protocol = maxscaled
      socket = default
      

      It looks like the MariaDB Monitor should have the "multimaster" parameter set for a configuration like this:

      https://mariadb.com/kb/en/mariadb-maxscale-22-mariadb-monitor/#multimaster

      But why did the server status only break when the journal file reached its maximum age?

      Attachments

        Activity

          People

            esa.korhonen Esa Korhonen
            GeoffMontee Geoff Montee (Inactive)
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.