MariaDB MaxScale / MXS-4779

MaxScale monitor suddenly loses entire cluster status (galeramon)

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 6.4.10
    • Fix Version/s: 6.4.11
    • Component/s: galeramon
    • Environment: 3 MaxScale nodes behind ALB, AWS VMs

    Description

      A 5-node Galera cluster loses two nodes (NODE03 and NODE04) within a couple of minutes due to OOM events.
      The cluster reconfigures itself and remains healthy with the remaining 3 nodes.
      However, MaxScale drops the role status (Master, Slave, Synced) of ALL nodes, including the healthy ones, which causes an outage.

      2023-09-25 14:07:29.827   error  : (mon_report_query_error): Failed to execute query on server 'NODE04' ([10.225.27.118]:3306): Lost connection to server during query
      2023-09-25 14:08:10.003   notice : (log_state_change): Server changed state: NODE04[10.225.27.118:3306]: slave_down. [Slave, Synced, Running] -> [Down]
      2023-09-25 14:09:06.851   error  : (985612) (NODE03); (socket_write): Write to Backend DCB 10.225.27.183 in state DCB::State::POLLING failed: 104, Connection reset by peer
      2023-09-25 14:09:30.579   error  : [galeramon] (post_tick): There are no cluster members
      2023-09-25 14:09:30.579   notice : (log_state_change): Server changed state: NODE01[10.225.27.121:3306]: lost_master. [Master, Synced, Running] -> [Running]
      2023-09-25 14:09:30.579   notice : (log_state_change): Server changed state: NODE02[10.225.27.156:3306]: lost_slave. [Slave, Synced, Running] -> [Running]
      2023-09-25 14:09:30.579   notice : (log_state_change): Server changed state: NODE03[10.225.27.183:3306]: slave_down. [Slave, Synced, Running] -> [Down]
      2023-09-25 14:09:30.579   notice : (log_state_change): Server changed state: NODE05[10.225.27.142:3306]: lost_slave. [Slave, Synced, Running] -> [Running]
      2023-09-25 14:09:30.579   notice : (log_state_change): Server changed state: NODER02[10.225.27.158:3306]: lost_slave. [Slave, Running] -> [Running]
      2023-09-25 14:09:30.579   notice : (log_state_change): Server changed state: NODER03[10.225.27.172:3306]: lost_slave. [Slave, Running] -> [Running]
      2023-09-25 14:09:30.579   notice : (log_state_change): Server changed state: NODER04[10.225.27.116:3306]: lost_slave. [Slave, Running] -> [Running]
      2023-09-25 14:09:30.594   error  : (987213) [readwritesplit] (rwsplit-service); (open_connections): Couldn't find suitable Master from 5 candidates.
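For triage, the state-change notices in the excerpt above can be pulled out with a small script, which makes it easier to see which servers stayed Running but lost their role. This is only a sketch: the helper is hypothetical and not part of MaxScale, the regular expression is an assumption based solely on the log lines quoted in this report, and the names `parse_state_changes` and `lost_status_while_running` are made up for illustration.

```python
import re

# Assumed layout of a "Server changed state" notice, inferred from the
# log lines quoted in this report (not from any MaxScale specification).
STATE_CHANGE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\s+notice\s*:\s*"
    r"\(log_state_change\): Server changed state: "
    r"(?P<server>[^\[]+)\[(?P<addr>[^\]]+)\]: (?P<event>\w+)\. "
    r"\[(?P<old>[^\]]*)\] -> \[(?P<new>[^\]]*)\]"
)


def parse_state_changes(log_text):
    """Return one dict per state-change notice found in log_text."""
    changes = []
    # finditer also copes with two log entries fused onto one line.
    for m in STATE_CHANGE.finditer(log_text):
        changes.append({
            "timestamp": m["ts"],
            "server": m["server"],
            "address": m["addr"],
            "event": m["event"],
            "old": [s.strip() for s in m["old"].split(",")],
            "new": [s.strip() for s in m["new"].split(",")],
        })
    return changes


def lost_status_while_running(changes):
    """Names of servers that stayed Running but lost other status bits."""
    return [
        c["server"]
        for c in changes
        if "Running" in c["old"]
        and "Running" in c["new"]
        and len(c["new"]) < len(c["old"])
    ]
```

Run against the excerpt in this report, `lost_status_while_running(parse_state_changes(text))` lists the servers that were still reachable when the monitor reported "There are no cluster members", as opposed to NODE03 and NODE04, which actually went Down.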
      

          People

            Assignee: markus makela
            Reporter: Rick Pizzi (rpizzi)