MariaDB Server / MDEV-35823

Galera cluster down with error in one DB evs_proto.cpp:handle_install_timer():728

Details

    Description

      Hi Support Team,

      We have a 3-node cluster: 2 DB nodes (service IP addresses 172.17.153.82 and 172.21.153.82) and 1 Galera witness (IP address 172.18.16.82). The application connects to the DB through an HA proxy located at the same site as the DB; the proxy switches between the 2 DB nodes and connects to the healthy one.

      All 3 nodes (2 DBs + 1 witness) are at different sites. During network maintenance at the 172.17.x.x site, we expected the DB at that site to be inaccessible while the other 2 nodes still formed a cluster, so that applications could keep writing to the DB node at 172.21.x.x. However, the application failed to connect to the remaining DB node (Lost connection to server at 'handshake: reading initial communication packet', system error: 11).
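
The expectation above rests on quorum arithmetic, which can be sketched as follows. This is a simplified illustration, not Galera's actual implementation: it assumes every node has the default weight of 1, it ignores details such as nodes that left the cluster gracefully, and the node labels are hypothetical names for the three machines in this report.

```python
def has_quorum(reachable_weights, all_weights):
    """Return True if the reachable nodes hold a strict majority of the total weight."""
    return 2 * sum(reachable_weights) > sum(all_weights)


# Hypothetical labels for the three nodes; each is assumed to have weight 1.
cluster = {"db-172.17": 1, "db-172.21": 1, "witness-172.18": 1}


def surviving_partition_has_quorum(cluster, lost):
    """Check whether the nodes that are NOT in `lost` still form a majority."""
    reachable = [weight for name, weight in cluster.items() if name not in lost]
    return has_quorum(reachable, cluster.values())


# Only the 172.17.x.x site is cut off: 2 of 3 remain, which is a majority.
print(surviving_partition_has_quorum(cluster, lost={"db-172.17"}))

# If the witness were unreachable as well: 1 of 3 remains, which is not a majority.
print(surviving_partition_has_quorum(cluster, lost={"db-172.17", "witness-172.18"}))
```

Under these assumptions, losing only the 172.17.x.x site should leave the remaining DB node and the witness with a majority, which is why the connection failure on the 172.21.x.x node was unexpected.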

      We had to bootstrap the cluster after the network maintenance was over.

      At node 1, the following error was observed:
      exception from gcomm, backend must be restarted: evs::proto(fac95f38-8d6b, GATHER, view_id(REG,15b6ecac-8b9b,58)) failed to form singleton view after exceeding max_install_timeouts 3, giving up (FATAL) at /home/buildbot/buildbot/build/gcomm/src/evs_proto.cpp:handle_install_timer():728
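
For anyone comparing this message across nodes or against other tickets, the fields can be pulled out with a small script. This is only a sketch: the regular expression is written against this one log line, and the field names (`node`, `state`, `view_type`, `view_rep`, `view_seq`, `max_timeouts`) are my own labels and my reading of the message, not names taken from the Galera source.

```python
import re

# The error line exactly as it appears in the node 1 log.
LINE = (
    "exception from gcomm, backend must be restarted: "
    "evs::proto(fac95f38-8d6b, GATHER, view_id(REG,15b6ecac-8b9b,58)) "
    "failed to form singleton view after exceeding max_install_timeouts 3, "
    "giving up (FATAL) at "
    "/home/buildbot/buildbot/build/gcomm/src/evs_proto.cpp:handle_install_timer():728"
)

# Group names are hypothetical labels chosen for readability.
PATTERN = re.compile(
    r"evs::proto\((?P<node>[0-9a-f-]+), (?P<state>\w+), "
    r"view_id\((?P<view_type>\w+),(?P<view_rep>[0-9a-f-]+),(?P<view_seq>\d+)\)\)"
    r".*max_install_timeouts (?P<max_timeouts>\d+)"
)


def parse_evs_failure(line):
    """Return the captured fields as a dict of strings, or None if the line does not match."""
    match = PATTERN.search(line)
    return match.groupdict() if match else None


fields = parse_evs_failure(LINE)
print(fields)
```

Read this way, the line says the node was in the GATHER state and gave up after exceeding 3 install timeouts.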

      The log files on the other nodes, by contrast, looked as expected. We would like to know:

      1. Why was bootstrapping needed to resume service?
      2. Is the above error message normal, and what does it mean? A similar ticket, MDEV-32110, was raised by someone else, but it has received no feedback so far.

      Please advise.

      Thanks and best regards,

      Lawrence

      Attachments

        Activity

          LawrenceMan Lawrence Man created issue -
          serg Sergei Golubchik made changes -
          Field Original Value New Value
          Assignee Jan Lindström [ JIRAUSER53125 ]
          LawrenceMan Lawrence Man made changes -
          Attachment mariadb-error.log-20250111-node1 [ 74465 ]
          Attachment mariadb-error.log-20250111-node2 [ 74466 ]
          LawrenceMan Lawrence Man made changes -
          elenst Elena Stepanova made changes -
          Fix Version/s 10.6 [ 24028 ]
          elenst Elena Stepanova made changes -
          Fix Version/s 10.11 [ 27614 ]
          Fix Version/s 10.6 [ 24028 ]

          People

            janlindstrom Jan Lindström
            LawrenceMan Lawrence Man
            Votes: 0
            Watchers: 2

