MariaDB Server / MDEV-35823

Galera cluster down with error in one DB evs_proto.cpp:handle_install_timer():728

Details

    Description

      Hi Support Team,

      We have a 3-node cluster: 2 nodes are DB servers (service IP addresses 172.17.153.82 and 172.21.153.82) and 1 is a Galera witness (IP address 172.18.16.82). The application connects to the DB through an HA proxy located at the same site as the DB; the proxy checks both DB nodes and connects to the healthy one.

      All 3 nodes (2 DBs + 1 witness) are at different sites. Network maintenance took place at the 172.17.x.x site, and we expected the DB at that site to become inaccessible while the other 2 nodes continued to form a cluster, so that applications could still write to the DB node at 172.21.x.x. However, the application failed to connect to the remaining DB node (Lost connection to server at 'handshake: reading initial communication packet', system error: 11).
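The expectation that the two remaining nodes keep running rests on majority-quorum arithmetic. The following is a minimal sketch of that reasoning, assuming every node (including the witness) carries an equal weight of 1 and ignoring graceful departures; `has_quorum` is a hypothetical helper for illustration, not a Galera API, and the real computation inside Galera may differ in detail.

```python
def has_quorum(previous_members, reachable_members, weights=None):
    """Return True if the reachable nodes hold a strict majority of the
    total weight of the previous primary component (simplified model)."""
    weights = weights or {}

    def weight(node):
        # Assumption: each node has weight 1 unless stated otherwise.
        return weights.get(node, 1)

    total = sum(weight(n) for n in previous_members)
    reachable = sum(weight(n) for n in reachable_members if n in previous_members)
    # Strict majority: exactly half is not enough.
    return 2 * reachable > total


# Node addresses taken from the report above.
cluster = {"172.17.153.82", "172.21.153.82", "172.18.16.82"}
survivors = cluster - {"172.17.153.82"}   # DB at 172.21.x.x plus the witness
isolated = {"172.17.153.82"}              # the site under maintenance

print(has_quorum(cluster, survivors))  # True: 2 of 3 is a majority
print(has_quorum(cluster, isolated))   # False: 1 of 3 is not
```

Under these assumptions the surviving pair should have stayed primary, which is why the observed outage is unexpected.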

      We had to bootstrap the cluster after the network maintenance was over.
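For context on the bootstrap step: each data node records its last known state in a `grastate.dat` file, and the node to bootstrap from is normally the one flagged `safe_to_bootstrap: 1` or, failing that, the one with the highest `seqno`. The sketch below illustrates that selection under stated assumptions. The file contents and UUID are hypothetical sample values, not taken from this cluster, and `parse_grastate` and `pick_bootstrap_node` are illustrative helpers rather than part of any MariaDB tool.

```python
def parse_grastate(text):
    """Parse the 'key: value' lines of a grastate.dat file into a dict."""
    state = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the header comment
        key, _, value = line.partition(":")
        state[key.strip()] = value.strip()
    return state


def pick_bootstrap_node(states):
    """Given {node: parsed grastate}, return the node to bootstrap from,
    or None if no node has a usable seqno (recovery needed first)."""
    flagged = [n for n, s in states.items() if s.get("safe_to_bootstrap") == "1"]
    if flagged:
        return flagged[0]
    best = max(states, key=lambda n: int(states[n].get("seqno", "-1")))
    return best if int(states[best].get("seqno", "-1")) >= 0 else None


# Hypothetical file contents for the two DB nodes (values are made up).
node_a = """# GALERA saved state
version: 2.1
uuid:    00000000-0000-0000-0000-000000000001
seqno:   1200
safe_to_bootstrap: 0
"""
node_b = node_a.replace("seqno:   1200", "seqno:   1234")

states = {"172.17.153.82": parse_grastate(node_a),
          "172.21.153.82": parse_grastate(node_b)}
print(pick_bootstrap_node(states))  # 172.21.153.82
```

A `seqno` of -1 on every node means the position was not saved cleanly, in which case the sketch returns `None` and the position has to be recovered before a node is chosen.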

      At node 1, the following error was observed:
      exception from gcomm, backend must be restarted: evs::proto(fac95f38-8d6b, GATHER, view_id(REG,15b6ecac-8b9b,58)) failed to form singleton view after exceeding max_install_timeouts 3, giving up (FATAL) at /home/buildbot/buildbot/build/gcomm/src/evs_proto.cpp:handle_install_timer():728
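To check whether the same fatal message appears in the other nodes' logs, the line can be matched with a regular expression. This is a sketch built only from the single message quoted above; the field names (`node`, `state`, `view_seq`, and so on) are labels chosen here for illustration and are not official Galera terminology, and other Galera versions may format the message differently.

```python
import re

# Pattern derived from the one log line quoted in this report.
PATTERN = re.compile(
    r"evs::proto\((?P<node>[0-9a-f-]+), (?P<state>\w+), "
    r"view_id\((?P<view_type>\w+),(?P<view_node>[0-9a-f-]+),(?P<view_seq>\d+)\)\) "
    r"failed to form singleton view after exceeding "
    r"max_install_timeouts (?P<max_timeouts>\d+)"
)


def parse_install_timeout(line):
    """Return the fields of an install-timeout failure line, or None."""
    match = PATTERN.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["view_seq"] = int(fields["view_seq"])
    fields["max_timeouts"] = int(fields["max_timeouts"])
    return fields


sample = (
    "exception from gcomm, backend must be restarted: "
    "evs::proto(fac95f38-8d6b, GATHER, view_id(REG,15b6ecac-8b9b,58)) "
    "failed to form singleton view after exceeding max_install_timeouts 3, "
    "giving up (FATAL) at /home/buildbot/buildbot/build/gcomm/src/"
    "evs_proto.cpp:handle_install_timer():728"
)

info = parse_install_timeout(sample)
print(info["node"], info["state"], info["view_seq"], info["max_timeouts"])
# fac95f38-8d6b GATHER 58 3
```

Applied line by line to each `mariadb-error.log`, this would show which nodes hit the install-timeout limit and in which state they were at the time.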

      The log files on the other nodes, on the other hand, looked as expected. We would like to know:

      1. Why was bootstrapping needed to resume service?
      2. Is the above error message normal, and what does it mean? A similar ticket, MDEV-32110, has been raised by someone else, but there has been no feedback so far.

      Please advise.

      Thanks and best regards,

      Lawrence

      Attachments

        Activity

          LawrenceMan Lawrence Man added a comment -

          The mariadb-error.log files for 2 nodes and the Galera log file have been uploaded for your reference.


          People

            janlindstrom Jan Lindström
            LawrenceMan Lawrence Man