MariaDB Server / MDEV-31617

Galera Cluster could not recover after 2023-07-01 23:55:01 3287386 [Warning] WSREP: gcs_caused() returned -107 (Transport endpoint is not connected)

Details

    Description

      Our Galera Cluster consists of 3 nodes.
      Recently, node-0 restarted repeatedly because of a server resource shortage,
      and eventually node-1 and node-2 tried to connect to node-0 but failed:

      2023-07-01 23:54:52 0 [Note] WSREP: (63d23c5c-b67b, 'tcp://0.0.0.0:4567') connection to peer 45ebb9d4-a748 with addr tcp://172.24.151.92:4567 timed out, no messages seen in PT3S, socket stats: rtt: 1473 rttvar: 2527 rto: 204000 lost: 0 last_data_recv: 3344 cwnd: 6 last_queued_since: 500032641 last_delivered_since: 3342827068 send_queue_length: 0 send_queue_bytes: 0 segment: 0 messages: 0

      After this message, node-1 and node-2 both started logging these warnings:
      2023-07-01 23:55:01 3213483 [Warning] WSREP: gcs_caused() returned -107 (Transport endpoint is not connected)
      2023-07-01 23:55:01 3320009 [Warning] WSREP: gcs_caused() returned -1 (Operation not permitted)
      2023-07-01 23:55:01 3320012 [Warning] WSREP: gcs_caused() returned -107 (Transport endpoint is not connected)
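
      The numeric values in these warnings look like negated errno codes: on Linux, 107 is ENOTCONN and 1 is EPERM, which matches the text in parentheses. As a rough sketch (the function name and regular expression are illustrative only, not part of MariaDB or Galera), the warnings can be extracted from an error log like this:

```python
import errno
import re

# Matches the warning format quoted in this report.
GCS_CAUSED = re.compile(r"WSREP: gcs_caused\(\) returned (-?\d+) \((.*)\)")


def parse_gcs_caused(line):
    """Return (code, errno_name, message), or None if the line does not match.

    parse_gcs_caused is a hypothetical helper written for this example.
    """
    match = GCS_CAUSED.search(line)
    if match is None:
        return None
    code = int(match.group(1))
    # Assumption: the return value is a negated errno. errno numbers are
    # platform-specific, so the symbolic name is only meaningful when this
    # runs on the same kind of system that produced the log (Linux here).
    name = errno.errorcode.get(-code, "UNKNOWN")
    return code, name, match.group(2)


line = ("2023-07-01 23:55:01 3213483 [Warning] WSREP: gcs_caused() "
        "returned -107 (Transport endpoint is not connected)")
print(parse_gcs_caused(line))
```

      Counting how often each code appears over time may help show whether the warnings stop once the cluster regains a Primary component.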

      These warnings kept appearing, and both node-1 and node-2 went through the following status changes:
      2023-07-01 23:55:01 6 [Note] WSREP: Server status change synced -> connected
      2023-07-01 23:55:01 6 [Note] WSREP: Server status change connected -> connected

      Then they switched to a Non-primary view:
      2023-07-01 23:55:01 6 [Note] WSREP: Non-primary view

      From this point on, our Galera Cluster was not accessible, because the local state of every node had changed to Initialized.

      After comparing wsrep_last_committed on node-1 and node-2, we selected node-1 to re-bootstrap the cluster (SET WSREP_PROVIDER_OPTIONS = "pc.bootstrap = 1;"). node-1 became Primary, the warning
      [Warning] WSREP: gcs_caused() returned -107 (Transport endpoint is not connected)
      no longer appeared, and node-2 joined the cluster successfully.
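
      The selection step described above can be sketched as follows. This is only an illustration: pick_bootstrap_node is a hypothetical helper, and the sequence numbers are made-up placeholders, not values from our cluster. It assumes wsrep_last_committed has already been read on each surviving node.

```python
def pick_bootstrap_node(last_committed):
    """Return the node name with the highest wsrep_last_committed value.

    last_committed maps node name -> wsrep_last_committed as reported by
    that node. If two nodes report the same value, the first one in the
    mapping is returned.
    """
    if not last_committed:
        raise ValueError("no nodes to choose from")
    return max(last_committed, key=last_committed.get)


# Placeholder values for illustration only.
candidates = {"node-1": 1050, "node-2": 1042}
print(pick_bootstrap_node(candidates))
```

      The node returned here is the one on which pc.bootstrap would then be set, so that the most advanced node forms the new Primary component and the other nodes rejoin it.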

      Is there any reason or trigger that would cause this message:
      [Warning] WSREP: gcs_caused() returned -107 (Transport endpoint is not connected)
      to keep appearing?

      Did we hit a bug?

      Thank you.

          People

            janlindstrom Jan Lindström
            mjchangk Min-Jen Chang