MariaDB Server / MDEV-17935

Loss of connection every 180 seconds under load


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Not a Bug
    • Affects Version/s: 10.2.17, 10.2.19
    • Component/s: Galera

    Description

      Client Community Choice Financial is running into a problem where, during times of high traffic, one or two of the nodes will drop out of the cluster; after rejoining, they error exactly every 180 seconds with a loss of connection to the other members of the cluster. Here is an example of the error snippet:

      2018-12-08 1:36:04 140465462552320 [Note] WSREP: (2619e469, 'ssl://0.0.0.0:4567') connection to peer 4ec5a287 with addr ssl://10.225.17.115:4567 timed out, no messages seen in PT3S
      2018-12-08 1:36:04 140465462552320 [Note] WSREP: (2619e469, 'ssl://0.0.0.0:4567') turning message relay requesting on, nonlive peers: ssl://10.225.17.115:4567
      2018-12-08 1:36:05 140465462552320 [Note] WSREP: (2619e469, 'ssl://0.0.0.0:4567') reconnecting to 4ec5a287 (ssl://10.225.17.115:4567), attempt 0
      2018-12-08 1:36:06 140465462552320 [Note] WSREP: evs::proto(2619e469, GATHER, view_id(REG,0fb31d1c,854)) suspecting node: 4ec5a287
      2018-12-08 1:36:06 140465462552320 [Note] WSREP: evs::proto(2619e469, GATHER, view_id(REG,0fb31d1c,854)) suspected node without join message, declaring inactive
      2018-12-08 1:36:07 140465462552320 [Note] WSREP: declaring 0fb31d1c at ssl://10.225.16.156:4567 stable
      2018-12-08 1:36:07 140465462552320 [Note] WSREP: declaring 1804a1ab at ssl://10.225.18.13:4567 stable
      2018-12-08 1:36:07 140465462552320 [Note] WSREP: declaring c6aa9036 at ssl://10.225.17.83:4567 stable
      2018-12-08 1:36:07 140465462552320 [Note] WSREP: Node 0fb31d1c state prim
      2018-12-08 1:36:07 140465462552320 [Note] WSREP: view(view_id(PRIM,0fb31d1c,855) memb {
          0fb31d1c,0
          1804a1ab,0
          2619e469,0
          c6aa9036,0
      } joined {
      } left {
      } partitioned {
          4ec5a287,0
      })
      2018-12-08 1:36:07 140465462552320 [Note] WSREP: save pc into disk
      2018-12-08 1:36:07 140465462552320 [Note] WSREP: forgetting 4ec5a287 (ssl://10.225.17.115:4567)
      2018-12-08 1:36:07 140465462552320 [Note] WSREP: deleting entry ssl://10.225.17.115:4567
      2018-12-08 1:36:07 140465454159616 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 2, memb_num = 4
      2018-12-08 1:36:07 140465454159616 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
      2018-12-08 1:36:07 140465462552320 [Note] WSREP: (2619e469, 'ssl://0.0.0.0:4567') turning message relay requesting off
      2018-12-08 1:36:07 140465454159616 [Note] WSREP: STATE EXCHANGE: sent state msg: a23b93b6-fa89-11e8-afc7-fedac3aae8d8
      2018-12-08 1:36:07 140465454159616 [Note] WSREP: STATE EXCHANGE: got state msg: a23b93b6-fa89-11e8-afc7-fedac3aae8d8 from 0 (ecash-db-d)
      2018-12-08 1:36:07 140465454159616 [Note] WSREP: STATE EXCHANGE: got state msg: a23b93b6-fa89-11e8-afc7-fedac3aae8d8 from 1 (ecash-db-c)
      2018-12-08 1:36:07 140465454159616 [Note] WSREP: STATE EXCHANGE: got state msg: a23b93b6-fa89-11e8-afc7-fedac3aae8d8 from 2 (ecash-db-a)
      2018-12-08 1:36:07 140465454159616 [Note] WSREP: STATE EXCHANGE: got state msg: a23b93b6-fa89-11e8-afc7-fedac3aae8d8 from 3 (ecash-db-e)
      2018-12-08 1:36:07 140465454159616 [Note] WSREP: Quorum results:
      version = 4,
      component = PRIMARY,
      conf_id = 701,
      members = 4/4 (joined/total),
      act_id = 311915421,
      last_appl. = 311915330,
      protocols = 0/8/3 (gcs/repl/appl),
      group UUID = 5388b583-0c4f-11e8-8644-9f4517a87e4a
      2018-12-08 1:36:07 140465454159616 [Note] WSREP: Flow-control interval: [16, 16]
      2018-12-08 1:36:07 140465454159616 [Note] WSREP: Trying to continue unpaused monitor
      2018-12-08 1:36:07 140465431856896 [Note] WSREP: New cluster view: global state: 5388b583-0c4f-11e8-8644-9f4517a87e4a:311915421, view# 702: Primary, number of nodes: 4, my index: 2, protocol version 3
      2018-12-08 1:36:07 140465431856896 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
      2018-12-08 1:36:07 140465431856896 [Note] WSREP: REPL Protocols: 8 (3, 2)
      2018-12-08 1:36:07 140465431856896 [Note] WSREP: Assign initial position for certification: 311915421, protocol version: 3
      2018-12-08 1:36:07 140465496123136 [Note] WSREP: Service thread queue flushed.
      2018-12-08 1:36:09 140465462552320 [Note] WSREP: SSL handshake successful, remote endpoint ssl://10.225.17.115:56884 local endpoint ssl://10.225.16.150:4567 cipher: AES128-SHA compression: none
      2018-12-08 1:36:09 140465462552320 [Note] WSREP: (2619e469, 'ssl://0.0.0.0:4567') connection established to 4ec5a287 ssl://10.225.17.115:4567
      2018-12-08 1:36:09 140465462552320 [Warning] WSREP: discarding established (time wait) 4ec5a287 (ssl://10.225.17.115:4567)
      2018-12-08 1:36:10 140465462552320 [Note] WSREP: cleaning up 4ec5a287 (ssl://10.225.17.115:4567)
      2018-12-08 1:36:12 140465462552320 [Note] WSREP: (2619e469, 'ssl://0.0.0.0:4567') turning message relay requesting on, nonlive peers: ssl://10.225.17.115:4567
      2018-12-08 1:36:12 140465462552320 [Note] WSREP: SSL handshake successful, remote endpoint ssl://10.225.17.115:4567 local endpoint ssl://10.225.16.150:43646 cipher: AES128-SHA compression: none

      This will happen repeatedly until an indeterminate amount of time has passed and the host becomes stable again.
      The client has had AWS examine the underlying hardware and systems, and no limits being hit or network issues were found.
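
      As background for the log above: the PT3S in the "no messages seen in PT3S" messages corresponds to the Galera provider option gmcast.peer_timeout (default PT3S), and the suspect/inactive handling in the evs::proto lines is governed by evs.suspect_timeout (default PT5S) and evs.inactive_timeout (default PT15S), with keepalives sent every evs.keepalive_period (default PT1S). A minimal my.cnf sketch of how these provider options can be relaxed on every node is below; the timeout values and the provider path are illustrative assumptions, not settings taken from or recommended in this ticket:

      [mysqld]
      wsrep_on       = ON
      # Provider path varies by distribution; this one is an assumption.
      wsrep_provider = /usr/lib/galera/libgalera_smm.so

      # Relax the group-communication timeouts so short stalls under heavy load are
      # not immediately treated as node failures (ISO 8601 durations: PT3S = 3 s).
      #   gmcast.peer_timeout  - point-to-point connection timeout, default PT3S
      #   evs.keepalive_period - keepalive interval, default PT1S
      #   evs.suspect_timeout  - node suspected after this much silence, default PT5S
      #   evs.inactive_timeout - node declared inactive/partitioned, default PT15S
      wsrep_provider_options = "gmcast.peer_timeout=PT10S;evs.keepalive_period=PT3S;evs.suspect_timeout=PT10S;evs.inactive_timeout=PT30S"

      (evs.inactive_timeout should stay larger than evs.suspect_timeout, and all nodes in the cluster should use the same values.)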



People

    Assignee: Jan Lindström (jplindst, inactive account)
    Reporter: Isaac Venn (ivenn)
    Votes: 1
    Watchers: 2

