Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-22908

After OOM event on one node, entire cluster becomes NON-PRIMARY

    XMLWordPrintable

    Details

    • Type: Bug
    • Status: Open (View Workflow)
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: 10.4.13
    • Fix Version/s: 10.4
    • Component/s: Galera
    • Labels:
      None

      Description

      One customer of ours had an OOM event on one of its 3 nodes galera cluster.
      When then node restarted, entire cluster became non-primary, causing a total outage; it needed a manual shutdown of entire cluster to recover.

      I was able to reproduce the situation in my lab: 3 VMs running MariaDB 10.4.13, on one of the nodes I assigned a very small amount of memory and then ran some inserts and selects until the OOM killer fired and killed the mysqld process.

      Process was restarted by systemd but when it came back, entire cluster went into non-primary. This is exactly what happened to the customer.
      Said customer is also experiencing a memory leak in wsrep library (we are still trying to reproduce that) so that's a bug that triggered another bug it seems.

      Needless to say, this is a very critical issue.
      I am attaching logs from the 3 nodes, plus OOM event logs from node 3.

      Environment: 3 x Centos7 VMs, 2G, 2G, 128M memory (to reproduce the issue; happens with any memory size when OOM fires).

      PLEASE NOTE: the hostnames of the nodes mention docker but docker is not used in this test, I just recycled some VMs.

      Node 1 log

      2020-06-16  9:45:55 0 [Note] WSREP: (9a21602a-96c2, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://192.168.2.92:4567 
      2020-06-16  9:45:56 0 [Note] WSREP: (9a21602a-96c2, 'tcp://0.0.0.0:4567') reconnecting to 34ea9378-9a6c (tcp://192.168.2.92:4567), attempt 0
      2020-06-16  9:45:59 0 [Note] WSREP: evs::proto(9a21602a-96c2, OPERATIONAL, view_id(REG,2ef04c9f-96d5,153)) suspecting node: 34ea9378-9a6c
      2020-06-16  9:45:59 0 [Note] WSREP: evs::proto(9a21602a-96c2, OPERATIONAL, view_id(REG,2ef04c9f-96d5,153)) suspected node without join message, declaring inactive
      2020-06-16  9:46:07 0 [Warning] WSREP: evs::proto(9a21602a-96c2, GATHER, view_id(REG,2ef04c9f-96d5,153)) install timer expired
      evs::proto(evs::proto(9a21602a-96c2, GATHER, view_id(REG,2ef04c9f-96d5,153)), GATHER) {
      current_view=view(view_id(REG,2ef04c9f-96d5,153) memb {
      	2ef04c9f-96d5,0
      	34ea9378-9a6c,0
      	9a21602a-96c2,0
      } joined {
      } left {
      } partitioned {
      }),
      input_map=evs::input_map: {aru_seq=2091,safe_seq=2091,node_index=node: {idx=0,range=[2098,2097],safe_seq=2091} node: {idx=1,range=[2092,2091],safe_seq=2091} node: {idx=2,range=[2098,2097],safe_seq=2091} },
      fifo_seq=11372,
      last_sent=2097,
      known:
      2ef04c9f-96d5 at tcp://192.168.2.91:4567
      {o=1,s=0,i=0,fs=12032,jm=
      {v=1,t=4,ut=255,o=1,s=2091,sr=-1,as=2091,f=4,src=2ef04c9f-96d5,srcvid=view_id(REG,2ef04c9f-96d5,153),insvid=view_id(UNKNOWN,00000000-0000,0),ru=00000000-0000,r=[-1,-1],fs=12032,nl=(
      	2ef04c9f-96d5, {o=1,s=0,e=0,ls=-1,vid=view_id(REG,2ef04c9f-96d5,153),ss=2091,ir=[2098,2097],}
      	34ea9378-9a6c, {o=1,s=0,e=0,ls=-1,vid=view_id(REG,2ef04c9f-96d5,153),ss=2091,ir=[2092,2091],}
      	9a21602a-96c2, {o=1,s=0,e=0,ls=-1,vid=view_id(REG,2ef04c9f-96d5,153),ss=2091,ir=[2098,2097],}
      )
      },
      }
      34ea9378-9a6c at tcp://192.168.2.92:4567
      {o=0,s=1,i=0,fs=4630,}
      9a21602a-96c2 at 
      {o=1,s=0,i=0,fs=-1,jm=
      {v=1,t=4,ut=255,o=1,s=2091,sr=-1,as=2091,f=0,src=9a21602a-96c2,srcvid=view_id(REG,2ef04c9f-96d5,153),insvid=view_id(UNKNOWN,00000000-0000,0),ru=00000000-0000,r=[-1,-1],fs=11372,nl=(
      	2ef04c9f-96d5, {o=1,s=0,e=0,ls=-1,vid=view_id(REG,2ef04c9f-96d5,153),ss=2091,ir=[2098,2097],}
      	34ea9378-9a6c, {o=0,s=1,e=0,ls=-1,vid=view_id(REG,2ef04c9f-96d5,153),ss=2091,ir=[2092,2091],}
      	9a21602a-96c2, {o=1,s=0,e=0,ls=-1,vid=view_id(REG,2ef04c9f-96d5,153),ss=2091,ir=[2098,2097],}
      )
      },
      }
       }
      2020-06-16  9:46:07 0 [Note] WSREP: no install message received
      2020-06-16  9:46:07 0 [Note] WSREP: view(view_id(NON_PRIM,2ef04c9f-96d5,153) memb {
      	9a21602a-96c2,0
      } joined {
      } left {
      } partitioned {
      	2ef04c9f-96d5,0
      	34ea9378-9a6c,0
      })
      2020-06-16  9:46:07 0 [Note] WSREP: view(view_id(NON_PRIM,9a21602a-96c2,154) memb {
      	9a21602a-96c2,0
      } joined {
      } left {
      } partitioned {
      	2ef04c9f-96d5,0
      	34ea9378-9a6c,0
      })
      2020-06-16  9:46:07 0 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
      2020-06-16  9:46:07 0 [Note] WSREP: Flow-control interval: [16, 16]
      2020-06-16  9:46:07 0 [Note] WSREP: Received NON-PRIMARY.
      2020-06-16  9:46:07 0 [Note] WSREP: Shifting SYNCED -> OPEN (TO: 130)
      2020-06-16  9:46:07 0 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
      2020-06-16  9:46:07 0 [Note] WSREP: Flow-control interval: [16, 16]
      2020-06-16  9:46:07 0 [Note] WSREP: Received NON-PRIMARY.
      2020-06-16  9:46:07 2 [Note] WSREP: ================================================
      View:
        id: cf61cf68-aef7-11ea-88db-1bc466429584:-1
        status: non-primary
        protocol_version: 4
        capabilities: MULTI-MASTER, CERTIFICATION, PARALLEL_APPLYING, REPLAY, ISOLATION, PAUSE, CAUSAL_READ, INCREMENTAL_WS, UNORDERED, PREORDERED, STREAMING, NBO
        final: no
        own_index: 0
        members(1):
      	0: 9a21602a-afa0-11ea-96c2-5f57fad80d47, docker1
      =================================================
      2020-06-16  9:46:07 2 [Note] WSREP: Non-primary view
      2020-06-16  9:46:07 2 [Note] WSREP: Server status change synced -> connected
      2020-06-16  9:46:07 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
      2020-06-16  9:46:07 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
      2020-06-16  9:46:07 2 [Note] WSREP: ================================================
      View:
        id: cf61cf68-aef7-11ea-88db-1bc466429584:-1
        status: non-primary
        protocol_version: 4
        capabilities: MULTI-MASTER, CERTIFICATION, PARALLEL_APPLYING, REPLAY, ISOLATION, PAUSE, CAUSAL_READ, INCREMENTAL_WS, UNORDERED, PREORDERED, STREAMING, NBO
        final: no
        own_index: 0
        members(1):
      	0: 9a21602a-afa0-11ea-96c2-5f57fad80d47, docker1
      =================================================
      2020-06-16  9:46:07 2 [Note] WSREP: Non-primary view
      2020-06-16  9:46:07 2 [Note] WSREP: Server status change connected -> connected
      2020-06-16  9:46:07 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
      2020-06-16  9:46:07 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
      2020-06-16  9:46:11 0 [Note] WSREP: declaring 2ef04c9f-96d5 at tcp://192.168.2.91:4567 stable
      2020-06-16  9:46:11 0 [Note] WSREP: view(view_id(NON_PRIM,2ef04c9f-96d5,155) memb {
      	2ef04c9f-96d5,0
      	9a21602a-96c2,0
      } joined {
      } left {
      } partitioned {
      	34ea9378-9a6c,0
      })
      2020-06-16  9:46:11 0 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 1, memb_num = 2
      2020-06-16  9:46:11 0 [Note] WSREP: Flow-control interval: [23, 23]
      2020-06-16  9:46:11 0 [Note] WSREP: Received NON-PRIMARY.
      2020-06-16  9:46:11 2 [Note] WSREP: ================================================
      View:
        id: cf61cf68-aef7-11ea-88db-1bc466429584:-1
        status: non-primary
        protocol_version: 4
        capabilities: MULTI-MASTER, CERTIFICATION, PARALLEL_APPLYING, REPLAY, ISOLATION, PAUSE, CAUSAL_READ, INCREMENTAL_WS, UNORDERED, PREORDERED, STREAMING, NBO
        final: no
        own_index: 1
        members(2):
      	0: 2ef04c9f-afa0-11ea-96d5-223dafaac5ab, unspecified
      	1: 9a21602a-afa0-11ea-96c2-5f57fad80d47, docker1
      =================================================
      2020-06-16  9:46:11 2 [Note] WSREP: Non-primary view
      2020-06-16  9:46:11 2 [Note] WSREP: Server status change connected -> connected
      2020-06-16  9:46:11 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
      2020-06-16  9:46:11 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
      2020-06-16  9:46:30 0 [Note] WSREP: (9a21602a-96c2, 'tcp://0.0.0.0:4567') reconnecting to 34ea9378-9a6c (tcp://192.168.2.92:4567), attempt 30
      2020-06-16  9:46:34 0 [Note] WSREP: (9a21602a-96c2, 'tcp://0.0.0.0:4567') connection established to 34ea9378-9a6d tcp://192.168.2.92:4567
      2020-06-16  9:46:34 0 [Note] WSREP: remote endpoint tcp://192.168.2.92:4567 changed identity 34ea9378-afa4-11ea-9a6c-db5f63f7a91e -> 34ea9378-afa4-11ea-9a6d-db5f63f7a91e
      2020-06-16  9:46:34 0 [Note] WSREP: declaring 2ef04c9f-96d5 at tcp://192.168.2.91:4567 stable
      2020-06-16  9:46:34 0 [Note] WSREP: declaring 34ea9378-9a6d at tcp://192.168.2.92:4567 stable
      2020-06-16  9:46:34 0 [Note] WSREP: view(view_id(NON_PRIM,2ef04c9f-96d5,156) memb {
      	2ef04c9f-96d5,0
      	34ea9378-9a6d,0
      	9a21602a-96c2,0
      } joined {
      } left {
      } partitioned {
      	34ea9378-9a6c,0
      })
      2020-06-16  9:46:34 0 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 2, memb_num = 3
      2020-06-16  9:46:34 0 [Note] WSREP: Flow-control interval: [28, 28]
      2020-06-16  9:46:34 0 [Note] WSREP: Received NON-PRIMARY.
      2020-06-16  9:46:34 2 [Note] WSREP: ================================================
      View:
        id: cf61cf68-aef7-11ea-88db-1bc466429584:-1
        status: non-primary
        protocol_version: 4
        capabilities: MULTI-MASTER, CERTIFICATION, PARALLEL_APPLYING, REPLAY, ISOLATION, PAUSE, CAUSAL_READ, INCREMENTAL_WS, UNORDERED, PREORDERED, STREAMING, NBO
        final: no
        own_index: 2
        members(3):
      	0: 2ef04c9f-afa0-11ea-96d5-223dafaac5ab, unspecified
      	1: 34ea9378-afa4-11ea-9a6d-db5f63f7a91e, unspecified
      	2: 9a21602a-afa0-11ea-96c2-5f57fad80d47, docker1
      =================================================
      2020-06-16  9:46:34 2 [Note] WSREP: Non-primary view
      2020-06-16  9:46:34 2 [Note] WSREP: Server status change connected -> connected
      2020-06-16  9:46:34 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
      2020-06-16  9:46:34 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
      2020-06-16  9:46:37 0 [Note] WSREP: (9a21602a-96c2, 'tcp://0.0.0.0:4567') turning message relay requesting off
      

      Node 2 log

      2020-06-16  9:45:55 0 [Note] WSREP: (2ef04c9f-96d5, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://192.168.2.92:4567 
      2020-06-16  9:45:56 0 [Note] WSREP: (2ef04c9f-96d5, 'tcp://0.0.0.0:4567') reconnecting to 34ea9378-9a6c (tcp://192.168.2.92:4567), attempt 0
      2020-06-16  9:46:09 0 [Note] WSREP: evs::proto(2ef04c9f-96d5, GATHER, view_id(REG,2ef04c9f-96d5,153)) suspecting node: 34ea9378-9a6c
      2020-06-16  9:46:09 0 [Note] WSREP: evs::proto(2ef04c9f-96d5, GATHER, view_id(REG,2ef04c9f-96d5,153)) suspected node without join message, declaring inactive
      2020-06-16  9:46:10 0 [Note] WSREP: view(view_id(NON_PRIM,2ef04c9f-96d5,153) memb {
      	2ef04c9f-96d5,0
      } joined {
      } left {
      } partitioned {
      	34ea9378-9a6c,0
      	9a21602a-96c2,0
      })
      2020-06-16  9:46:10 0 [Note] WSREP: view(view_id(NON_PRIM,2ef04c9f-96d5,154) memb {
      	2ef04c9f-96d5,0
      } joined {
      } left {
      } partitioned {
      	34ea9378-9a6c,0
      	9a21602a-96c2,0
      })
      2020-06-16  9:46:10 0 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
      2020-06-16  9:46:10 0 [Note] WSREP: Flow-control interval: [256, 256]
      2020-06-16  9:46:10 0 [Note] WSREP: Received NON-PRIMARY.
      2020-06-16  9:46:10 0 [Note] WSREP: Shifting SYNCED -> OPEN (TO: 130)
      2020-06-16  9:46:10 0 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
      2020-06-16  9:46:10 0 [Note] WSREP: Flow-control interval: [256, 256]
      2020-06-16  9:46:10 0 [Note] WSREP: Received NON-PRIMARY.
      2020-06-16  9:46:10 14 [Note] WSREP: ================================================
      View:
        id: cf61cf68-aef7-11ea-88db-1bc466429584:-1
        status: non-primary
        protocol_version: 4
        capabilities: MULTI-MASTER, CERTIFICATION, PARALLEL_APPLYING, REPLAY, ISOLATION, PAUSE, CAUSAL_READ, INCREMENTAL_WS, UNORDERED, PREORDERED, STREAMING, NBO
        final: no
        own_index: 0
        members(1):
      	0: 2ef04c9f-afa0-11ea-96d5-223dafaac5ab, docker2
      =================================================
      2020-06-16  9:46:10 14 [Note] WSREP: Non-primary view
      2020-06-16  9:46:10 14 [Note] WSREP: Server status change synced -> connected
      2020-06-16  9:46:10 14 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
      2020-06-16  9:46:10 14 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
      2020-06-16  9:46:10 14 [Note] WSREP: ================================================
      View:
        id: cf61cf68-aef7-11ea-88db-1bc466429584:-1
        status: non-primary
        protocol_version: 4
        capabilities: MULTI-MASTER, CERTIFICATION, PARALLEL_APPLYING, REPLAY, ISOLATION, PAUSE, CAUSAL_READ, INCREMENTAL_WS, UNORDERED, PREORDERED, STREAMING, NBO
        final: no
        own_index: 0
        members(1):
      	0: 2ef04c9f-afa0-11ea-96d5-223dafaac5ab, docker2
      =================================================
      2020-06-16  9:46:10 14 [Note] WSREP: Non-primary view
      2020-06-16  9:46:10 14 [Note] WSREP: Server status change connected -> connected
      2020-06-16  9:46:10 14 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
      2020-06-16  9:46:10 14 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
      2020-06-16  9:46:11 0 [Note] WSREP: declaring 9a21602a-96c2 at tcp://192.168.2.90:4567 stable
      2020-06-16  9:46:11 0 [Note] WSREP: view(view_id(NON_PRIM,2ef04c9f-96d5,155) memb {
      	2ef04c9f-96d5,0
      	9a21602a-96c2,0
      } joined {
      } left {
      } partitioned {
      	34ea9378-9a6c,0
      })
      2020-06-16  9:46:11 0 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 2
      2020-06-16  9:46:11 0 [Note] WSREP: Flow-control interval: [256, 256]
      2020-06-16  9:46:11 0 [Note] WSREP: Received NON-PRIMARY.
      2020-06-16  9:46:11 14 [Note] WSREP: ================================================
      View:
        id: cf61cf68-aef7-11ea-88db-1bc466429584:-1
        status: non-primary
        protocol_version: 4
        capabilities: MULTI-MASTER, CERTIFICATION, PARALLEL_APPLYING, REPLAY, ISOLATION, PAUSE, CAUSAL_READ, INCREMENTAL_WS, UNORDERED, PREORDERED, STREAMING, NBO
        final: no
        own_index: 0
        members(2):
      	0: 2ef04c9f-afa0-11ea-96d5-223dafaac5ab, docker2
      	1: 9a21602a-afa0-11ea-96c2-5f57fad80d47, unspecified
      =================================================
      2020-06-16  9:46:11 14 [Note] WSREP: Non-primary view
      2020-06-16  9:46:11 14 [Note] WSREP: Server status change connected -> connected
      2020-06-16  9:46:11 14 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
      2020-06-16  9:46:11 14 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
      2020-06-16  9:46:31 0 [Note] WSREP: (2ef04c9f-96d5, 'tcp://0.0.0.0:4567') reconnecting to 34ea9378-9a6c (tcp://192.168.2.92:4567), attempt 30
      2020-06-16  9:46:34 0 [Note] WSREP: (2ef04c9f-96d5, 'tcp://0.0.0.0:4567') connection established to 34ea9378-9a6d tcp://192.168.2.92:4567
      2020-06-16  9:46:34 0 [Note] WSREP: remote endpoint tcp://192.168.2.92:4567 changed identity 34ea9378-afa4-11ea-9a6c-db5f63f7a91e -> 34ea9378-afa4-11ea-9a6d-db5f63f7a91e
      2020-06-16  9:46:34 0 [Note] WSREP: declaring 34ea9378-9a6d at tcp://192.168.2.92:4567 stable
      2020-06-16  9:46:34 0 [Note] WSREP: declaring 9a21602a-96c2 at tcp://192.168.2.90:4567 stable
      2020-06-16  9:46:34 0 [Note] WSREP: view(view_id(NON_PRIM,2ef04c9f-96d5,156) memb {
      	2ef04c9f-96d5,0
      	34ea9378-9a6d,0
      	9a21602a-96c2,0
      } joined {
      } left {
      } partitioned {
      	34ea9378-9a6c,0
      })
      2020-06-16  9:46:34 0 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 3
      2020-06-16  9:46:34 0 [Note] WSREP: Flow-control interval: [256, 256]
      2020-06-16  9:46:34 0 [Note] WSREP: Received NON-PRIMARY.
      2020-06-16  9:46:34 14 [Note] WSREP: ================================================
      View:
        id: cf61cf68-aef7-11ea-88db-1bc466429584:-1
        status: non-primary
        protocol_version: 4
        capabilities: MULTI-MASTER, CERTIFICATION, PARALLEL_APPLYING, REPLAY, ISOLATION, PAUSE, CAUSAL_READ, INCREMENTAL_WS, UNORDERED, PREORDERED, STREAMING, NBO
        final: no
        own_index: 0
        members(3):
      	0: 2ef04c9f-afa0-11ea-96d5-223dafaac5ab, docker2
      	1: 34ea9378-afa4-11ea-9a6d-db5f63f7a91e, unspecified
      	2: 9a21602a-afa0-11ea-96c2-5f57fad80d47, unspecified
      =================================================
      2020-06-16  9:46:34 14 [Note] WSREP: Non-primary view
      2020-06-16  9:46:34 14 [Note] WSREP: Server status change connected -> connected
      2020-06-16  9:46:34 14 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
      2020-06-16  9:46:34 14 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
      2020-06-16  9:46:37 0 [Note] WSREP: (2ef04c9f-96d5, 'tcp://0.0.0.0:4567') turning message relay requesting off
      

      Node 3 log ( OOM happened here)

      wsrep loader: [INFO] wsrep_load(): loading provider library '/usr/lib64/galera-4/libgalera_smm.so'
      wsrep loader: [INFO] wsrep_load(): Galera 26.4.4(r4599) by Codership Oy <info@codership.com> loaded successfully.
      2020-06-16  9:46:33 0 [Note] WSREP: CRC-32C: using hardware acceleration.
      2020-06-16  9:46:33 0 [Note] WSREP: Found saved state: cf61cf68-aef7-11ea-88db-1bc466429584:-1, safe_to_bootstrap: 0
      2020-06-16  9:46:33 0 [Note] WSREP: GCache DEBUG: opened preamble:
      Version: 2
      UUID: cf61cf68-aef7-11ea-88db-1bc466429584
      Seqno: -1 - -1
      Offset: -1
      Synced: 0
      2020-06-16  9:46:33 0 [Note] WSREP: Recovering GCache ring buffer: version: 2, UUID: cf61cf68-aef7-11ea-88db-1bc466429584, offset: -1
      2020-06-16  9:46:33 0 [Note] WSREP: GCache::RingBuffer initial scan...  0.0% (        0/134217752 bytes) complete.
      2020-06-16  9:46:33 0 [Note] WSREP: GCache::RingBuffer initial scan...100.0% (134217752/134217752 bytes) complete.
      2020-06-16  9:46:33 0 [Note] WSREP: Recovering GCache ring buffer: found gapless sequence 16-130
      2020-06-16  9:46:34 0 [Note] WSREP: GCache::RingBuffer unused buffers scan...  0.0% (       0/20643208 bytes) complete.
      2020-06-16  9:46:34 0 [Note] WSREP: GCache::RingBuffer unused buffers scan...100.0% (20643208/20643208 bytes) complete.
      2020-06-16  9:46:34 0 [Note] WSREP: GCache DEBUG: RingBuffer::recover(): found 2/117 locked buffers
      2020-06-16  9:46:34 0 [Note] WSREP: GCache DEBUG: RingBuffer::recover(): used space: 20643208/134217728
      2020-06-16  9:46:34 0 [Note] WSREP: Passing config to GCS: base_dir = /var/lib/mysql/; base_host = 192.168.2.92; base_port = 4567; cert.log_conflicts = no; cert.optimistic_pa = yes; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT30S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT15S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = galera.cache; gcache.page_size = 128M; gcache.recover = yes; gcache.size = 128M; gcomm.thread_prio = ; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 256; gcs.fc_master_slave = yes; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0; pc.announce_timeout = PT3S; pc.check...
      2020-06-16  9:46:34 0 [Note] WSREP: Service thread queue flushed.
      2020-06-16  9:46:34 0 [Note] WSREP: ####### Assign initial position for certification: cf61cf68-aef7-11ea-88db-1bc466429584:130, protocol version: -1
      2020-06-16  9:46:34 0 [Note] WSREP: Start replication
      2020-06-16  9:46:34 0 [Note] WSREP: Connecting with bootstrap option: 0
      2020-06-16  9:46:34 0 [Note] WSREP: Setting GCS initial position to cf61cf68-aef7-11ea-88db-1bc466429584:130
      2020-06-16  9:46:34 0 [Note] WSREP: protonet asio version 0
      2020-06-16  9:46:34 0 [Note] WSREP: Using CRC-32C for message checksums.
      2020-06-16  9:46:34 0 [Note] WSREP: backend: asio
      2020-06-16  9:46:34 0 [Note] WSREP: gcomm thread scheduling priority set to other:0 
      2020-06-16  9:46:34 0 [Note] WSREP: restore pc from disk successfully
      2020-06-16  9:46:34 0 [Note] WSREP: GMCast version 0
      2020-06-16  9:46:34 0 [Note] WSREP: (34ea9378-9a6d, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
      2020-06-16  9:46:34 0 [Note] WSREP: (34ea9378-9a6d, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
      2020-06-16  9:46:34 0 [Note] WSREP: EVS version 1
      2020-06-16  9:46:34 0 [Note] WSREP: gcomm: connecting to group 'rick's lab', peer '192.168.2.90:'
      2020-06-16  9:46:34 0 [Note] WSREP: (34ea9378-9a6d, 'tcp://0.0.0.0:4567') connection established to 9a21602a-96c2 tcp://192.168.2.90:4567
      2020-06-16  9:46:34 0 [Note] WSREP: (34ea9378-9a6d, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://192.168.2.91:4567 
      2020-06-16  9:46:34 0 [Note] WSREP: (34ea9378-9a6d, 'tcp://0.0.0.0:4567') connection established to 2ef04c9f-96d5 tcp://192.168.2.91:4567
      2020-06-16  9:46:34 0 [Note] WSREP: EVS version upgrade 0 -> 1
      2020-06-16  9:46:34 0 [Note] WSREP: declaring 2ef04c9f-96d5 at tcp://192.168.2.91:4567 stable
      2020-06-16  9:46:34 0 [Note] WSREP: declaring 9a21602a-96c2 at tcp://192.168.2.90:4567 stable
      2020-06-16  9:46:34 0 [Note] WSREP: PC protocol upgrade 0 -> 1
      2020-06-16  9:46:34 0 [Note] WSREP: view(view_id(NON_PRIM,2ef04c9f-96d5,156) memb {
      	2ef04c9f-96d5,0
      	34ea9378-9a6d,0
      	9a21602a-96c2,0
      } joined {
      } left {
      } partitioned {
      	34ea9378-9a6c,0
      })
      2020-06-16  9:46:34 0 [Warning] WSREP: node uuid: 2ef04c9f-96d5 last_prim(type: 3, uuid: 2ef04c9f-96d5) is inconsistent to restored view(type: V_NON_PRIM, uuid: 2ef04c9f-96d5
      2020-06-16  9:46:35 0 [Note] WSREP: gcomm: connected
      2020-06-16  9:46:35 0 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636
      2020-06-16  9:46:35 0 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
      2020-06-16  9:46:35 0 [Note] WSREP: Opened channel 'rick's lab'
      2020-06-16  9:46:35 0 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 1, memb_num = 3
      2020-06-16  9:46:35 0 [Note] WSREP: Flow-control interval: [256, 256]
      2020-06-16  9:46:35 0 [Note] WSREP: Received NON-PRIMARY.
      2020-06-16  9:46:35 2 [Note] WSREP: Starting applier thread 2
      2020-06-16  9:46:35 2 [Note] WSREP: ================================================
      View:
        id: cf61cf68-aef7-11ea-88db-1bc466429584:-1
        status: non-primary
        protocol_version: -1
        capabilities: 
        final: no
        own_index: 1
        members(3):
      	0: 2ef04c9f-afa0-11ea-96d5-223dafaac5ab, unspecified
      	1: 34ea9378-afa4-11ea-9a6d-db5f63f7a91e, docker3
      	2: 9a21602a-afa0-11ea-96c2-5f57fad80d47, unspecified
      =================================================
      2020-06-16  9:46:35 2 [Note] WSREP: Non-primary view
      2020-06-16  9:46:35 2 [Note] WSREP: Server status change disconnected -> connected
      2020-06-16  9:46:35 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
      2020-06-16  9:46:35 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
      2020-06-16  9:46:35 1 [Note] WSREP: Starting rollbacker thread 1
      2020-06-16  9:46:37 0 [Note] WSREP: (34ea9378-9a6d, 'tcp://0.0.0.0:4567') turning message relay requesting off
      

      /var/log/messages on Node 3 (OOM)

      Jun 16 09:45:55 docker3 kernel: mysqld invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
      Jun 16 09:45:55 docker3 kernel: mysqld cpuset=/ mems_allowed=0
      Jun 16 09:45:55 docker3 kernel: CPU: 0 PID: 1864 Comm: mysqld Tainted: G               ------------ T 3.10.0-862.el7.x86_64 #1
      Jun 16 09:45:55 docker3 kernel: Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
      Jun 16 09:45:55 docker3 kernel: Call Trace:
      Jun 16 09:45:55 docker3 kernel: [<ffffffff9970d768>] dump_stack+0x19/0x1b
      Jun 16 09:45:55 docker3 kernel: [<ffffffff997090ea>] dump_header+0x90/0x229
      Jun 16 09:45:55 docker3 kernel: [<ffffffff990f76e2>] ? ktime_get_ts64+0x52/0xf0
      Jun 16 09:45:55 docker3 kernel: [<ffffffff9914e0bf>] ? delayacct_end+0x8f/0xb0
      Jun 16 09:45:55 docker3 kernel: [<ffffffff99197904>] oom_kill_process+0x254/0x3d0
      Jun 16 09:45:55 docker3 kernel: [<ffffffff991973ad>] ? oom_unkillable_task+0xcd/0x120
      Jun 16 09:45:55 docker3 kernel: [<ffffffff99197456>] ? find_lock_task_mm+0x56/0xc0
      Jun 16 09:45:55 docker3 kernel: [<ffffffff99198146>] out_of_memory+0x4b6/0x4f0
      Jun 16 09:45:55 docker3 kernel: [<ffffffff9919ea64>] __alloc_pages_nodemask+0xaa4/0xbb0
      Jun 16 09:45:55 docker3 kernel: [<ffffffff991e8528>] alloc_pages_current+0x98/0x110
      Jun 16 09:45:55 docker3 kernel: [<ffffffff99193cf7>] __page_cache_alloc+0x97/0xb0
      Jun 16 09:45:55 docker3 kernel: [<ffffffff991963f8>] filemap_fault+0x298/0x490
      Jun 16 09:45:55 docker3 kernel: [<ffffffffc020385f>] xfs_filemap_fault+0x5f/0xe0 [xfs]
      Jun 16 09:45:55 docker3 kernel: [<ffffffff991c05fa>] __do_fault.isra.58+0x8a/0x100
      Jun 16 09:45:55 docker3 kernel: [<ffffffff991c0b5c>] do_read_fault.isra.60+0x4c/0x1a0
      Jun 16 09:45:55 docker3 kernel: [<ffffffff991c52dc>] handle_pte_fault+0x2dc/0xc30
      Jun 16 09:45:55 docker3 kernel: [<ffffffff991c747d>] handle_mm_fault+0x39d/0x9b0
      Jun 16 09:45:55 docker3 kernel: [<ffffffff9971a597>] __do_page_fault+0x197/0x4f0
      Jun 16 09:45:55 docker3 kernel: [<ffffffff9971a925>] do_page_fault+0x35/0x90
      Jun 16 09:45:55 docker3 kernel: [<ffffffff99716ab6>] ? error_swapgs+0xa7/0xbd
      Jun 16 09:45:55 docker3 kernel: [<ffffffff99716768>] page_fault+0x28/0x30
      Jun 16 09:45:55 docker3 kernel: Mem-Info:
      Jun 16 09:45:55 docker3 kernel: active_anon:7105 inactive_anon:7118 isolated_anon:0#012 active_file:1075 inactive_file:2070 isolated_file:0#012 unevictable:0 dirty:0 writeback:0 unstable:0#012 slab_reclaimable:2609 slab_unreclaimable:2799#012 mapped:206 shmem:1 pagetables:992 bounce:0#012 free:416 free_pcp:3 free_cma:0
      Jun 16 09:45:55 docker3 kernel: Node 0 DMA free:548kB min:192kB low:240kB high:288kB active_anon:3048kB inactive_anon:3064kB active_file:960kB inactive_file:1760kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15908kB mlocked:0kB dirty:0kB writeback:0kB mapped:212kB shmem:0kB slab_reclaimable:1128kB slab_unreclaimable:1732kB kernel_stack:1872kB pagetables:512kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:13975 all_unreclaimable? yes
      Jun 16 09:45:55 docker3 kernel: lowmem_reserve[]: 0 90 90 90
      Jun 16 09:45:55 docker3 kernel: Node 0 DMA32 free:1116kB min:1124kB low:1404kB high:1684kB active_anon:25372kB inactive_anon:25408kB active_file:3340kB inactive_file:6520kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:114624kB managed:96460kB mlocked:0kB dirty:0kB writeback:0kB mapped:612kB shmem:4kB slab_reclaimable:9308kB slab_unreclaimable:9464kB kernel_stack:2048kB pagetables:3456kB unstable:0kB bounce:0kB free_pcp:12kB local_pcp:12kB free_cma:0kB writeback_tmp:0kB pages_scanned:90373 all_unreclaimable? yes
      Jun 16 09:45:55 docker3 kernel: lowmem_reserve[]: 0 0 0 0
      Jun 16 09:45:55 docker3 kernel: Node 0 DMA: 21*4kB (UEM) 24*8kB (UM) 15*16kB (M) 1*32kB (M) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 548kB
      Jun 16 09:45:55 docker3 kernel: Node 0 DMA32: 225*4kB (UEM) 27*8kB (UEM) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 1116kB
      Jun 16 09:45:55 docker3 kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
      Jun 16 09:45:55 docker3 kernel: 9876 total pagecache pages
      Jun 16 09:45:55 docker3 kernel: 6719 pages in swap cache
      Jun 16 09:45:55 docker3 kernel: Swap cache stats: add 350160, delete 343441, find 113374/140473
      Jun 16 09:45:55 docker3 kernel: Free swap  = 1695520kB
      Jun 16 09:45:55 docker3 kernel: Total swap = 2052092kB
      Jun 16 09:45:55 docker3 kernel: 32654 pages RAM
      Jun 16 09:45:55 docker3 kernel: 0 pages HighMem/MovableOnly
      Jun 16 09:45:55 docker3 kernel: 4562 pages reserved
      Jun 16 09:45:55 docker3 kernel: [ pid ]   uid  tgid total_vm      rss nr_ptes swapents oom_score_adj name
      Jun 16 09:45:55 docker3 kernel: [  452]     0   452     8745        0      22       79             0 systemd-journal
      Jun 16 09:45:55 docker3 kernel: [  468]     0   468    49631        0      29      103             0 lvmetad
      Jun 16 09:45:55 docker3 kernel: [  479]     0   479    11084        1      23      144         -1000 systemd-udevd
      Jun 16 09:45:55 docker3 kernel: [  580]     0   580    13877        0      27      120         -1000 auditd
      Jun 16 09:45:55 docker3 kernel: [  604]    81   604    14549        0      31      162          -900 dbus-daemon
      Jun 16 09:45:55 docker3 kernel: [  605]     0   605   119617       13      88      581             0 NetworkManager
      Jun 16 09:45:55 docker3 kernel: [  607]     0   607     6594        0      18       84             0 systemd-logind
      Jun 16 09:45:55 docker3 kernel: [  610]   999   610   134635        0      61      883             0 polkitd
      Jun 16 09:45:55 docker3 kernel: [  618]     0   618    31571        0      20      155             0 crond
      Jun 16 09:45:55 docker3 kernel: [  622]    38   622    11809        0      28      177             0 ntpd
      Jun 16 09:45:55 docker3 kernel: [  626]     0   626    27522        1       9       32             0 agetty
      Jun 16 09:45:55 docker3 kernel: [  873]     0   873   143453      119      97     2647             0 tuned
      Jun 16 09:45:55 docker3 kernel: [  874]     0   874    28199        0      59      257         -1000 sshd
      Jun 16 09:45:55 docker3 kernel: [  875]     0   875    53602        1      41      384             0 rsyslogd
      Jun 16 09:45:55 docker3 kernel: [ 1163]     0  1163    40350       29      81      326             0 sshd
      Jun 16 09:45:55 docker3 kernel: [ 1165]     0  1165    28859        1      14      104             0 bash
      Jun 16 09:45:55 docker3 kernel: [ 1470]   996  1470   380637    11850     281    76501             0 mysqld
      Jun 16 09:45:55 docker3 kernel: [ 1889]     0  1889    40409       50      35      306             0 mysql
      Jun 16 09:45:55 docker3 kernel: Out of memory: Kill process 1470 (mysqld) score 163 or sacrifice child
      Jun 16 09:45:55 docker3 kernel: Killed process 1470 (mysqld) total-vm:1522548kB, anon-rss:47400kB, file-rss:0kB, shmem-rss:0kB
      Jun 16 09:45:55 docker3 systemd: mariadb.service: main process exited, code=killed, status=9/KILL
      Jun 16 09:45:55 docker3 systemd: Unit mariadb.service entered failed state.
      Jun 16 09:45:55 docker3 systemd: mariadb.service failed.
      Jun 16 09:46:00 docker3 systemd: mariadb.service holdoff time over, scheduling restart.
      Jun 16 09:46:00 docker3 systemd: Starting MariaDB 10.4.13 database server...
      Jun 16 09:46:33 docker3 sh: WSREP: Recovered position cf61cf68-aef7-11ea-88db-1bc466429584:130
      Jun 16 09:46:33 docker3 mysqld: 2020-06-16  9:46:33 0 [Note] /usr/sbin/mysqld (mysqld 10.4.13-MariaDB-log) starting as process 2248 ...
      

        Attachments

          Issue Links

            Activity

              People

              Assignee:
              sysprg Julius Goryavsky
              Reporter:
              rpizzi Rick Pizzi
              Votes:
              3 Vote for this issue
              Watchers:
              7 Start watching this issue

                Dates

                Created:
                Updated: