Details
-
Bug
-
Status: Closed (View Workflow)
-
Critical
-
Resolution: Incomplete
-
10.4.13
Description
One customer of ours had an OOM event on one of its 3 nodes galera cluster.
When then node restarted, entire cluster became non-primary, causing a total outage; it needed a manual shutdown of entire cluster to recover.
I was able to reproduce the situation in my lab: 3 VMs running MariaDB 10.4.13, on one of the nodes I assigned a very small amount of memory and then ran some inserts and selects until the OOM killer fired and killed the mysqld process.
Process was restarted by systemd but when it came back, entire cluster went into non-primary. This is exactly what happened to the customer.
Said customer is also experiencing a memory leak in wsrep library (we are still trying to reproduce that) so that's a bug that triggered another bug it seems.
Needless to say, this is a very critical issue.
I am attaching logs from the 3 nodes, plus OOM event logs from node 3.
Environment: 3 x Centos7 VMs, 2G, 2G, 128M memory (to reproduce the issue; happens with any memory size when OOM fires).
PLEASE NOTE: the hostnames of the nodes mention docker but docker is not used in this test, I just recycled some VMs.
Node 1 log
2020-06-16 9:45:55 0 [Note] WSREP: (9a21602a-96c2, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://192.168.2.92:4567
|
2020-06-16 9:45:56 0 [Note] WSREP: (9a21602a-96c2, 'tcp://0.0.0.0:4567') reconnecting to 34ea9378-9a6c (tcp://192.168.2.92:4567), attempt 0
|
2020-06-16 9:45:59 0 [Note] WSREP: evs::proto(9a21602a-96c2, OPERATIONAL, view_id(REG,2ef04c9f-96d5,153)) suspecting node: 34ea9378-9a6c
|
2020-06-16 9:45:59 0 [Note] WSREP: evs::proto(9a21602a-96c2, OPERATIONAL, view_id(REG,2ef04c9f-96d5,153)) suspected node without join message, declaring inactive
|
2020-06-16 9:46:07 0 [Warning] WSREP: evs::proto(9a21602a-96c2, GATHER, view_id(REG,2ef04c9f-96d5,153)) install timer expired
|
evs::proto(evs::proto(9a21602a-96c2, GATHER, view_id(REG,2ef04c9f-96d5,153)), GATHER) {
|
current_view=view(view_id(REG,2ef04c9f-96d5,153) memb {
|
2ef04c9f-96d5,0
|
34ea9378-9a6c,0
|
9a21602a-96c2,0
|
} joined {
|
} left {
|
} partitioned {
|
}),
|
input_map=evs::input_map: {aru_seq=2091,safe_seq=2091,node_index=node: {idx=0,range=[2098,2097],safe_seq=2091} node: {idx=1,range=[2092,2091],safe_seq=2091} node: {idx=2,range=[2098,2097],safe_seq=2091} },
|
fifo_seq=11372,
|
last_sent=2097,
|
known:
|
2ef04c9f-96d5 at tcp://192.168.2.91:4567
|
{o=1,s=0,i=0,fs=12032,jm=
|
{v=1,t=4,ut=255,o=1,s=2091,sr=-1,as=2091,f=4,src=2ef04c9f-96d5,srcvid=view_id(REG,2ef04c9f-96d5,153),insvid=view_id(UNKNOWN,00000000-0000,0),ru=00000000-0000,r=[-1,-1],fs=12032,nl=(
|
2ef04c9f-96d5, {o=1,s=0,e=0,ls=-1,vid=view_id(REG,2ef04c9f-96d5,153),ss=2091,ir=[2098,2097],}
|
34ea9378-9a6c, {o=1,s=0,e=0,ls=-1,vid=view_id(REG,2ef04c9f-96d5,153),ss=2091,ir=[2092,2091],}
|
9a21602a-96c2, {o=1,s=0,e=0,ls=-1,vid=view_id(REG,2ef04c9f-96d5,153),ss=2091,ir=[2098,2097],}
|
)
|
},
|
}
|
34ea9378-9a6c at tcp://192.168.2.92:4567
|
{o=0,s=1,i=0,fs=4630,}
|
9a21602a-96c2 at
|
{o=1,s=0,i=0,fs=-1,jm=
|
{v=1,t=4,ut=255,o=1,s=2091,sr=-1,as=2091,f=0,src=9a21602a-96c2,srcvid=view_id(REG,2ef04c9f-96d5,153),insvid=view_id(UNKNOWN,00000000-0000,0),ru=00000000-0000,r=[-1,-1],fs=11372,nl=(
|
2ef04c9f-96d5, {o=1,s=0,e=0,ls=-1,vid=view_id(REG,2ef04c9f-96d5,153),ss=2091,ir=[2098,2097],}
|
34ea9378-9a6c, {o=0,s=1,e=0,ls=-1,vid=view_id(REG,2ef04c9f-96d5,153),ss=2091,ir=[2092,2091],}
|
9a21602a-96c2, {o=1,s=0,e=0,ls=-1,vid=view_id(REG,2ef04c9f-96d5,153),ss=2091,ir=[2098,2097],}
|
)
|
},
|
}
|
}
|
2020-06-16 9:46:07 0 [Note] WSREP: no install message received
|
2020-06-16 9:46:07 0 [Note] WSREP: view(view_id(NON_PRIM,2ef04c9f-96d5,153) memb {
|
9a21602a-96c2,0
|
} joined {
|
} left {
|
} partitioned {
|
2ef04c9f-96d5,0
|
34ea9378-9a6c,0
|
})
|
2020-06-16 9:46:07 0 [Note] WSREP: view(view_id(NON_PRIM,9a21602a-96c2,154) memb {
|
9a21602a-96c2,0
|
} joined {
|
} left {
|
} partitioned {
|
2ef04c9f-96d5,0
|
34ea9378-9a6c,0
|
})
|
2020-06-16 9:46:07 0 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
|
2020-06-16 9:46:07 0 [Note] WSREP: Flow-control interval: [16, 16]
|
2020-06-16 9:46:07 0 [Note] WSREP: Received NON-PRIMARY.
|
2020-06-16 9:46:07 0 [Note] WSREP: Shifting SYNCED -> OPEN (TO: 130)
|
2020-06-16 9:46:07 0 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
|
2020-06-16 9:46:07 0 [Note] WSREP: Flow-control interval: [16, 16]
|
2020-06-16 9:46:07 0 [Note] WSREP: Received NON-PRIMARY.
|
2020-06-16 9:46:07 2 [Note] WSREP: ================================================
|
View:
|
id: cf61cf68-aef7-11ea-88db-1bc466429584:-1
|
status: non-primary
|
protocol_version: 4
|
capabilities: MULTI-MASTER, CERTIFICATION, PARALLEL_APPLYING, REPLAY, ISOLATION, PAUSE, CAUSAL_READ, INCREMENTAL_WS, UNORDERED, PREORDERED, STREAMING, NBO
|
final: no
|
own_index: 0
|
members(1):
|
0: 9a21602a-afa0-11ea-96c2-5f57fad80d47, docker1
|
=================================================
|
2020-06-16 9:46:07 2 [Note] WSREP: Non-primary view
|
2020-06-16 9:46:07 2 [Note] WSREP: Server status change synced -> connected
|
2020-06-16 9:46:07 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
|
2020-06-16 9:46:07 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
|
2020-06-16 9:46:07 2 [Note] WSREP: ================================================
|
View:
|
id: cf61cf68-aef7-11ea-88db-1bc466429584:-1
|
status: non-primary
|
protocol_version: 4
|
capabilities: MULTI-MASTER, CERTIFICATION, PARALLEL_APPLYING, REPLAY, ISOLATION, PAUSE, CAUSAL_READ, INCREMENTAL_WS, UNORDERED, PREORDERED, STREAMING, NBO
|
final: no
|
own_index: 0
|
members(1):
|
0: 9a21602a-afa0-11ea-96c2-5f57fad80d47, docker1
|
=================================================
|
2020-06-16 9:46:07 2 [Note] WSREP: Non-primary view
|
2020-06-16 9:46:07 2 [Note] WSREP: Server status change connected -> connected
|
2020-06-16 9:46:07 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
|
2020-06-16 9:46:07 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
|
2020-06-16 9:46:11 0 [Note] WSREP: declaring 2ef04c9f-96d5 at tcp://192.168.2.91:4567 stable
|
2020-06-16 9:46:11 0 [Note] WSREP: view(view_id(NON_PRIM,2ef04c9f-96d5,155) memb {
|
2ef04c9f-96d5,0
|
9a21602a-96c2,0
|
} joined {
|
} left {
|
} partitioned {
|
34ea9378-9a6c,0
|
})
|
2020-06-16 9:46:11 0 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 1, memb_num = 2
|
2020-06-16 9:46:11 0 [Note] WSREP: Flow-control interval: [23, 23]
|
2020-06-16 9:46:11 0 [Note] WSREP: Received NON-PRIMARY.
|
2020-06-16 9:46:11 2 [Note] WSREP: ================================================
|
View:
|
id: cf61cf68-aef7-11ea-88db-1bc466429584:-1
|
status: non-primary
|
protocol_version: 4
|
capabilities: MULTI-MASTER, CERTIFICATION, PARALLEL_APPLYING, REPLAY, ISOLATION, PAUSE, CAUSAL_READ, INCREMENTAL_WS, UNORDERED, PREORDERED, STREAMING, NBO
|
final: no
|
own_index: 1
|
members(2):
|
0: 2ef04c9f-afa0-11ea-96d5-223dafaac5ab, unspecified
|
1: 9a21602a-afa0-11ea-96c2-5f57fad80d47, docker1
|
=================================================
|
2020-06-16 9:46:11 2 [Note] WSREP: Non-primary view
|
2020-06-16 9:46:11 2 [Note] WSREP: Server status change connected -> connected
|
2020-06-16 9:46:11 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
|
2020-06-16 9:46:11 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
|
2020-06-16 9:46:30 0 [Note] WSREP: (9a21602a-96c2, 'tcp://0.0.0.0:4567') reconnecting to 34ea9378-9a6c (tcp://192.168.2.92:4567), attempt 30
|
2020-06-16 9:46:34 0 [Note] WSREP: (9a21602a-96c2, 'tcp://0.0.0.0:4567') connection established to 34ea9378-9a6d tcp://192.168.2.92:4567
|
2020-06-16 9:46:34 0 [Note] WSREP: remote endpoint tcp://192.168.2.92:4567 changed identity 34ea9378-afa4-11ea-9a6c-db5f63f7a91e -> 34ea9378-afa4-11ea-9a6d-db5f63f7a91e
|
2020-06-16 9:46:34 0 [Note] WSREP: declaring 2ef04c9f-96d5 at tcp://192.168.2.91:4567 stable
|
2020-06-16 9:46:34 0 [Note] WSREP: declaring 34ea9378-9a6d at tcp://192.168.2.92:4567 stable
|
2020-06-16 9:46:34 0 [Note] WSREP: view(view_id(NON_PRIM,2ef04c9f-96d5,156) memb {
|
2ef04c9f-96d5,0
|
34ea9378-9a6d,0
|
9a21602a-96c2,0
|
} joined {
|
} left {
|
} partitioned {
|
34ea9378-9a6c,0
|
})
|
2020-06-16 9:46:34 0 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 2, memb_num = 3
|
2020-06-16 9:46:34 0 [Note] WSREP: Flow-control interval: [28, 28]
|
2020-06-16 9:46:34 0 [Note] WSREP: Received NON-PRIMARY.
|
2020-06-16 9:46:34 2 [Note] WSREP: ================================================
|
View:
|
id: cf61cf68-aef7-11ea-88db-1bc466429584:-1
|
status: non-primary
|
protocol_version: 4
|
capabilities: MULTI-MASTER, CERTIFICATION, PARALLEL_APPLYING, REPLAY, ISOLATION, PAUSE, CAUSAL_READ, INCREMENTAL_WS, UNORDERED, PREORDERED, STREAMING, NBO
|
final: no
|
own_index: 2
|
members(3):
|
0: 2ef04c9f-afa0-11ea-96d5-223dafaac5ab, unspecified
|
1: 34ea9378-afa4-11ea-9a6d-db5f63f7a91e, unspecified
|
2: 9a21602a-afa0-11ea-96c2-5f57fad80d47, docker1
|
=================================================
|
2020-06-16 9:46:34 2 [Note] WSREP: Non-primary view
|
2020-06-16 9:46:34 2 [Note] WSREP: Server status change connected -> connected
|
2020-06-16 9:46:34 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
|
2020-06-16 9:46:34 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
|
2020-06-16 9:46:37 0 [Note] WSREP: (9a21602a-96c2, 'tcp://0.0.0.0:4567') turning message relay requesting off
|
Node 2 log
2020-06-16 9:45:55 0 [Note] WSREP: (2ef04c9f-96d5, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://192.168.2.92:4567
|
2020-06-16 9:45:56 0 [Note] WSREP: (2ef04c9f-96d5, 'tcp://0.0.0.0:4567') reconnecting to 34ea9378-9a6c (tcp://192.168.2.92:4567), attempt 0
|
2020-06-16 9:46:09 0 [Note] WSREP: evs::proto(2ef04c9f-96d5, GATHER, view_id(REG,2ef04c9f-96d5,153)) suspecting node: 34ea9378-9a6c
|
2020-06-16 9:46:09 0 [Note] WSREP: evs::proto(2ef04c9f-96d5, GATHER, view_id(REG,2ef04c9f-96d5,153)) suspected node without join message, declaring inactive
|
2020-06-16 9:46:10 0 [Note] WSREP: view(view_id(NON_PRIM,2ef04c9f-96d5,153) memb {
|
2ef04c9f-96d5,0
|
} joined {
|
} left {
|
} partitioned {
|
34ea9378-9a6c,0
|
9a21602a-96c2,0
|
})
|
2020-06-16 9:46:10 0 [Note] WSREP: view(view_id(NON_PRIM,2ef04c9f-96d5,154) memb {
|
2ef04c9f-96d5,0
|
} joined {
|
} left {
|
} partitioned {
|
34ea9378-9a6c,0
|
9a21602a-96c2,0
|
})
|
2020-06-16 9:46:10 0 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
|
2020-06-16 9:46:10 0 [Note] WSREP: Flow-control interval: [256, 256]
|
2020-06-16 9:46:10 0 [Note] WSREP: Received NON-PRIMARY.
|
2020-06-16 9:46:10 0 [Note] WSREP: Shifting SYNCED -> OPEN (TO: 130)
|
2020-06-16 9:46:10 0 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
|
2020-06-16 9:46:10 0 [Note] WSREP: Flow-control interval: [256, 256]
|
2020-06-16 9:46:10 0 [Note] WSREP: Received NON-PRIMARY.
|
2020-06-16 9:46:10 14 [Note] WSREP: ================================================
|
View:
|
id: cf61cf68-aef7-11ea-88db-1bc466429584:-1
|
status: non-primary
|
protocol_version: 4
|
capabilities: MULTI-MASTER, CERTIFICATION, PARALLEL_APPLYING, REPLAY, ISOLATION, PAUSE, CAUSAL_READ, INCREMENTAL_WS, UNORDERED, PREORDERED, STREAMING, NBO
|
final: no
|
own_index: 0
|
members(1):
|
0: 2ef04c9f-afa0-11ea-96d5-223dafaac5ab, docker2
|
=================================================
|
2020-06-16 9:46:10 14 [Note] WSREP: Non-primary view
|
2020-06-16 9:46:10 14 [Note] WSREP: Server status change synced -> connected
|
2020-06-16 9:46:10 14 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
|
2020-06-16 9:46:10 14 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
|
2020-06-16 9:46:10 14 [Note] WSREP: ================================================
|
View:
|
id: cf61cf68-aef7-11ea-88db-1bc466429584:-1
|
status: non-primary
|
protocol_version: 4
|
capabilities: MULTI-MASTER, CERTIFICATION, PARALLEL_APPLYING, REPLAY, ISOLATION, PAUSE, CAUSAL_READ, INCREMENTAL_WS, UNORDERED, PREORDERED, STREAMING, NBO
|
final: no
|
own_index: 0
|
members(1):
|
0: 2ef04c9f-afa0-11ea-96d5-223dafaac5ab, docker2
|
=================================================
|
2020-06-16 9:46:10 14 [Note] WSREP: Non-primary view
|
2020-06-16 9:46:10 14 [Note] WSREP: Server status change connected -> connected
|
2020-06-16 9:46:10 14 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
|
2020-06-16 9:46:10 14 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
|
2020-06-16 9:46:11 0 [Note] WSREP: declaring 9a21602a-96c2 at tcp://192.168.2.90:4567 stable
|
2020-06-16 9:46:11 0 [Note] WSREP: view(view_id(NON_PRIM,2ef04c9f-96d5,155) memb {
|
2ef04c9f-96d5,0
|
9a21602a-96c2,0
|
} joined {
|
} left {
|
} partitioned {
|
34ea9378-9a6c,0
|
})
|
2020-06-16 9:46:11 0 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 2
|
2020-06-16 9:46:11 0 [Note] WSREP: Flow-control interval: [256, 256]
|
2020-06-16 9:46:11 0 [Note] WSREP: Received NON-PRIMARY.
|
2020-06-16 9:46:11 14 [Note] WSREP: ================================================
|
View:
|
id: cf61cf68-aef7-11ea-88db-1bc466429584:-1
|
status: non-primary
|
protocol_version: 4
|
capabilities: MULTI-MASTER, CERTIFICATION, PARALLEL_APPLYING, REPLAY, ISOLATION, PAUSE, CAUSAL_READ, INCREMENTAL_WS, UNORDERED, PREORDERED, STREAMING, NBO
|
final: no
|
own_index: 0
|
members(2):
|
0: 2ef04c9f-afa0-11ea-96d5-223dafaac5ab, docker2
|
1: 9a21602a-afa0-11ea-96c2-5f57fad80d47, unspecified
|
=================================================
|
2020-06-16 9:46:11 14 [Note] WSREP: Non-primary view
|
2020-06-16 9:46:11 14 [Note] WSREP: Server status change connected -> connected
|
2020-06-16 9:46:11 14 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
|
2020-06-16 9:46:11 14 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
|
2020-06-16 9:46:31 0 [Note] WSREP: (2ef04c9f-96d5, 'tcp://0.0.0.0:4567') reconnecting to 34ea9378-9a6c (tcp://192.168.2.92:4567), attempt 30
|
2020-06-16 9:46:34 0 [Note] WSREP: (2ef04c9f-96d5, 'tcp://0.0.0.0:4567') connection established to 34ea9378-9a6d tcp://192.168.2.92:4567
|
2020-06-16 9:46:34 0 [Note] WSREP: remote endpoint tcp://192.168.2.92:4567 changed identity 34ea9378-afa4-11ea-9a6c-db5f63f7a91e -> 34ea9378-afa4-11ea-9a6d-db5f63f7a91e
|
2020-06-16 9:46:34 0 [Note] WSREP: declaring 34ea9378-9a6d at tcp://192.168.2.92:4567 stable
|
2020-06-16 9:46:34 0 [Note] WSREP: declaring 9a21602a-96c2 at tcp://192.168.2.90:4567 stable
|
2020-06-16 9:46:34 0 [Note] WSREP: view(view_id(NON_PRIM,2ef04c9f-96d5,156) memb {
|
2ef04c9f-96d5,0
|
34ea9378-9a6d,0
|
9a21602a-96c2,0
|
} joined {
|
} left {
|
} partitioned {
|
34ea9378-9a6c,0
|
})
|
2020-06-16 9:46:34 0 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 3
|
2020-06-16 9:46:34 0 [Note] WSREP: Flow-control interval: [256, 256]
|
2020-06-16 9:46:34 0 [Note] WSREP: Received NON-PRIMARY.
|
2020-06-16 9:46:34 14 [Note] WSREP: ================================================
|
View:
|
id: cf61cf68-aef7-11ea-88db-1bc466429584:-1
|
status: non-primary
|
protocol_version: 4
|
capabilities: MULTI-MASTER, CERTIFICATION, PARALLEL_APPLYING, REPLAY, ISOLATION, PAUSE, CAUSAL_READ, INCREMENTAL_WS, UNORDERED, PREORDERED, STREAMING, NBO
|
final: no
|
own_index: 0
|
members(3):
|
0: 2ef04c9f-afa0-11ea-96d5-223dafaac5ab, docker2
|
1: 34ea9378-afa4-11ea-9a6d-db5f63f7a91e, unspecified
|
2: 9a21602a-afa0-11ea-96c2-5f57fad80d47, unspecified
|
=================================================
|
2020-06-16 9:46:34 14 [Note] WSREP: Non-primary view
|
2020-06-16 9:46:34 14 [Note] WSREP: Server status change connected -> connected
|
2020-06-16 9:46:34 14 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
|
2020-06-16 9:46:34 14 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
|
2020-06-16 9:46:37 0 [Note] WSREP: (2ef04c9f-96d5, 'tcp://0.0.0.0:4567') turning message relay requesting off
|
Node 3 log ( OOM happened here)
wsrep loader: [INFO] wsrep_load(): loading provider library '/usr/lib64/galera-4/libgalera_smm.so'
|
wsrep loader: [INFO] wsrep_load(): Galera 26.4.4(r4599) by Codership Oy <info@codership.com> loaded successfully.
|
2020-06-16 9:46:33 0 [Note] WSREP: CRC-32C: using hardware acceleration.
|
2020-06-16 9:46:33 0 [Note] WSREP: Found saved state: cf61cf68-aef7-11ea-88db-1bc466429584:-1, safe_to_bootstrap: 0
|
2020-06-16 9:46:33 0 [Note] WSREP: GCache DEBUG: opened preamble:
|
Version: 2
|
UUID: cf61cf68-aef7-11ea-88db-1bc466429584
|
Seqno: -1 - -1
|
Offset: -1
|
Synced: 0
|
2020-06-16 9:46:33 0 [Note] WSREP: Recovering GCache ring buffer: version: 2, UUID: cf61cf68-aef7-11ea-88db-1bc466429584, offset: -1
|
2020-06-16 9:46:33 0 [Note] WSREP: GCache::RingBuffer initial scan... 0.0% ( 0/134217752 bytes) complete.
|
2020-06-16 9:46:33 0 [Note] WSREP: GCache::RingBuffer initial scan...100.0% (134217752/134217752 bytes) complete.
|
2020-06-16 9:46:33 0 [Note] WSREP: Recovering GCache ring buffer: found gapless sequence 16-130
|
2020-06-16 9:46:34 0 [Note] WSREP: GCache::RingBuffer unused buffers scan... 0.0% ( 0/20643208 bytes) complete.
|
2020-06-16 9:46:34 0 [Note] WSREP: GCache::RingBuffer unused buffers scan...100.0% (20643208/20643208 bytes) complete.
|
2020-06-16 9:46:34 0 [Note] WSREP: GCache DEBUG: RingBuffer::recover(): found 2/117 locked buffers
|
2020-06-16 9:46:34 0 [Note] WSREP: GCache DEBUG: RingBuffer::recover(): used space: 20643208/134217728
|
2020-06-16 9:46:34 0 [Note] WSREP: Passing config to GCS: base_dir = /var/lib/mysql/; base_host = 192.168.2.92; base_port = 4567; cert.log_conflicts = no; cert.optimistic_pa = yes; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT30S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT15S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = galera.cache; gcache.page_size = 128M; gcache.recover = yes; gcache.size = 128M; gcomm.thread_prio = ; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 256; gcs.fc_master_slave = yes; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0; pc.announce_timeout = PT3S; pc.check...
|
2020-06-16 9:46:34 0 [Note] WSREP: Service thread queue flushed.
|
2020-06-16 9:46:34 0 [Note] WSREP: ####### Assign initial position for certification: cf61cf68-aef7-11ea-88db-1bc466429584:130, protocol version: -1
|
2020-06-16 9:46:34 0 [Note] WSREP: Start replication
|
2020-06-16 9:46:34 0 [Note] WSREP: Connecting with bootstrap option: 0
|
2020-06-16 9:46:34 0 [Note] WSREP: Setting GCS initial position to cf61cf68-aef7-11ea-88db-1bc466429584:130
|
2020-06-16 9:46:34 0 [Note] WSREP: protonet asio version 0
|
2020-06-16 9:46:34 0 [Note] WSREP: Using CRC-32C for message checksums.
|
2020-06-16 9:46:34 0 [Note] WSREP: backend: asio
|
2020-06-16 9:46:34 0 [Note] WSREP: gcomm thread scheduling priority set to other:0
|
2020-06-16 9:46:34 0 [Note] WSREP: restore pc from disk successfully
|
2020-06-16 9:46:34 0 [Note] WSREP: GMCast version 0
|
2020-06-16 9:46:34 0 [Note] WSREP: (34ea9378-9a6d, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
|
2020-06-16 9:46:34 0 [Note] WSREP: (34ea9378-9a6d, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
|
2020-06-16 9:46:34 0 [Note] WSREP: EVS version 1
|
2020-06-16 9:46:34 0 [Note] WSREP: gcomm: connecting to group 'rick's lab', peer '192.168.2.90:'
|
2020-06-16 9:46:34 0 [Note] WSREP: (34ea9378-9a6d, 'tcp://0.0.0.0:4567') connection established to 9a21602a-96c2 tcp://192.168.2.90:4567
|
2020-06-16 9:46:34 0 [Note] WSREP: (34ea9378-9a6d, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://192.168.2.91:4567
|
2020-06-16 9:46:34 0 [Note] WSREP: (34ea9378-9a6d, 'tcp://0.0.0.0:4567') connection established to 2ef04c9f-96d5 tcp://192.168.2.91:4567
|
2020-06-16 9:46:34 0 [Note] WSREP: EVS version upgrade 0 -> 1
|
2020-06-16 9:46:34 0 [Note] WSREP: declaring 2ef04c9f-96d5 at tcp://192.168.2.91:4567 stable
|
2020-06-16 9:46:34 0 [Note] WSREP: declaring 9a21602a-96c2 at tcp://192.168.2.90:4567 stable
|
2020-06-16 9:46:34 0 [Note] WSREP: PC protocol upgrade 0 -> 1
|
2020-06-16 9:46:34 0 [Note] WSREP: view(view_id(NON_PRIM,2ef04c9f-96d5,156) memb {
|
2ef04c9f-96d5,0
|
34ea9378-9a6d,0
|
9a21602a-96c2,0
|
} joined {
|
} left {
|
} partitioned {
|
34ea9378-9a6c,0
|
})
|
2020-06-16 9:46:34 0 [Warning] WSREP: node uuid: 2ef04c9f-96d5 last_prim(type: 3, uuid: 2ef04c9f-96d5) is inconsistent to restored view(type: V_NON_PRIM, uuid: 2ef04c9f-96d5
|
2020-06-16 9:46:35 0 [Note] WSREP: gcomm: connected
|
2020-06-16 9:46:35 0 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636
|
2020-06-16 9:46:35 0 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
|
2020-06-16 9:46:35 0 [Note] WSREP: Opened channel 'rick's lab'
|
2020-06-16 9:46:35 0 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 1, memb_num = 3
|
2020-06-16 9:46:35 0 [Note] WSREP: Flow-control interval: [256, 256]
|
2020-06-16 9:46:35 0 [Note] WSREP: Received NON-PRIMARY.
|
2020-06-16 9:46:35 2 [Note] WSREP: Starting applier thread 2
|
2020-06-16 9:46:35 2 [Note] WSREP: ================================================
|
View:
|
id: cf61cf68-aef7-11ea-88db-1bc466429584:-1
|
status: non-primary
|
protocol_version: -1
|
capabilities:
|
final: no
|
own_index: 1
|
members(3):
|
0: 2ef04c9f-afa0-11ea-96d5-223dafaac5ab, unspecified
|
1: 34ea9378-afa4-11ea-9a6d-db5f63f7a91e, docker3
|
2: 9a21602a-afa0-11ea-96c2-5f57fad80d47, unspecified
|
=================================================
|
2020-06-16 9:46:35 2 [Note] WSREP: Non-primary view
|
2020-06-16 9:46:35 2 [Note] WSREP: Server status change disconnected -> connected
|
2020-06-16 9:46:35 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
|
2020-06-16 9:46:35 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
|
2020-06-16 9:46:35 1 [Note] WSREP: Starting rollbacker thread 1
|
2020-06-16 9:46:37 0 [Note] WSREP: (34ea9378-9a6d, 'tcp://0.0.0.0:4567') turning message relay requesting off
|
/var/log/messages on Node 3 (OOM)
Jun 16 09:45:55 docker3 kernel: mysqld invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
|
Jun 16 09:45:55 docker3 kernel: mysqld cpuset=/ mems_allowed=0
|
Jun 16 09:45:55 docker3 kernel: CPU: 0 PID: 1864 Comm: mysqld Tainted: G ------------ T 3.10.0-862.el7.x86_64 #1
|
Jun 16 09:45:55 docker3 kernel: Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
|
Jun 16 09:45:55 docker3 kernel: Call Trace:
|
Jun 16 09:45:55 docker3 kernel: [<ffffffff9970d768>] dump_stack+0x19/0x1b
|
Jun 16 09:45:55 docker3 kernel: [<ffffffff997090ea>] dump_header+0x90/0x229
|
Jun 16 09:45:55 docker3 kernel: [<ffffffff990f76e2>] ? ktime_get_ts64+0x52/0xf0
|
Jun 16 09:45:55 docker3 kernel: [<ffffffff9914e0bf>] ? delayacct_end+0x8f/0xb0
|
Jun 16 09:45:55 docker3 kernel: [<ffffffff99197904>] oom_kill_process+0x254/0x3d0
|
Jun 16 09:45:55 docker3 kernel: [<ffffffff991973ad>] ? oom_unkillable_task+0xcd/0x120
|
Jun 16 09:45:55 docker3 kernel: [<ffffffff99197456>] ? find_lock_task_mm+0x56/0xc0
|
Jun 16 09:45:55 docker3 kernel: [<ffffffff99198146>] out_of_memory+0x4b6/0x4f0
|
Jun 16 09:45:55 docker3 kernel: [<ffffffff9919ea64>] __alloc_pages_nodemask+0xaa4/0xbb0
|
Jun 16 09:45:55 docker3 kernel: [<ffffffff991e8528>] alloc_pages_current+0x98/0x110
|
Jun 16 09:45:55 docker3 kernel: [<ffffffff99193cf7>] __page_cache_alloc+0x97/0xb0
|
Jun 16 09:45:55 docker3 kernel: [<ffffffff991963f8>] filemap_fault+0x298/0x490
|
Jun 16 09:45:55 docker3 kernel: [<ffffffffc020385f>] xfs_filemap_fault+0x5f/0xe0 [xfs]
|
Jun 16 09:45:55 docker3 kernel: [<ffffffff991c05fa>] __do_fault.isra.58+0x8a/0x100
|
Jun 16 09:45:55 docker3 kernel: [<ffffffff991c0b5c>] do_read_fault.isra.60+0x4c/0x1a0
|
Jun 16 09:45:55 docker3 kernel: [<ffffffff991c52dc>] handle_pte_fault+0x2dc/0xc30
|
Jun 16 09:45:55 docker3 kernel: [<ffffffff991c747d>] handle_mm_fault+0x39d/0x9b0
|
Jun 16 09:45:55 docker3 kernel: [<ffffffff9971a597>] __do_page_fault+0x197/0x4f0
|
Jun 16 09:45:55 docker3 kernel: [<ffffffff9971a925>] do_page_fault+0x35/0x90
|
Jun 16 09:45:55 docker3 kernel: [<ffffffff99716ab6>] ? error_swapgs+0xa7/0xbd
|
Jun 16 09:45:55 docker3 kernel: [<ffffffff99716768>] page_fault+0x28/0x30
|
Jun 16 09:45:55 docker3 kernel: Mem-Info:
|
Jun 16 09:45:55 docker3 kernel: active_anon:7105 inactive_anon:7118 isolated_anon:0#012 active_file:1075 inactive_file:2070 isolated_file:0#012 unevictable:0 dirty:0 writeback:0 unstable:0#012 slab_reclaimable:2609 slab_unreclaimable:2799#012 mapped:206 shmem:1 pagetables:992 bounce:0#012 free:416 free_pcp:3 free_cma:0
|
Jun 16 09:45:55 docker3 kernel: Node 0 DMA free:548kB min:192kB low:240kB high:288kB active_anon:3048kB inactive_anon:3064kB active_file:960kB inactive_file:1760kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15908kB mlocked:0kB dirty:0kB writeback:0kB mapped:212kB shmem:0kB slab_reclaimable:1128kB slab_unreclaimable:1732kB kernel_stack:1872kB pagetables:512kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:13975 all_unreclaimable? yes
|
Jun 16 09:45:55 docker3 kernel: lowmem_reserve[]: 0 90 90 90
|
Jun 16 09:45:55 docker3 kernel: Node 0 DMA32 free:1116kB min:1124kB low:1404kB high:1684kB active_anon:25372kB inactive_anon:25408kB active_file:3340kB inactive_file:6520kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:114624kB managed:96460kB mlocked:0kB dirty:0kB writeback:0kB mapped:612kB shmem:4kB slab_reclaimable:9308kB slab_unreclaimable:9464kB kernel_stack:2048kB pagetables:3456kB unstable:0kB bounce:0kB free_pcp:12kB local_pcp:12kB free_cma:0kB writeback_tmp:0kB pages_scanned:90373 all_unreclaimable? yes
|
Jun 16 09:45:55 docker3 kernel: lowmem_reserve[]: 0 0 0 0
|
Jun 16 09:45:55 docker3 kernel: Node 0 DMA: 21*4kB (UEM) 24*8kB (UM) 15*16kB (M) 1*32kB (M) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 548kB
|
Jun 16 09:45:55 docker3 kernel: Node 0 DMA32: 225*4kB (UEM) 27*8kB (UEM) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 1116kB
|
Jun 16 09:45:55 docker3 kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
|
Jun 16 09:45:55 docker3 kernel: 9876 total pagecache pages
|
Jun 16 09:45:55 docker3 kernel: 6719 pages in swap cache
|
Jun 16 09:45:55 docker3 kernel: Swap cache stats: add 350160, delete 343441, find 113374/140473
|
Jun 16 09:45:55 docker3 kernel: Free swap = 1695520kB
|
Jun 16 09:45:55 docker3 kernel: Total swap = 2052092kB
|
Jun 16 09:45:55 docker3 kernel: 32654 pages RAM
|
Jun 16 09:45:55 docker3 kernel: 0 pages HighMem/MovableOnly
|
Jun 16 09:45:55 docker3 kernel: 4562 pages reserved
|
Jun 16 09:45:55 docker3 kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
|
Jun 16 09:45:55 docker3 kernel: [ 452] 0 452 8745 0 22 79 0 systemd-journal
|
Jun 16 09:45:55 docker3 kernel: [ 468] 0 468 49631 0 29 103 0 lvmetad
|
Jun 16 09:45:55 docker3 kernel: [ 479] 0 479 11084 1 23 144 -1000 systemd-udevd
|
Jun 16 09:45:55 docker3 kernel: [ 580] 0 580 13877 0 27 120 -1000 auditd
|
Jun 16 09:45:55 docker3 kernel: [ 604] 81 604 14549 0 31 162 -900 dbus-daemon
|
Jun 16 09:45:55 docker3 kernel: [ 605] 0 605 119617 13 88 581 0 NetworkManager
|
Jun 16 09:45:55 docker3 kernel: [ 607] 0 607 6594 0 18 84 0 systemd-logind
|
Jun 16 09:45:55 docker3 kernel: [ 610] 999 610 134635 0 61 883 0 polkitd
|
Jun 16 09:45:55 docker3 kernel: [ 618] 0 618 31571 0 20 155 0 crond
|
Jun 16 09:45:55 docker3 kernel: [ 622] 38 622 11809 0 28 177 0 ntpd
|
Jun 16 09:45:55 docker3 kernel: [ 626] 0 626 27522 1 9 32 0 agetty
|
Jun 16 09:45:55 docker3 kernel: [ 873] 0 873 143453 119 97 2647 0 tuned
|
Jun 16 09:45:55 docker3 kernel: [ 874] 0 874 28199 0 59 257 -1000 sshd
|
Jun 16 09:45:55 docker3 kernel: [ 875] 0 875 53602 1 41 384 0 rsyslogd
|
Jun 16 09:45:55 docker3 kernel: [ 1163] 0 1163 40350 29 81 326 0 sshd
|
Jun 16 09:45:55 docker3 kernel: [ 1165] 0 1165 28859 1 14 104 0 bash
|
Jun 16 09:45:55 docker3 kernel: [ 1470] 996 1470 380637 11850 281 76501 0 mysqld
|
Jun 16 09:45:55 docker3 kernel: [ 1889] 0 1889 40409 50 35 306 0 mysql
|
Jun 16 09:45:55 docker3 kernel: Out of memory: Kill process 1470 (mysqld) score 163 or sacrifice child
|
Jun 16 09:45:55 docker3 kernel: Killed process 1470 (mysqld) total-vm:1522548kB, anon-rss:47400kB, file-rss:0kB, shmem-rss:0kB
|
Jun 16 09:45:55 docker3 systemd: mariadb.service: main process exited, code=killed, status=9/KILL
|
Jun 16 09:45:55 docker3 systemd: Unit mariadb.service entered failed state.
|
Jun 16 09:45:55 docker3 systemd: mariadb.service failed.
|
Jun 16 09:46:00 docker3 systemd: mariadb.service holdoff time over, scheduling restart.
|
Jun 16 09:46:00 docker3 systemd: Starting MariaDB 10.4.13 database server...
|
Jun 16 09:46:33 docker3 sh: WSREP: Recovered position cf61cf68-aef7-11ea-88db-1bc466429584:130
|
Jun 16 09:46:33 docker3 mysqld: 2020-06-16 9:46:33 0 [Note] /usr/sbin/mysqld (mysqld 10.4.13-MariaDB-log) starting as process 2248 ...
|
Attachments
Issue Links
- relates to
-
MDEV-22724 Galera cluster losing primary component
- Closed
-
MDEV-22998 memory leak in Galera-4
- Closed