Details
-
Task
-
Status: Closed (View Workflow)
-
Major
-
Resolution: Incomplete
Description
Unsure why this keeps happening. The cluster keeps shifting all nodes from PRIMARY to NON-PRIMARY at the exact same time. The logs are vaguely implying that updates are only being pushed to half the cluster, and not the other half, so it is splitting itself.
Currently the cluster is 6 nodes spread across 3 data centers with 2 nodes per data center.
When this occurs, the log message gets massively flooded. Overnight, each server generated anywhere in the neighborhood of ~50-100MB in the error log file, flooded with the same text over and over again showing the unstable cluster state. This makes it difficult to find where/how things are breaking.
2020-05-26 10:18:47 0 [Note] WSREP: (14cf7875, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://172.16.8.12:4567
|
2020-05-26 10:18:47 0 [Note] WSREP: (14cf7875, 'tcp://0.0.0.0:4567') connection established to f57bde06 tcp://172.16.8.12:4567
|
2020-05-26 10:18:47 0 [Note] WSREP: declaring 3cfcb3bd at tcp://172.16.12.61:4567 stable
|
2020-05-26 10:18:47 0 [Note] WSREP: declaring 454edd57 at tcp://172.16.20.61:4567 stable
|
2020-05-26 10:18:47 0 [Note] WSREP: declaring 54695c9f at tcp://172.16.12.62:4567 stable
|
2020-05-26 10:18:47 0 [Note] WSREP: declaring 74c795ad at tcp://172.16.8.11:4567 stable
|
2020-05-26 10:18:47 0 [Note] WSREP: declaring f57bde06 at tcp://172.16.8.12:4567 stable
|
2020-05-26 10:18:47 0 [Note] WSREP: Node 14cf7875 state prim
|
2020-05-26 10:18:47 0 [Note] WSREP: view(view_id(PRIM,14cf7875,7) memb {
|
14cf7875,2
|
3cfcb3bd,1
|
454edd57,2
|
54695c9f,1
|
74c795ad,8
|
f57bde06,8
|
} joined {
|
} left {
|
} partitioned {
|
})
|
2020-05-26 10:18:47 0 [Note] WSREP: save pc into disk
|
2020-05-26 10:18:47 0 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 6
|
2020-05-26 10:18:47 0 [Note] WSREP: STATE_EXCHANGE: sent state UUID: f5acf28b-9f74-11ea-a86a-d6a84667ccd7
|
2020-05-26 10:18:47 0 [Note] WSREP: STATE EXCHANGE: sent state msg: f5acf28b-9f74-11ea-a86a-d6a84667ccd7
|
2020-05-26 10:18:47 0 [Note] WSREP: STATE EXCHANGE: got state msg: f5acf28b-9f74-11ea-a86a-d6a84667ccd7 from 4 (puy-maria-01)
|
2020-05-26 10:18:47 0 [Note] WSREP: STATE EXCHANGE: got state msg: f5acf28b-9f74-11ea-a86a-d6a84667ccd7 from 0 (sea-maria-02)
|
2020-05-26 10:18:47 0 [Note] WSREP: STATE EXCHANGE: got state msg: f5acf28b-9f74-11ea-a86a-d6a84667ccd7 from 2 (sea-maria-01)
|
2020-05-26 10:18:47 0 [Note] WSREP: STATE EXCHANGE: got state msg: f5acf28b-9f74-11ea-a86a-d6a84667ccd7 from 3 (tac-maria-02)
|
2020-05-26 10:18:47 0 [Note] WSREP: STATE EXCHANGE: got state msg: f5acf28b-9f74-11ea-a86a-d6a84667ccd7 from 1 (tac-cospix-maria-01)
|
2020-05-26 10:18:47 0 [Note] WSREP: STATE EXCHANGE: got state msg: f5acf28b-9f74-11ea-a86a-d6a84667ccd7 from 5 (puy-maria-02)
|
2020-05-26 10:18:47 0 [Note] WSREP: Quorum results:
|
version = 5,
|
component = PRIMARY,
|
conf_id = 5,
|
members = 5/6 (joined/total),
|
act_id = 159998,
|
last_appl. = 159942,
|
protocols = 1/10/4 (gcs/repl/appl),
|
vote policy= 0,
|
group UUID = 00c6c8ed-9281-11ea-b296-5b01975a9d80
|
2020-05-26 10:18:47 0 [Note] WSREP: Writing down CC checksum: c3bcc75b e521c376 e088241f 2abed3ee at offset 456
|
2020-05-26 10:18:47 0 [Note] WSREP: Flow-control interval: [39, 39]
|
2020-05-26 10:18:47 0 [Note] WSREP: Trying to continue unpaused monitor
|
2020-05-26 10:18:47 2 [Note] WSREP: ####### processing CC 159999, local, ordered
|
2020-05-26 10:18:47 2 [Note] WSREP: ####### drain monitors upto 159998
|
2020-05-26 10:18:47 2 [Note] WSREP: REPL Protocols: 10 (5, 3)
|
2020-05-26 10:18:47 2 [Note] WSREP: ####### My UUID: 14cf7875-9f6d-11ea-be91-3b28d841cd17
|
2020-05-26 10:18:47 2 [Note] WSREP: ####### ST not required
|
2020-05-26 10:18:47 2 [Note] WSREP: Skipping cert index reset
|
2020-05-26 10:18:47 2 [Note] WSREP: ####### Adjusting cert position: 159998 -> 159999
|
2020-05-26 10:18:47 0 [Note] WSREP: Service thread queue flushed.
|
2020-05-26 10:18:47 2 [Note] WSREP: ####### Setting monitor position to 159999
|
2020-05-26 10:18:47 2 [Note] WSREP: Lowest cert indnex boundary for CC from group: 159943
|
2020-05-26 10:18:47 2 [Note] WSREP: Min available from gcache for CC from group: 156743
|
2020-05-26 10:18:47 2 [Note] WSREP: ================================================
|
View:
|
id: 00c6c8ed-9281-11ea-b296-5b01975a9d80:159999
|
status: primary
|
protocol_version: 4
|
capabilities: MULTI-MASTER, CERTIFICATION, PARALLEL_APPLYING, REPLAY, ISOLATION, PAUSE, CAUSAL_READ, INCREMENTAL_WS, UNORDERED, PREORDERED, STREAMING, NBO
|
final: no
|
own_index: 0
|
members(6):
|
0: 14cf7875-9f6d-11ea-be91-3b28d841cd17, sea-maria-02
|
1: 3cfcb3bd-9f66-11ea-b50e-4306f593b013, tac-cospix-maria-01
|
2: 454edd57-9f6b-11ea-aeea-2bf542f463e4, sea-maria-01
|
3: 54695c9f-9f66-11ea-a4d8-7ba7a1c6775f, tac-maria-02
|
4: 74c795ad-9f6e-11ea-989d-9313028d7e05, puy-maria-01
|
5: f57bde06-9f74-11ea-968d-af0c6e831d4e, puy-maria-02
|
=================================================
|
2020-05-26 10:18:47 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
|
2020-05-26 10:18:47 0 [Note] WSREP: Member 5.8 (puy-maria-02) requested state transfer from '*any*'. Selected 1.1 (tac-cospix-maria-01)(SYNCED) as donor.
|
2020-05-26 10:18:48 0 [Note] WSREP: 1.1 (tac-cospix-maria-01): State transfer to 5.8 (puy-maria-02) complete.
|
2020-05-26 10:18:48 0 [Note] WSREP: Member 1.1 (tac-cospix-maria-01) synced with group.
|
2020-05-26 10:18:50 0 [Note] WSREP: (14cf7875, 'tcp://0.0.0.0:4567') turning message relay requesting off
|
2020-05-26 10:33:52 0 [Note] WSREP: (14cf7875, 'tcp://0.0.0.0:4567') connection to peer 74c795ad with addr tcp://172.16.8.11:4567 timed out, no messages seen in PT3S
|
2020-05-26 10:33:52 0 [Note] WSREP: (14cf7875, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://172.16.8.11:4567
|
2020-05-26 10:33:53 0 [Note] WSREP: (14cf7875, 'tcp://0.0.0.0:4567') reconnecting to 74c795ad (tcp://172.16.8.11:4567), attempt 0
|
2020-05-26 10:33:56 0 [Note] WSREP: (14cf7875, 'tcp://0.0.0.0:4567') connection to peer 74c795ad with addr tcp://172.16.8.11:4567 timed out, no messages seen in PT3S
|
2020-05-26 10:33:56 0 [Note] WSREP: (14cf7875, 'tcp://0.0.0.0:4567') connection established to 74c795ad tcp://172.16.8.11:4567
|
2020-05-26 10:33:59 0 [Note] WSREP: (14cf7875, 'tcp://0.0.0.0:4567') turning message relay requesting off
|
2020-05-26 10:51:00 0 [Note] WSREP: view(view_id(NON_PRIM,14cf7875,7) memb {
|
14cf7875,2
|
3cfcb3bd,1
|
54695c9f,1
|
} joined {
|
} left {
|
} partitioned {
|
454edd57,2
|
74c795ad,8
|
f57bde06,8
|
})
|
2020-05-26 10:51:00 0 [Note] WSREP: declaring 3cfcb3bd at tcp://172.16.12.61:4567 stable
|
2020-05-26 10:51:00 0 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 3
|
2020-05-26 10:51:00 0 [Note] WSREP: declaring 54695c9f at tcp://172.16.12.62:4567 stable
|
2020-05-26 10:51:00 0 [Note] WSREP: Writing down CC checksum: d8cb29c2 cdf070a4 64dba1ce 6fc2471e at offset 264
|
2020-05-26 10:51:00 0 [Note] WSREP: Flow-control interval: [28, 28]
|
2020-05-26 10:51:00 0 [Note] WSREP: Trying to continue unpaused monitor
|
2020-05-26 10:51:00 0 [Note] WSREP: Received NON-PRIMARY.
|
2020-05-26 10:51:00 0 [Note] WSREP: Shifting SYNCED -> OPEN (TO: 161343)
|
2020-05-26 10:51:00 2 [Note] WSREP: ####### processing CC -1, local, ordered
|
2020-05-26 10:51:00 2 [Note] WSREP: ####### drain monitors upto 161343
|
2020-05-26 10:51:00 2 [Note] WSREP: ####### My UUID: 14cf7875-9f6d-11ea-be91-3b28d841cd17
|
2020-05-26 10:51:00 2 [Note] WSREP: ####### ST not required
|
2020-05-26 10:51:00 2 [Note] WSREP: ================================================
|
View:
|
id: 00c6c8ed-9281-11ea-b296-5b01975a9d80:-1
|
status: non-primary
|
protocol_version: 4
|
capabilities: MULTI-MASTER, CERTIFICATION, PARALLEL_APPLYING, REPLAY, ISOLATION, PAUSE, CAUSAL_READ, INCREMENTAL_WS, UNORDERED, PREORDERED, STREAMING, NBO
|
final: no
|
own_index: 0
|
members(3):
|
0: 14cf7875-9f6d-11ea-be91-3b28d841cd17, sea-maria-02
|
1: 3cfcb3bd-9f66-11ea-b50e-4306f593b013, tac-cospix-maria-01
|
2: 54695c9f-9f66-11ea-a4d8-7ba7a1c6775f, tac-maria-02
|
=================================================
|
2020-05-26 10:51:00 2 [Note] WSREP: Non-primary view
|
2020-05-26 10:51:00 2 [Note] WSREP: Server status change synced -> connected
|
2020-05-26 10:51:00 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
|
2020-05-26 10:51:00 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
|
2020-05-26 10:51:00 0 [Note] WSREP: view(view_id(NON_PRIM,14cf7875,8) memb {
|
14cf7875,2
|
3cfcb3bd,1
|
54695c9f,1
|
} joined {
|
} left {
|
} partitioned {
|
454edd57,2
|
74c795ad,8
|
f57bde06,8
|
})
|
2020-05-26 10:51:00 0 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 3
|
2020-05-26 10:51:00 0 [Note] WSREP: Writing down CC checksum: d8cb29c2 cdf070a4 64dba1ce 6fc2471e at offset 264
|
2020-05-26 10:51:00 0 [Note] WSREP: Flow-control interval: [28, 28]
|
2020-05-26 10:51:00 0 [Note] WSREP: Trying to continue unpaused monitor
|
2020-05-26 10:51:00 0 [Note] WSREP: Received NON-PRIMARY.
|
2020-05-26 10:51:00 2 [Note] WSREP: ####### processing CC -1, local, ordered
|
2020-05-26 10:51:00 2 [Note] WSREP: ####### drain monitors upto 161343
|
2020-05-26 10:51:00 2 [Note] WSREP: ####### My UUID: 14cf7875-9f6d-11ea-be91-3b28d841cd17
|
2020-05-26 10:51:00 2 [Note] WSREP: ####### ST not required
|
2020-05-26 10:51:00 2 [Note] WSREP: ================================================
|
View:
|
id: 00c6c8ed-9281-11ea-b296-5b01975a9d80:-1
|
status: non-primary
|
protocol_version: 4
|
capabilities: MULTI-MASTER, CERTIFICATION, PARALLEL_APPLYING, REPLAY, ISOLATION, PAUSE, CAUSAL_READ, INCREMENTAL_WS, UNORDERED, PREORDERED, STREAMING, NBO
|
final: no
|
own_index: 0
|
members(3):
|
0: 14cf7875-9f6d-11ea-be91-3b28d841cd17, sea-maria-02
|
1: 3cfcb3bd-9f66-11ea-b50e-4306f593b013, tac-cospix-maria-01
|
2: 54695c9f-9f66-11ea-a4d8-7ba7a1c6775f, tac-maria-02
|
=================================================
|
2020-05-26 10:51:00 2 [Note] WSREP: Non-primary view
|
2020-05-26 10:51:00 2 [Note] WSREP: Server status change connected -> connected
|
2020-05-26 10:51:00 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
|
2020-05-26 10:51:00 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
|
Attachments
Issue Links
- relates to
-
MDEV-22908 After OOM event on one node, entire cluster becomes NON-PRIMARY
- Closed