[MDEV-33255] [ERROR] WSREP: exception from gcomm, backend must be restarted: aborting due to conflicting prims: older overrides (FATAL) Created: 2024-01-16  Updated: 2024-02-08

Status: Needs Feedback
Project: MariaDB Server
Component/s: None
Affects Version/s: 11.1
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Runzi Assignee: Jan Lindström
Resolution: Unresolved Votes: 0
Labels: None


 Description   

// Some comments here
public String getFoo()
{
2024-01-03  0:10:32 0 [ERROR] WSREP: caught exception in PC, state dump to stderr follows:
pc::Proto{uuid=acbe31e8-9673,start_prim=0,npvo=0,ignore_sb=0,ignore_quorum=0,state=1,last_sent_seq=466,checksum=0,instances=
  3c167a24-b8f0,prim=1,un=0,last_seq=103,last_prim=view_id(PRIM,3c167a24-b8f0,64),to_seq=39886,weight=1,segment=0
  4ebe7f59-8874,prim=1,un=0,last_seq=372,last_prim=view_id(PRIM,3c167a24-b8f0,64),to_seq=39886,weight=1,segment=0
  acbe31e8-9673,prim=1,un=0,last_seq=466,last_prim=view_id(PRIM,3c167a24-b8f0,64),to_seq=39886,weight=1,segment=0
,state_msgs=
  3c167a24-b8f0,pcmsg{ type=STATE, seq=0, flags= 0, node_map {  3c167a24-b8f0,prim=1,un=0,last_seq=103,last_prim=view_id(PRIM,3c167a24-b8f0,64),to_seq=39886,weight=1,segment=0
  4ebe7f59-8874,prim=1,un=0,last_seq=372,last_prim=view_id(PRIM,3c167a24-b8f0,64),to_seq=39886,weight=1,segment=0
  acbe31e8-9673,prim=1,un=0,last_seq=466,last_prim=view_id(PRIM,3c167a24-b8f0,64),to_seq=39886,weight=1,segment=0
}}
  4ebe7f59-8874,pcmsg{ type=STATE, seq=0, flags= 0, node_map {  3c167a24-b8f0,prim=1,un=0,last_seq=103,last_prim=view_id(PRIM,3c167a24-b8f0,64),to_seq=39886,weight=1,segment=0
  4ebe7f59-8874,prim=1,un=0,last_seq=372,last_prim=view_id(PRIM,3c167a24-b8f0,64),to_seq=39886,weight=1,segment=0
  acbe31e8-9673,prim=1,un=0,last_seq=466,last_prim=view_id(PRIM,3c167a24-b8f0,64),to_seq=39886,weight=1,segment=0
}}
,current_view=view(view_id(REG,3c167a24-b8f0,75) memb {
  3c167a24-b8f0,0
  4ebe7f59-8874,0
  774206ed-985d,0
  91f27cc6-a5ee,0
  acbe31e8-9673,0
  be92f86c-94d1,0
} joined {
  774206ed-985d,0
  91f27cc6-a5ee,0
  be92f86c-94d1,0
} left {
} partitioned {
}),pc_view=view(view_id(PRIM,3c167a24-b8f0,64) memb {
  3c167a24-b8f0,0
  4ebe7f59-8874,0
  acbe31e8-9673,0
} joined {
} left {
} partitioned {
}),mtu=32636}
2024-01-03  0:10:32 0 [Note] WSREP: {v=1,t=1,ut=255,o=4,s=0,sr=0,as=-1,f=4,src=774206ed-985d,srcvid=view_id(REG,3c167a24-b8f0,75),insvid=view_id(UNKNOWN,00000000-0000,0),ru=00000000-0000,r=[-1,-1],fs=5786559,nl=(
)
} 168
2024-01-03  0:10:32 0 [ERROR] WSREP: exception caused by message: {v=1,t=3,ut=255,o=1,s=4,sr=-1,as=3,f=4,src=774206ed-985d,srcvid=view_id(REG,3c167a24-b8f0,75),insvid=view_id(UNKNOWN,00000000-0000,0),ru=00000000-0000,r=[-1,-1],fs=5786566,nl=(
)
}
 state after handling message: evs::proto(evs::proto(acbe31e8-9673, OPERATIONAL, view_id(REG,3c167a24-b8f0,75)), OPERATIONAL) {
current_view=view(view_id(REG,3c167a24-b8f0,75) memb {
  3c167a24-b8f0,0
  4ebe7f59-8874,0
  774206ed-985d,0
  91f27cc6-a5ee,0
  acbe31e8-9673,0
  be92f86c-94d1,0
} joined {
} left {
} partitioned {
}),
input_map=evs::input_map: {aru_seq=4,safe_seq=3,node_index=node: {idx=0,range=[5,4],safe_seq=4} node: {idx=1,range=[5,4],safe_seq=3} node: {idx=2,range=[5,4],safe_seq=3} node: {idx=3,range=[5,4],safe_seq=4} node: {idx=4,range=[5,4],safe_seq=4} node: {idx=5,range=[5,4],safe_seq=4} },
fifo_seq=3248089,
last_sent=4,
known:
3c167a24-b8f0 at tcp://10.*.*.*:4567
{o=1,s=0,i=1,fs=3061648,}
4ebe7f59-8874 at tcp://10.*.*.*:4567
{o=1,s=0,i=1,fs=3065890,}
774206ed-985d at tcp://10.*.*.*:35200
{o=1,s=0,i=1,fs=5786566,}
91f27cc6-a5ee at tcp://10.*.*.*:35200
{o=1,s=0,i=1,fs=5780462,}
acbe31e8-9673 at
{o=1,s=0,i=1,fs=-1,}
be92f86c-94d1 at tcp://10.*.*.*:4567
{o=1,s=0,i=1,fs=5740130,}
 }2024-01-03  0:10:32 0 [ERROR] WSREP: exception from gcomm, backend must be restarted: acbe31e8-9673 aborting due to conflicting prims: older overrides (FATAL)
   at ./gcomm/src/pc_proto.cpp:handle_state():1052
2024-01-03  0:10:32 0 [Note] WSREP: gcomm: terminating thread
2024-01-03  0:10:32 0 [Note] WSREP: gcomm: joining thread
2024-01-03  0:10:32 0 [Note] WSREP: gcomm: closing backend
2024-01-03  0:10:32 0 [Note] WSREP: Forced PC close
2024-01-03  0:10:32 0 [Warning] WSREP: discarding 28 messages from message index
2024-01-03  0:10:32 0 [Note] WSREP: gcomm: closed
2024-01-03  0:10:32 0 [Note] WSREP: New SELF-LEAVE.
2024-01-03  0:10:32 0 [Note] WSREP: Closing send monitor...
2024-01-03  0:10:32 0 [Note] WSREP: Closed send monitor.
2024-01-03  0:10:32 0 [Note] WSREP: Closing replication queue.
2024-01-03  0:10:32 0 [Note] WSREP: Closing slave action queue.
2024-01-03  0:10:32 1 [Note] WSREP: Applier thread exiting ret: 6 thd: 1
2024-01-03  0:10:32 9 [Note] WSREP: Applier thread exiting ret: 6 thd: 9
2024-01-03  0:10:32 6 [Note] WSREP: Applier thread exiting ret: 6 thd: 6
2024-01-03  0:10:32 0 [Note] WSREP: Shifting SYNCED -> CLOSED (TO: 35659)
2024-01-03  0:10:32 0 [Note] WSREP: RECV thread exiting -103: Software caused connection abort
2024-01-03  0:10:32 8 [Note] WSREP: recv_thread() already closing, joining thread.
2024-01-03  0:10:32 8 [Note] WSREP: recv_thread() joined.
2024-01-03  0:10:32 8 [Note] WSREP: ================================================
View:
  id: 00000000-0000-0000-0000-000000000000:-1
  status: non-primary
  protocol_version: -1
  capabilities:
  final: yes
  own_index: -1
  members(0):
=================================================
2024-01-03  0:10:32 8 [Note] WSREP: Non-primary view
2024-01-03  0:10:32 8 [Note] WSREP: Server status change synced -> disconnected
2024-01-03  0:10:32 8 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2024-01-03  0:10:32 8 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2024-01-03  0:10:32 8 [Note] WSREP: Applier thread exiting ret: 6 thd: 8
}

Version: 11.1.2-MariaDB. The cluster has 3 nodes and no operations were running on them when this problem occurred. All nodes reported this error at the same time and can no longer read or write data; queries fail with `WSREP has not yet prepared node for application use`.

This may be a bug, but I'm not sure what triggered it.
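
To make the state dump above easier to read, here is a minimal Python sketch that extracts the member lists of `current_view` and `pc_view` and lists the nodes present in the regular view but absent from the last primary component. The function names `parse_views` and `nodes_outside_primary` are hypothetical helpers, not part of Galera or MariaDB, and the regular expressions assume the dump format exactly as it appears in this log. Whether these extra nodes are what triggered the abort is not established here.

```python
import re

# Excerpt copied from the state dump in this report (first current_view and pc_view).
SAMPLE = """\
,current_view=view(view_id(REG,3c167a24-b8f0,75) memb {
  3c167a24-b8f0,0
  4ebe7f59-8874,0
  774206ed-985d,0
  91f27cc6-a5ee,0
  acbe31e8-9673,0
  be92f86c-94d1,0
} joined {
  774206ed-985d,0
  91f27cc6-a5ee,0
  be92f86c-94d1,0
} left {
} partitioned {
}),pc_view=view(view_id(PRIM,3c167a24-b8f0,64) memb {
  3c167a24-b8f0,0
  4ebe7f59-8874,0
  acbe31e8-9673,0
} joined {
} left {
} partitioned {
}),mtu=32636}
"""

# Assumed layout: name=view(view_id(TYPE,uuid,seq) memb { ... } joined { ... }
VIEW_RE = re.compile(
    r"(?P<name>current_view|pc_view)=view\(view_id\("
    r"(?P<type>\w+),(?P<uuid>[0-9a-f-]+),(?P<seq>\d+)\)\s*"
    r"memb\s*\{(?P<memb>.*?)\}\s*joined\s*\{(?P<joined>.*?)\}",
    re.DOTALL,
)
# Shortened node identifiers as printed in the log, e.g. "3c167a24-b8f0,0".
NODE_RE = re.compile(r"([0-9a-f]{8}-[0-9a-f]{4}),\d+")


def parse_views(log_text):
    """Return {view name: info}, keeping the first occurrence of each view."""
    views = {}
    for m in VIEW_RE.finditer(log_text):
        name = m.group("name")
        if name in views:
            continue  # the log repeats current_view later; keep the first dump
        views[name] = {
            "type": m.group("type"),
            "seq": int(m.group("seq")),
            "members": NODE_RE.findall(m.group("memb")),
            "joined": NODE_RE.findall(m.group("joined")),
        }
    return views


def nodes_outside_primary(views):
    """Nodes in the regular view that are not in the primary component view."""
    current = set(views["current_view"]["members"])
    primary = set(views["pc_view"]["members"])
    return sorted(current - primary)


if __name__ == "__main__":
    parsed = parse_views(SAMPLE)
    for name, info in parsed.items():
        print(name, info["type"], info["seq"], info["members"])
    print("not in primary component:", nodes_outside_primary(parsed))
```

Running the same helper over the full error log of each node would show whether every node saw the same set of extra members at the time of the failure.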



 Comments   
Comment by Runzi [ 2024-01-18 ]

Any suggestions?

Comment by Jan Lindström [ 2024-02-08 ]

Runzi, please provide the full error log from all nodes, the node configuration, and the version of the Galera library in use.

Generated at Thu Feb 08 10:37:33 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.