[MDEV-14971] MariaDB stops working when second Galera node joins

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Incomplete
Affects Version/s: 10.2.12, 10.3.5
Fix Version/s: N/A
Component/s: Galera SST
Labels: None
Environment: FreeBSD 11.1, CentOS 7.4

Description

I have just upgraded MariaDB to 10.2.12 and enabled Galera.
When the second node joins, the first (master) node stops working and needs to be restarted to recover.

Here's my error log:

2018-01-17 12:01:33 34426956544 [Note] WSREP: (3153ad69, 'tcp://0.0.0.0:4567') connection established to f61103ed tcp://192.168.62.211:4567

2018-01-17 12:01:33 34426956544 [Note] WSREP: (3153ad69, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers:

2018-01-17 12:01:33 34426956544 [Note] WSREP: (3153ad69, 'tcp://0.0.0.0:4567') connection established to ba2863c0 tcp://192.168.62.201:4567

2018-01-17 12:01:33 34426956544 [Note] WSREP: declaring ba2863c0 at tcp://192.168.62.201:4567 stable

2018-01-17 12:01:33 34426956544 [Note] WSREP: declaring f61103ed at tcp://192.168.62.211:4567 stable

2018-01-17 12:01:33 34426956544 [Warning] WSREP: 3153ad69 conflicting prims: my prim: view_id(PRIM,3153ad69,1) other prim: view_id(PRIM,ba2863c0,16)

2018-01-17 12:01:33 34426956544 [ERROR] WSREP: caught exception in PC, state dump to stderr follows:

pc::Proto{uuid=3153ad69,start_prim=1,npvo=0,ignore_sb=0,ignore_quorum=0,state=1,last_sent_seq=53547,checksum=0,instances=

        3153ad69,prim=1,un=0,last_seq=53547,last_prim=view_id(PRIM,3153ad69,1),to_seq=53546,weight=1,segment=0

,state_msgs=

        3153ad69,pcmsg{ type=STATE, seq=0, flags= 0, node_map { 3153ad69,prim=1,un=0,last_seq=53547,last_prim=view_id(PRIM,3153ad69,1),to_seq=53546,weight=1,segment=0

}}

,current_view=view(view_id(REG,3153ad69,17) memb {

        3153ad69,0

        ba2863c0,0

        f61103ed,0

} joined {

        ba2863c0,0

        f61103ed,0

} left {

} partitioned {

}),pc_view=view(view_id(PRIM,3153ad69,1) memb {

        3153ad69,0

} joined {

} left {

} partitioned {

}),mtu=32636}

2018-01-17 12:01:33 34426956544 [Note] WSREP: {v=0,t=1,ut=255,o=4,s=0,sr=0,as=-1,f=4,src=ba2863c0,srcvid=view_id(REG,3153ad69,17),insvid=view_id(UNKNOWN,00000000,0),ru=00000000,r=[-1,-1],fs=36492262,nl=(

} 64

2018-01-17 12:01:33 34426956544 [ERROR] WSREP: exception caused by message: {v=0,t=3,ut=255,o=1,s=0,sr=-1,as=0,f=4,src=f61103ed,srcvid=view_id(REG,3153ad69,17),insvid=view_id(UNKNOWN,00000000,0),ru=00000000,r=[-1,-1],fs=8,nl=(

 state after handling message: evs::proto(evs::proto(3153ad69, OPERATIONAL, view_id(REG,3153ad69,17)), OPERATIONAL) {

current_view=view(view_id(REG,3153ad69,17) memb {

        3153ad69,0

        ba2863c0,0

        f61103ed,0

} joined {

} left {

} partitioned {

}),

input_map=evs::input_map: {aru_seq=0,safe_seq=0,node_index=node: {idx=0,range=[1,0],safe_seq=0} node: {idx=1,range=[1,0],safe_seq=0} node: {idx=2,range=[1,0],safe_seq=0} },

fifo_seq=56867,

last_sent=0,

known:

3153ad69 at

{o=1,s=0,i=1,fs=-1,}

ba2863c0 at tcp://192.168.62.201:4567

{o=1,s=0,i=1,fs=36492264,}

f61103ed at tcp://192.168.62.211:4567

{o=1,s=0,i=1,fs=8,}

 }2018-01-17 12:01:33 34426956544 [ERROR] WSREP: exception from gcomm, backend must be restarted: 3153ad69 aborting due to conflicting prims: older overrides (FATAL)

         at gcomm/src/pc_proto.cpp:handle_state():982

2018-01-17 12:01:33 34426956544 [Note] WSREP: gcomm: terminating thread

2018-01-17 12:01:33 34426956544 [Note] WSREP: gcomm: joining thread

2018-01-17 12:01:33 34426956544 [Note] WSREP: gcomm: closing backend

2018-01-17 12:01:33 34426956544 [Note] WSREP: Forced PC close

2018-01-17 12:01:33 34426956544 [Warning] WSREP: discarding 2 messages from message index

2018-01-17 12:01:33 34426956544 [Note] WSREP: gcomm: closed

2018-01-17 12:01:33 35628642304 [Note] WSREP: Received self-leave message.

2018-01-17 12:01:33 35628642304 [Note] WSREP: comp msg error in core 53

2018-01-17 12:01:33 38097805824 [Warning] WSREP: Send action {0x0, 2338, TORDERED} returned -53 (Software caused connection abort)

2018-01-17 12:01:33 38099474176 [Note] WSREP: applier thread exiting (code:6)

2018-01-17 12:01:33 35628642304 [Note] WSREP: Closing send monitor...

2018-01-17 12:01:33 35628642304 [Note] WSREP: Closed send monitor.

2018-01-17 12:01:33 35628642304 [Note] WSREP: Closing replication queue.

2018-01-17 12:01:33 35628642304 [Note] WSREP: Closing slave action queue.

2018-01-17 12:01:33 38099134208 [Note] WSREP: applier thread exiting (code:6)

2018-01-17 12:01:33 38099472896 [Note] WSREP: applier thread exiting (code:6)

2018-01-17 12:01:33 38099139328 [Note] WSREP: applier thread exiting (code:6)

2018-01-17 12:01:33 38099467776 [Note] WSREP: applier thread exiting (code:6)

2018-01-17 12:01:34 38099475456 [Note] WSREP: applier thread exiting (code:6)

2018-01-17 12:01:34 38083915264 [Note] WSREP: applier thread exiting (code:6)

2018-01-17 12:01:34 38099460096 [Note] WSREP: applier thread exiting (code:6)

2018-01-17 12:01:34 38099654912 [Note] WSREP: applier thread exiting (code:6)

2018-01-17 12:01:34 35628644864 [Note] WSREP: applier thread exiting (code:6)

2018-01-17 12:01:34 38095302656 [Note] WSREP: applier thread exiting (code:6)
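
The key line in the log above is the `conflicting prims` warning: node 3153ad69 reports its own primary view as `view_id(PRIM,3153ad69,1)`, while the joining peers carry `view_id(PRIM,ba2863c0,16)`, and gcomm then aborts with "older overrides". For anyone triaging similar logs, here is a minimal Python sketch that pulls those two view IDs out of a log. The regular expression is inferred from that single log line only, and the names `CONFLICT_RE`, `find_prim_conflicts` and `SAMPLE` are hypothetical helpers, not part of MariaDB or Galera.

```python
import re

# Pattern for the Galera warning seen in the log above. The field layout
# (view kind, node UUID prefix, sequence number) is an assumption inferred
# from that one line, not taken from Galera documentation.
CONFLICT_RE = re.compile(
    r"WSREP: (?P<node>[0-9a-f]+) conflicting prims: "
    r"my prim: view_id\((?P<my_kind>\w+),(?P<my_uuid>[0-9a-f]+),(?P<my_seq>\d+)\) "
    r"other prim: view_id\((?P<other_kind>\w+),(?P<other_uuid>[0-9a-f]+),(?P<other_seq>\d+)\)"
)


def find_prim_conflicts(log_text):
    """Return one dict per 'conflicting prims' warning found in log_text."""
    conflicts = []
    for line in log_text.splitlines():
        match = CONFLICT_RE.search(line)
        if match:
            info = match.groupdict()
            # Sequence numbers are easier to compare as integers.
            info["my_seq"] = int(info["my_seq"])
            info["other_seq"] = int(info["other_seq"])
            conflicts.append(info)
    return conflicts


# The warning line copied from the error log in this report.
SAMPLE = (
    "2018-01-17 12:01:33 34426956544 [Warning] WSREP: 3153ad69 conflicting prims: "
    "my prim: view_id(PRIM,3153ad69,1) other prim: view_id(PRIM,ba2863c0,16)"
)

if __name__ == "__main__":
    for c in find_prim_conflicts(SAMPLE):
        print(
            f"node {c['node']}: own primary view {c['my_uuid']}/{c['my_seq']} "
            f"conflicts with {c['other_uuid']}/{c['other_seq']}"
        )
```

Running the same function over the full error log of each node would show whether more than one node believes it formed the primary component on its own.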

Issue Links

relates to: MDEV-15399 Galera catches exception and terminates, but MariaDB keeps going (Closed)

People

Assignee: Jan Lindström (Inactive)

Reporter: TAO ZHOU

Votes: 1

Watchers: 4

Dates

Created: 2018-01-17 01:57

Updated: 2022-01-24 01:02

Resolved: 2022-01-24 01:02
