Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 10.2.32, 10.4 (EOL)
Fix Version/s: None
Description
We are facing an issue with a MariaDB Galera cluster deployed in Kubernetes. When the node hosting one of the Galera members loses power, the other two Galera nodes go non-Primary. We have verified that the network between the surviving nodes remained stable on port 4567. After enabling the debug logs, the only difference I could see between the runs where this does not happen and the runs where it does is that the install message is never exchanged between the nodes.
This is reproducible in both 10.2 and 10.4, with both Galera 3 and Galera 4.
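For anyone trying to reproduce this, one way to turn on the wsrep/provider debug output seen in the log lines below is roughly the following (a sketch only; wsrep_debug takes ON/OFF in 10.2 but a level such as SERVER in recent 10.4, and the provider debug option may need to go into the config file instead on some versions):

SET GLOBAL wsrep_debug = ON;                        -- 10.2 boolean form
-- SET GLOBAL wsrep_debug = 'SERVER';               -- 10.4.3+ takes a level instead
SET GLOBAL wsrep_provider_options = 'debug=yes';    -- Galera provider (gcomm/evs/pc) debug messages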
In the good scenario, where the cluster did not die, this message was exchanged:
2021-10-15T19:29:22.390596315Z stderr F 2021-10-15 19:29:22,390 - OpenStack-Helm Mariadb - INFO - b'2021-10-15 19:29:22 140341999027968 [Note] [Debug] WSREP: gcomm/src/pc_proto.cpp:handle_install():1103: cd32f6ad handle install from a4cb7bc7 pcmsg{ type=INSTALL, seq=0, flags= 0, node_map {\ta4cb7bc7,prim=1,un=0,last_seq=58,last_prim=view_id(PRIM,a4cb7bc7,19),to_seq=245997,weight=1,segment=0'
In the bad scenario, the install message is missing and only the install timer expiry is logged:
2021-10-15T19:29:54.4610095Z stderr F 2021-10-15 19:29:54,460 - OpenStack-Helm Mariadb - INFO - b'2021-10-15 19:29:54 140712551864064 [Warning] WSREP: gcomm/src/evs_proto.cpp:handle_install_timer():690: evs::proto(be28c9b9, GATHER, view_id(REG,17423d3f,11)) install timer expired'
If two nodes survive, the cluster should remain Primary. This issue is delaying our production readiness testing and defeats the purpose of clustering in the first place.
I have attached the debug logs and a packet capture supporting the argument that this was not a network issue.
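For context, a non-Primary component can be recovered manually with Galera's pc.bootstrap provider option, but needing that on every single-node power failure defeats the purpose of the cluster. A rough sketch, run on one of the two surviving nodes:

SHOW STATUS LIKE 'wsrep_cluster_status';                  -- reports 'non-Primary' after the failure
SET GLOBAL wsrep_provider_options = 'pc.bootstrap=yes';   -- force the current component back to Primary
SHOW STATUS LIKE 'wsrep_cluster_status';                  -- should now report 'Primary'
SHOW STATUS LIKE 'wsrep_cluster_size';                    -- expected to be 2 with two survivors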