[MDEV-27308] 3 problems encountered when node failure during Galera fragmented transaction running - Jira

Details

Type: Bug
Status: Open (View Workflow)
Priority: Major
Resolution: Unresolved
Affects Version/s: 10.5.12
Fix Version/s: 10.6
Component/s: Galera, Galera SST
Labels: None
Environment: Redhat 7 on VMware

Description

Hi,

In our production environment, we use Galera fragmented transactions to run batch jobs. During several incidents (tmp directory full, VM reboot caused by a hardware memory issue), we encountered the three problems below.

Problem #1: SST is triggered to recover the failed node, although IST is expected.
Problem #2: In some tests, the failed node crashes repeatedly with signal 11 until node 1 commits.
Problem #3: The local state of the donor node unexpectedly changes to "Donor/Desynced" after the failed node has recovered.

The workaround is to restart the node manually. However, Galera should resume on its own after a hardware issue, and it should use IST in most cases.
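
Until Problem #3 is resolved, a small check along these lines could flag a node that is reporting "Donor/Desynced". This is only a sketch and not part of the attached test case. It assumes the input is the tab-separated output of `mysql -N -B -e "SHOW GLOBAL STATUS LIKE 'wsrep_local_state%'"`. The helper names `parse_status` and `is_donor_desynced` are hypothetical, and the sample string is made up for illustration.

```python
# Sketch: detect a node whose wsrep_local_state_comment is "Donor/Desynced".
# Assumption: `raw` holds tab-separated "name<TAB>value" lines, as printed by
#   mysql -N -B -e "SHOW GLOBAL STATUS LIKE 'wsrep_local_state%'"


def parse_status(raw: str) -> dict:
    """Turn tab-separated status lines into a {name: value} dict."""
    status = {}
    for line in raw.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name, _, value = line.partition("\t")
        status[name.strip()] = value.strip()
    return status


def is_donor_desynced(status: dict) -> bool:
    """True if the node reports the Donor/Desynced local state."""
    return status.get("wsrep_local_state_comment") == "Donor/Desynced"


if __name__ == "__main__":
    # Illustrative sample only, not taken from the report.
    sample = "wsrep_local_state\t2\nwsrep_local_state_comment\tDonor/Desynced\n"
    print(is_donor_desynced(parse_status(sample)))
```

A monitoring job could run this periodically and alert when the state persists after the joiner has finished recovering. How long to wait before alerting is left to the operator.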

A repeatable test case (galera-donor-desync.txt) is attached.

Attachments

galera-donor-desync.txt
9 kB
2021-12-19 12:55

People

Assignee: Alexey

Reporter: William Wong

Votes: 0

Watchers: 1

Dates

Created: 2021-12-19 12:55

Updated: 2025-06-12 13:28
