Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Duplicate
- Affects Version/s: 5.4.3
- Fix Version/s: None
Description
After a primary node failover, non-primary nodes occasionally fail to load the BRM snapshot.
The mcs-loadbrm.service outputs the following:
Mar 22 21:22:04 pm3 mcs-loadbrm.py[29596]: Loading BRM snapshot failed (/tmp/columnstore_tmp_files/rdwrscratch/BRM_saves)
Mar 22 21:22:04 pm3 mcs-loadbrm.py[29596]: ExtentMap::load(): That file is not a valid ExtentMap image
There is no other indication of an error until queries are attempted on the system.
The workaround is to restart the cluster via the CMAPI cluster/stop and cluster/start calls, as sketched below.
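For illustration, a minimal Python sketch of that restart workaround. It assumes CMAPI's default port 8640 and API version 0.4.0, a placeholder API key, and the endpoint names as written in this ticket (later CMAPI releases document the stop call as cluster/shutdown); all of these are deployment-specific.

import requests
import urllib3

# CMAPI typically runs with a self-signed certificate; suppress the verify warning.
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

CMAPI = "https://127.0.0.1:8640/cmapi/0.4.0"  # assumed default host, port, and API version
HEADERS = {
    "Content-Type": "application/json",
    "x-api-key": "your-cmapi-key",  # placeholder; substitute the deployment's key
}

def cluster_call(action):
    # PUT the cluster-level endpoint and report the result.
    resp = requests.put(f"{CMAPI}/cluster/{action}", headers=HEADERS,
                        json={"timeout": 60}, verify=False)  # self-signed cert, so skip verification
    resp.raise_for_status()
    print(action, "->", resp.json())

cluster_call("stop")   # documented as cluster/shutdown in later CMAPI versions
cluster_call("start")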
Issue Links
- relates to: MCOL-4440 Multi-Node CS 5.4 with S3 storage failover gets stuck, requires API call to restart cluster (Closed)