[MCOL-4670] Primary Node Failover in a cluster with S3 is left in an unusable state at times
Created: 2021-04-06  Updated: 2021-05-11  Resolved: 2021-05-11

Status: Closed
Project: MariaDB ColumnStore
Component/s: Storage Manager
Affects Version/s: 5.4.3
Fix Version/s: 5.6.1

Type: Bug
Priority: Major
Reporter: Jose Rojas (Inactive)
Assignee: Roman
Resolution: Duplicate
Votes: 0
Labels: None

Issue Links:
Relates
relates to MCOL-4440 Multi-Node CS 5.4 with S3 storage fai... Closed

 Description   

After a primary node failover, non-primary nodes occasionally fail to load the BRM snapshot (loadbrm).
The mcs-loadbrm.service unit logs the following:

Mar 22 21:22:04 pm3 mcs-loadbrm.py[29596]: Loading BRM snapshot failed (/tmp/columnstore_tmp_files/rdwrscratch/BRM_saves)
Mar 22 21:22:04 pm3 mcs-loadbrm.py[29596]: ExtentMap::load(): That file is not a valid ExtentMap image

Nothing else indicates an error until queries are attempted on the system.
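
Because the failure is otherwise silent, it may help to scan the mcs-loadbrm.service log for the two messages quoted above. The following is a minimal sketch, not part of ColumnStore: the function name find_loadbrm_failures and the marker list are illustrative assumptions, and the marker strings are copied from the log lines in this report.

```python
from typing import Iterable, List

# Substrings copied from the two log lines quoted in this issue.
# Other ColumnStore versions may word these messages differently.
FAILURE_MARKERS = (
    "Loading BRM snapshot failed",
    "That file is not a valid ExtentMap image",
)


def find_loadbrm_failures(lines: Iterable[str]) -> List[str]:
    """Return every log line that contains one of the failure markers."""
    return [
        line.rstrip("\n")
        for line in lines
        if any(marker in line for marker in FAILURE_MARKERS)
    ]


# The two lines from this report plus one made-up, unrelated line.
SAMPLE_LOG = [
    "Mar 22 21:22:04 pm3 mcs-loadbrm.py[29596]: Loading BRM snapshot failed "
    "(/tmp/columnstore_tmp_files/rdwrscratch/BRM_saves)\n",
    "Mar 22 21:22:04 pm3 mcs-loadbrm.py[29596]: ExtentMap::load(): "
    "That file is not a valid ExtentMap image\n",
    "Mar 22 21:22:05 pm3 example[1]: unrelated line (hypothetical)\n",
]

if __name__ == "__main__":
    for hit in find_loadbrm_failures(SAMPLE_LOG):
        print(hit)
```

On a live node, the input could be the output of journalctl -u mcs-loadbrm.service, read line by line.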

The workaround is to restart the cluster via cmapi: cluster/stop, then cluster/start.
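
The restart could be scripted roughly as follows. This is a sketch under stated assumptions, not a verified CMAPI client: the base URL (host, port, API version), the PUT method, the x-api-key header, and the JSON timeout body are all assumptions to check against the CMAPI documentation for the installed version. The endpoint names cluster/stop and cluster/start are taken from this report as written. The helper names build_cluster_request and restart_cluster are hypothetical.

```python
import json
import urllib.request

# ASSUMPTION: host, port and API version are placeholders; adjust for your install.
ASSUMED_BASE_URL = "https://127.0.0.1:8640/cmapi/0.4.0"


def build_cluster_request(action: str, base_url: str, api_key: str,
                          timeout_s: int = 20) -> urllib.request.Request:
    """Build (but do not send) a request for cluster/stop or cluster/start.

    ASSUMPTION: PUT method, x-api-key header and {"timeout": N} body.
    """
    if action not in ("stop", "start"):
        raise ValueError(f"unsupported action: {action!r}")
    url = f"{base_url.rstrip('/')}/cluster/{action}"
    body = json.dumps({"timeout": timeout_s}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json", "x-api-key": api_key},
    )


def restart_cluster(base_url: str, api_key: str) -> None:
    """Send cluster/stop, then cluster/start, failing on any HTTP error."""
    for action in ("stop", "start"):
        request = build_cluster_request(action, base_url, api_key)
        # urlopen raises urllib.error.HTTPError on 4xx/5xx responses.
        with urllib.request.urlopen(request, timeout=120) as response:
            print(action, response.status)


if __name__ == "__main__":
    # Only builds and prints the requests; nothing is sent to the cluster.
    for action in ("stop", "start"):
        req = build_cluster_request(action, ASSUMED_BASE_URL, "<api-key>")
        print(req.get_method(), req.full_url)
```

Building the request separately from sending it lets the URL and headers be inspected before anything touches the cluster. If the CMAPI endpoint uses a self-signed certificate, urlopen will need an appropriate SSL context.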



 Comments   
Comment by Gregory Dorman (Inactive) [ 2021-05-11 ]

Duplicate of MCOL-4440.

Generated at Thu Feb 08 02:52:06 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.