Status: Closed
There exists a window of opportunity during cluster recycle to corrupt the Extent Map. It first gets corrupted in memory on startup, and then permanently on disk on the next save_brm (explicit via API or implicit on Shutdown).
The root cause is an error in mcs-loadbrm.py, by which all nodes point to the same BRM directory and hence the same BRM_saves_em file. During cluster start, the directory is copied to each node, and it is then read by the primary when the processes start.
It is therefore possible for the read to coincide with a secondary node still writing. When HA is deployed (via GlusterFS or any other shared/NFS filesystem), the reading primary is liable to lose fragments of the Extent Map.
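The torn-read window can be illustrated with a small, self-contained sketch (the file name and payload below are stand-ins, not ColumnStore code): a reader that opens the image while an in-place rewrite has only partly flushed sees a truncated snapshot.

```python
import os
import tempfile

# Stand-ins for the real image file and its contents (hypothetical values).
path = os.path.join(tempfile.mkdtemp(), "BRM_saves_em")
full_image = b"EMIMG" + b"\x00" * 1024

# A "secondary" node rewrites the file in place and has only flushed a prefix...
writer = open(path, "wb")
writer.write(full_image[:100])
writer.flush()

# ...when the "primary" reads it: the snapshot is a truncated image.
with open(path, "rb") as reader:
    snapshot = reader.read()

writer.write(full_image[100:])
writer.close()

print(len(snapshot), os.path.getsize(path))  # prints: 100 1029
```

Writing to a temporary file and then rename()-ing it into place would close this window on a local POSIX filesystem, since rename atomically replaces the target; atomicity guarantees vary on network filesystems such as GlusterFS/NFS.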
For this to happen, a fairly large Extent Map is needed (many columns, many rows, or both). The customer's EM was 4.5 MB; the in-house reproduction used an 8 MB one.
When it occurs, it is almost always accompanied by log messages like:
CAL0000: ExtentMap::load(): That file is not a valid ExtentMap image
CAL0000: ExtentMap::loadVersion4(): read : No such file or directory
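A quick way to scan a log for this signature (the substrings are taken verbatim from the messages above; the helper name is my own):

```python
# Substrings taken from the corruption log messages quoted above.
SIGNATURES = (
    "That file is not a valid ExtentMap image",
    "ExtentMap::loadVersion4(): read",
)

def corruption_hits(log_lines):
    """Return the log lines matching either corruption signature."""
    return [l for l in log_lines if any(s in l for s in SIGNATURES)]

sample = [
    "CAL0000: ExtentMap::load(): That file is not a valid ExtentMap image",
    "CAL0000: some unrelated message",
]
print(len(corruption_hits(sample)))  # prints: 1
```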
1. Configure a 3-node cluster (more is fine). AWS reproduced faster than Docker, but Docker works too.
2. The cluster needs shared storage and must be configured for failover. On AWS or bare metal, use GlusterFS; with Docker, use attached volumes.
3. Create a database with a large extent map.
a) an easy way to do it: create 100 tables with 1000 columns each. You do not need to populate them; empty is just fine.
b) make sure the BRM_saves_em in /var/lib/columnstore/data1/systemFiles/dbrm is at least 8 MB in size.
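Step 3a can be scripted; a sketch that generates the DDL (the table and column counts come from above, while the schema name `emtest` and the INT column type are arbitrary choices of mine):

```python
def wide_table_ddl(n_tables=100, n_cols=1000, schema="emtest"):
    """Generate CREATE TABLE statements for n_tables ColumnStore tables
    with n_cols INT columns each; the tables can stay empty."""
    stmts = []
    for t in range(n_tables):
        cols = ", ".join(f"c{c} INT" for c in range(n_cols))
        stmts.append(
            f"CREATE TABLE {schema}.t{t} ({cols}) ENGINE=Columnstore;"
        )
    return stmts

ddl = wide_table_ddl()
print(len(ddl))  # prints: 100
```

The resulting statements can be fed to the mariadb client one by one to grow the extent map.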
4. Start doing shutdowns, followed by startups. Do not run any CRUD operations while the cluster is cycling (SELECTs are OK). I used the attached script, but I saw it on manual actions as well. It is random; you may need to do it a few times, as it depends on the exact timing of things.
After every shutdown, check the size of BRM_saves_em. If it got smaller, you have a corrupted Extent Map and a blown database.
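The per-cycle size check can be automated; a minimal sketch (the path is from step 3b, the helper name is my own):

```python
import os

# Path from step 3b of the reproduction.
BRM_EM = "/var/lib/columnstore/data1/systemFiles/dbrm/BRM_saves_em"

def em_shrunk(prev_size, path=BRM_EM):
    """Compare the current BRM_saves_em size against the previous cycle.

    Returns (shrunk, current_size); a shrink indicates a corrupted
    Extent Map image.
    """
    size = os.path.getsize(path)
    return (prev_size is not None and size < prev_size), size
```

After each shutdown, carry the size forward: `shrunk, prev = em_shrunk(prev)` and stop as soon as `shrunk` is True.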