[MCOL-5302] mcs-savebrm.py overwrites Extent Map files multiple times with shared(non-S3) storage setup Created: 2022-11-11 Updated: 2023-11-17 Resolved: 2023-01-18 |
|
| Status: | Closed |
| Project: | MariaDB ColumnStore |
| Component/s: | installation |
| Affects Version/s: | 22.08.3 |
| Fix Version/s: | 22.08.8 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Roman | Assignee: | Alan Mologorsky |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Issue Links: |
|
| Sprint: | 2022-22, 2022-23 | ||||||||
| Assigned for Testing: | |
| Description |
|
When the MCS cluster is shut down, mcs-savebrm.py is called to save the Extent Map. The original intention was to save it only on the primary node, but mcs-savebrm.py does not detect a primary in a shared (non-S3) storage setup. This effectively allows cluster nodes to overwrite the Extent Map files multiple times during shutdown and in some cases causes the save_brm binary to get stuck.
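The fix described above amounts to guarding the save_brm call so that it runs only on the primary node, regardless of storage backend. A minimal sketch of that guard is below; the function name `should_save_brm` and the way the primary hostname is obtained are illustrative assumptions, not the actual mcs-savebrm.py API.

```python
import socket


def should_save_brm(primary_host: str) -> bool:
    """Return True only when the local node is the cluster primary.

    Gating save_brm this way prevents every node in a shared (non-S3)
    storage setup from rewriting the same Extent Map files during a
    cluster shutdown. `primary_host` would come from the cluster
    configuration; here it is simply passed in (hypothetical interface).
    """
    return socket.gethostname() == primary_host


# Usage sketch: only the primary proceeds to invoke save_brm.
if should_save_brm(socket.gethostname()):
    pass  # call the save_brm binary here (omitted in this sketch)
```

The key design point is that the primary check must succeed for shared (non-S3) storage too, since the bug was that primary detection only worked for S3-backed setups.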
|
| Comments |
| Comment by Roman [ 2022-11-28 ] |
|
For QA:
Please test a failure scenario where the primary is lost before the shutdown and: a) failover finishes, b) failover does not have time to converge but the cluster is shut down |
| Comment by Daniel Lee (Inactive) [ 2023-01-18 ] |
|
Build verified: 23.02 (Drone build #6492). Verified on a 3-node cluster with NFS shared storage. Checked the timestamps of all files in the dbrm directory every 0.5 seconds during the test scenarios; the timestamp of each dbrm file changed only once. For scenario #4, stopping the cluster while the primary node was down produced these messages, which is expected since s1pm2 was the primary node. [rocky8:root@rocky8~]# mcs cluster stop , } |
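The verification step above (polling dbrm file timestamps every 0.5 seconds to confirm each file is written at most once) can be sketched as a small monitor. The dbrm directory path and the polling window are assumptions for illustration; the tester's actual tooling is not shown in the ticket.

```python
import os
import time


def count_rewrites(dbrm_dir: str, duration: float = 10.0,
                   interval: float = 0.5) -> dict:
    """Poll mtimes of files under dbrm_dir every `interval` seconds for
    `duration` seconds, counting how many times each file's timestamp
    changes. During a clean shutdown with the fix applied, each BRM
    file should be rewritten at most once, so the counts stay low."""
    last = {}      # filename -> last observed mtime
    changes = {}   # filename -> number of observed mtime changes
    deadline = time.monotonic() + duration
    while time.monotonic() < deadline:
        for name in os.listdir(dbrm_dir):
            path = os.path.join(dbrm_dir, name)
            try:
                mtime = os.stat(path).st_mtime
            except FileNotFoundError:
                continue  # file removed between listdir and stat
            if name in last and mtime != last[name]:
                changes[name] = changes.get(name, 0) + 1
            last[name] = mtime
        time.sleep(interval)
    return changes


# Usage sketch (hypothetical path): run this while stopping the cluster,
# then check that no file shows more than one change.
# rewrites = count_rewrites("/var/lib/columnstore/data1/systemFiles/dbrm")
```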