[MCOL-5405] mcs-savebrm stores an empty EM on cluster shutdown rendering the cluster unusable Created: 2023-01-27  Updated: 2023-10-26

Status: Open
Project: MariaDB ColumnStore
Component/s: ?
Affects Version/s: 6.4.6, 22.08.7
Fix Version/s: Icebox

Type: Bug Priority: Major
Reporter: Roman Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: None


 Description   

There are confirmed customer cases in which the extent map (EM) is effectively erased because mcs-savebrm saves an empty extent map to S3. This can happen when the S3 storage layer (StorageManager) fails and the save hits an input/output error. The following log snippet shows the errors from a real case:

Jan 19 18:31:55 nvmesh-target-c IDBFile[2332086]: 55.504161 |0|0|0| D 35 CAL0002: IDBFactory::installPlugin: installed filesystem plugin libcloudio.so
Jan 19 18:31:55 nvmesh-target-c StorageManager[2275439]: OpenTask: caught 'boost::filesystem::status: Transport endpoint is not connected: "/var/lib/columnstore/storagemanager/metadata/data1/systemFiles/dbrm/BRM_saves_em.meta"'
Jan 19 18:31:55 nvmesh-target-c controllernode[2332086]: 55.507125 |0|0|0| C 29 CAL0000: ExtentMap::save(): open:  Input/output error
Jan 19 18:31:55 nvmesh-target-c StorageManager[2275439]: Runner::watchForInterlopers(): failed to stat /var/lib/columnstore/storagemanager/metadata/data1/REQUEST_TRANSFER, got Transport endpoint is not connected

The suggested solution is to check the size of the in-memory extent map image before storing it on S3. If the size is 0, mcs-savebrm must skip the save operation; save_brm must not be called at all when the in-memory EM image is empty.
Note that this approach does not protect against partially written EM images.
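A minimal C++ sketch of the proposed guard follows. It assumes the in-memory EM image can be summarised by its serialized size in bytes. `ExtentMapImage`, `SaveResult` and `saveExtentMapGuarded` are hypothetical names used only for illustration. They are not ColumnStore's actual API, whose real save path goes through `ExtentMap::save()`.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <iostream>

// Hypothetical summary of the in-memory extent map image (assumption:
// the real ExtentMap class exposes its size differently).
struct ExtentMapImage {
    std::size_t sizeBytes;  // size of the serialized EM image held in memory
};

enum class SaveResult { Saved, SkippedEmpty, Failed };

// Guard sketch: never persist an empty EM image, so a node that lost its
// EM (e.g. after a StorageManager I/O error) cannot overwrite the last
// good copy on S3 with nothing. doSave stands in for the real writer.
SaveResult saveExtentMapGuarded(
    const ExtentMapImage& em,
    const std::function<bool(const ExtentMapImage&)>& doSave) {
    assert(static_cast<bool>(doSave) && "save callback must be set");

    if (em.sizeBytes == 0) {
        // Skip the save entirely instead of writing an empty image.
        std::cerr << "mcs-savebrm: in-memory extent map is empty, skipping save\n";
        return SaveResult::SkippedEmpty;
    }
    // Only a non-empty image reaches the storage backend.
    return doSave(em) ? SaveResult::Saved : SaveResult::Failed;
}
```

Passing the writer in as a callback keeps the emptiness check independent of the storage backend, so the same guard applies whether the image goes to local disk or through StorageManager to S3. As with the proposal itself, this check does not detect a partially written image.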


Generated at Thu Feb 08 02:57:38 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.