Details
-
Sub-Task
-
Status: Closed
-
Major
-
Resolution: Won't Fix
Description
Data is currently moved around as-is, with no integrity check. It might be a good idea to verify the integrity of uploads and downloads. IIRC, S3 returns an object's ETag on a HEAD op, and for objects uploaded in a single PUT (not multipart) that ETag is the MD5 of the content. Verification would likely slow things down a little, so maybe make this an option rather than a requirement.
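A rough sketch of what the optional check could look like, in Python for brevity rather than in SM's own code. The function names and the `verify` flag are hypothetical, and the ETag is assumed to have already been fetched by whatever HEAD call the S3 client provides. A multipart ETag is not a plain MD5, so the sketch treats it as "cannot verify" rather than as a failure.

```python
import hashlib
from typing import Optional


def md5_hex(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a local file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def etag_matches(local_md5: str, etag: str) -> Optional[bool]:
    """Compare a local MD5 with an S3 ETag (hypothetical helper).

    Returns None when the ETag cannot be a plain MD5, for example a
    multipart ETag with a "-<part count>" suffix, so the caller can
    skip the check instead of reporting corruption.
    """
    etag = etag.strip('"')  # S3 returns the ETag wrapped in double quotes
    if len(etag) != 32 or "-" in etag:
        return None
    return etag.lower() == local_md5.lower()


def verify_transfer(path: str, etag: str, verify: bool = True) -> bool:
    """Optional integrity check: False only on a definite mismatch."""
    if not verify:  # verification is an option, not a requirement
        return True
    return etag_matches(md5_hex(path), etag) is not False
```

Keeping the comparison separate from the S3 call means the same check can run after an upload (local file against the HEAD result) and after a download (downloaded file against the ETag returned with the object).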
Update: we ran into a problem with data integrity after an SM crash; see MCOL-3711. Two tasks come from this. 1) SM needs to be robust against crashes during a write, and 2) it has to do the right thing when reading corrupted data.
For 1, we can write to tmp files, then move each one to its final location once the write is done, so that another SM instance only ever sees completed writes. For 2, we can add a checksum to the metadata entries and journal entries, and verify it on read. These are only initial thoughts; there may be better options.
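To make the two ideas concrete, here is a minimal sketch in Python, not SM code. All names are hypothetical, the 4-byte CRC32 prefix is an assumed record layout rather than the actual metadata or journal format, and the rename is only atomic when the tmp file lives on the same filesystem as the target.

```python
import os
import tempfile
import zlib


class CorruptRecordError(Exception):
    """Raised when a stored record fails its checksum."""


def write_atomically(path: str, payload: bytes) -> None:
    """Write to a tmp file in the target directory, then rename into place.

    Readers see either the old file or the complete new one, never a
    partial write. The parent directory is not fsynced here.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(payload)
            f.flush()
            os.fsync(f.fileno())  # push the data to disk before the rename
        os.replace(tmp_path, path)  # atomic on the same filesystem
    except BaseException:
        # Clean up the tmp file if anything failed before the rename.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise


def encode_record(body: bytes) -> bytes:
    """Prefix the body with its CRC32 (assumed layout: 4 bytes, big-endian)."""
    return zlib.crc32(body).to_bytes(4, "big") + body


def decode_record(record: bytes) -> bytes:
    """Verify the checksum on read and return the body."""
    if len(record) < 4:
        raise CorruptRecordError("record too short")
    stored = int.from_bytes(record[:4], "big")
    body = record[4:]
    if zlib.crc32(body) != stored:
        raise CorruptRecordError("checksum mismatch")
    return body
```

A reader that hits `CorruptRecordError` still has to decide what to do next (re-fetch from S3, replay the journal, or fail the request); the sketch only covers detection.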