Details
Type: Sub-Task
Status: Closed
Priority: Major
Resolution: Won't Do
Affects Version/s: 1.4.0
Fix Version/s: None
Description
Found some optimization opportunities in the writeengine code. It was written with only local disk and HDFS in mind, and it handles those two very differently. It would be worthwhile to go over this code and see whether some of the HDFS paths make sense for cloud storage as well.
For example, chunk shifting involves multiple renames. On local storage a rename is a simple atomic operation (atomic meaning there is no failure state that leaves the file half-renamed), and WriteEngine (WE) uses renames liberally on the assumption that they are fast and atomic. On cloud storage a rename is a copy followed by a delete, so neither assumption holds. It may be worthwhile to write a distinct cloud-storage path, or to reuse the existing HDFS path, for cases like that.
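To make the rename difference concrete, here is a minimal Python sketch. It models cloud storage as a hypothetical in-memory `ObjectStore` with no native rename, so its `rename` is a copy followed by a delete. The class, the `fail_after_copy` flag, and the `chunk.cdf` file names are illustrative assumptions, not WriteEngine code. The local case uses `os.replace`, which on a POSIX filesystem maps to a single rename(2) call that swaps the name without copying data.

```python
import os
import tempfile


class ObjectStore:
    """Hypothetical in-memory object store: keys map to bytes, no native rename."""

    def __init__(self):
        self.objects = {}
        self.bytes_copied = 0  # total data moved by "renames"

    def put(self, key, data):
        self.objects[key] = data

    def rename(self, src, dst, fail_after_copy=False):
        # Cloud-style "rename": copy the whole object, then delete the source.
        data = self.objects[src]
        self.objects[dst] = data
        self.bytes_copied += len(data)
        if fail_after_copy:
            # Simulated crash between the copy and the delete.
            raise RuntimeError("crashed between copy and delete")
        del self.objects[src]


def local_rename_demo():
    """Rename a file on local disk; returns (src_exists, dst_exists)."""
    with tempfile.TemporaryDirectory() as d:
        src = os.path.join(d, "chunk.cdf")
        dst = os.path.join(d, "chunk.cdf.tmp")
        with open(src, "wb") as f:
            f.write(b"x" * 1024)
        # Same-filesystem rename: one metadata operation, no data copied.
        os.replace(src, dst)
        return os.path.exists(src), os.path.exists(dst)


def cloud_rename_demo():
    """Rename in the object store with a crash mid-way; returns (keys, bytes copied)."""
    store = ObjectStore()
    store.put("chunk.cdf", b"x" * 1024)
    try:
        store.rename("chunk.cdf", "chunk.cdf.tmp", fail_after_copy=True)
    except RuntimeError:
        pass
    # Both the old and the new key are left behind: the rename was not atomic.
    return sorted(store.objects), store.bytes_copied


if __name__ == "__main__":
    print(local_rename_demo())  # (False, True)
    print(cloud_rename_demo())  # (['chunk.cdf', 'chunk.cdf.tmp'], 1024)
```

In the local case the file simply has a new name. In the simulated cloud case every rename costs a full copy of the data, and a failure between the copy and the delete leaves both objects present, so a chunk shift that chains several renames is both slower and exposed to partial failure.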