Ability to track/log artifacts in cloud storage
🚀 Feature
Aim storage is built around rocksdb & sqlite. It utilizes the running server's disk space, which hinders the ability to store large volumes of data/metadata. Examples of such data are model checkpoints, dataset versions, etc.
In addition, there are cases where individual items are not huge, but their overall volume cannot fit on a hard drive. An example of such a case is tracked media files for very long sequences.
Motivation
Extend Aim functionality to store any kind of data, including large files, directories and other artifacts. Offload the Aim storage by moving BLOB data to cloud storage and keeping only metadata in .aim.
Pitch
Change the SDK/CLI interface so that cloud storage URIs can be passed as an argument when creating a Repo. Introduce new custom object types to handle artifacts and BLOBs. Enhance URIService to handle both local (rocksdb) and cloud-based URIs.
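For illustration, a hypothetical sketch of what the proposed SDK surface could look like. `Repo` and `Run` exist in Aim's Python SDK today; the `artifacts_uri` argument and `log_artifact()` method are illustrative names for the proposed functionality, not an existing or final API:

```python
from aim import Repo, Run

# Proposed: point the repo's BLOB/artifact storage at a cloud URI, while
# run metadata keeps living in the local .aim rocksdb/sqlite storage.
# NOTE: `artifacts_uri` and `log_artifact` are illustrative names only.
repo = Repo.from_path('.aim', artifacts_uri='s3://my-bucket/aim-artifacts')

run = Run(repo=repo)
run['dataset'] = {'name': 'my-dataset', 'version': 'v2'}

# Large files would be uploaded to the cloud store; only a reference
# (URI, size, checksum) would be kept in the .aim metadata.
run.log_artifact('checkpoints/epoch_10.pt', name='model-checkpoint')
```

The training code keeps using the familiar Run interface; only the Repo construction gains a cloud-storage argument, and URIService resolves artifact references to either rocksdb or the cloud backend.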
Alternatives
As an alternative, we can consider mounting cloud storage as a FUSE file system and using it just like local storage; s3fs is an example of such an implementation. However, the way Aim is implemented requires many small I/O operations, so file system latency is critical. Initial benchmarks showed poor performance of Aim when running on a FUSE file system.
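For comparison, a minimal sketch of the FUSE-based alternative, assuming the s3fs-fuse tool is installed and AWS credentials are configured; the bucket name and mount point below are placeholders:

```python
import subprocess
from aim import Run

# Mount an S3 bucket as a local directory via s3fs-fuse
# (equivalent to running `s3fs my-bucket /mnt/aim-repo` in a shell).
subprocess.run(['s3fs', 'my-bucket', '/mnt/aim-repo'], check=True)

# Aim then treats the mount as ordinary local storage (the repo may need
# to be initialized there first, e.g. with `aim init`). Every small
# rocksdb read/write becomes a network round-trip, which is where the
# latency problem mentioned above comes from.
run = Run(repo='/mnt/aim-repo')
run.track(0.42, name='loss', step=1)
```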
Additional context
Related issues: https://github.com/aimhubio/aim/issues/1569, https://github.com/aimhubio/aim/issues/1428
Top GitHub Comments
@ashutoshsaboo the initial version will support S3, GCS and local storage (the latter is especially useful for testing). The request for converting existing repos is a great addition, thanks for pointing it out! It boils down to a simple script wrapped as a CLI command; will make sure to include it.
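A minimal sketch of what such a conversion command might look like, assuming boto3 is available and that the large artifact files live under a local directory; the CLI name, options and repo layout here are illustrative, not the actual implementation:

```python
import os

import boto3
import click


@click.command('upload-artifacts')
@click.option('--repo', default='.aim', help='Path to the local Aim repository.')
@click.option('--bucket', required=True, help='Destination S3 bucket.')
@click.option('--prefix', default='aim-artifacts', help='Key prefix inside the bucket.')
def upload_artifacts(repo, bucket, prefix):
    """Copy local artifact files to S3, preserving their relative paths."""
    s3 = boto3.client('s3')
    for root, _dirs, files in os.walk(repo):
        for fname in files:
            local_path = os.path.join(root, fname)
            key = f'{prefix}/{os.path.relpath(local_path, repo)}'
            s3.upload_file(local_path, bucket, key)
            click.echo(f'uploaded {local_path} -> s3://{bucket}/{key}')


if __name__ == '__main__':
    upload_artifacts()
```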
Thanks for the details! Agreed, especially on the security part, since currently Aim doesn't make any internet calls, everything stays local, and calls to the cloud would be a step up from a security standpoint. In the first version please add native support for S3 if possible; I'd see how it turns out and happily use this feature! 😃
Request: if there can be an easy, native way to port currently local repos to S3 along with the cloud-native support, that'd be really awesome and would make it much easier to leverage this feature completely! @alberttorosyan