Ability to track/log artifacts in cloud storage

🚀 Feature

Aim storage is built around RocksDB and SQLite. It uses the running server's disk space, which limits its ability to store large volumes of data and metadata, such as model checkpoints and dataset versions. In addition, there are cases where individual items are not large, but their total volume cannot fit on a hard drive. An example is media files tracked over very long sequences.

Motivation

Extend Aim to store any kind of data, including large files, directories and other artifacts. Offload Aim storage by moving BLOB data to cloud storage and keeping only metadata in .aim.
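The "BLOB in cloud storage, metadata in .aim" split can be illustrated with a minimal Python sketch. Everything here is hypothetical and not Aim's API: `ArtifactRef` and `offload_artifact` are illustrative names, and a local directory stands in for a cloud bucket. The file is stored under its SHA-256 hash, and only a small record (URI, hash, size) is returned; that record is the part that would be kept in .aim.

```python
import hashlib
import shutil
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class ArtifactRef:
    """Hypothetical metadata record: the only part that would stay in .aim."""
    name: str
    uri: str     # where the BLOB actually lives
    sha256: str  # content hash, also used as the object key
    size: int    # size in bytes


def offload_artifact(path: Path, blob_root: Path) -> ArtifactRef:
    """Copy a file into a blob store and return only its metadata.

    `blob_root` is a local directory standing in for a bucket (an assumption);
    a real backend would upload to S3/GCS instead of calling shutil.copyfile.
    """
    h = hashlib.sha256()
    with path.open("rb") as f:
        # Hash in 1 MiB chunks so large checkpoints never sit fully in memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    digest = h.hexdigest()
    # Content-addressed key: identical files are stored only once.
    dest = blob_root / digest[:2] / digest
    dest.parent.mkdir(parents=True, exist_ok=True)
    if not dest.exists():
        shutil.copyfile(path, dest)
    return ArtifactRef(
        name=path.name,
        uri=dest.resolve().as_uri(),
        sha256=digest,
        size=path.stat().st_size,
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        ckpt = Path(tmp) / "model.ckpt"
        ckpt.write_bytes(b"\x00" * 1024)
        print(offload_artifact(ckpt, Path(tmp) / "blobs"))
```

Content addressing is one design choice among several; it makes repeated uploads of the same checkpoint idempotent, at the cost of hashing every file before upload.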

Pitch

Change the SDK/CLI interface so that a cloud storage URI can be passed as an argument when creating a Repo. Introduce new custom object types to handle artifacts and BLOBs. Enhance URIService to handle both local (RocksDB) and cloud-based URIs.
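One way a Repo location could be routed to a backend is by URI scheme. The sketch below assumes the three backends mentioned in the comments (S3, GCS, local); the scheme names (`s3://`, `gs://`, `file://`) and the function `resolve_repo_uri` are hypothetical and are not Aim's URIService.

```python
from typing import Tuple
from urllib.parse import urlparse

# Hypothetical mapping from URI scheme to backend name.
# A bare path such as ".aim" has an empty scheme and is treated as local.
SUPPORTED_SCHEMES = {"s3": "s3", "gs": "gcs", "file": "local", "": "local"}


def resolve_repo_uri(uri: str) -> Tuple[str, str, str]:
    """Split a repo location into (backend, bucket, key_or_path)."""
    parsed = urlparse(uri)
    scheme = parsed.scheme.lower()
    if scheme not in SUPPORTED_SCHEMES:
        raise ValueError(f"unsupported storage scheme: {scheme!r}")
    backend = SUPPORTED_SCHEMES[scheme]
    if backend == "local":
        # Plain paths and file:// URIs both map to the local disk.
        return backend, "", parsed.path
    # For object stores the netloc is the bucket and the path is the key prefix.
    return backend, parsed.netloc, parsed.path.lstrip("/")


if __name__ == "__main__":
    for loc in ("s3://my-bucket/aim/repo", "gs://bkt/runs", ".aim"):
        print(loc, "->", resolve_repo_uri(loc))
```

Note that a Windows path like `C:\aim` would be parsed as scheme `c` and rejected; a real implementation would need to special-case drive letters.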

Alternatives

As an alternative, we could mount cloud storage as a FUSE file system and use it just like local storage; s3fs is an example of such an implementation. However, Aim performs many small I/O operations, so file system latency is critical. Initial benchmarks showed poor Aim performance when running on a FUSE file system.

Additional context

Related issues: https://github.com/aimhubio/aim/issues/1569 https://github.com/aimhubio/aim/issues/1428

Issue Analytics

  • State: open
  • Created: a year ago
  • Comments: 6 (3 by maintainers)

Top GitHub Comments

5 reactions
alberttorosyan commented, Mar 28, 2022

@ashutoshsaboo The initial version will support S3, GCS and local storage (the latter is especially useful for testing). The request for converting existing repos is a great addition, thanks for pointing it out! It boils down to a simple script wrapped as a CLI command; I'll make sure to include it.

0 reactions
ashutoshsaboo commented, Mar 28, 2022

Thanks for the details! Agreed, especially on the security part: currently Aim doesn't make any internet calls and everything is local, so calls to the cloud would be a big step from a security standpoint. If possible, please add native S3 support in the first version; I'd love to see how it turns out and would happily use this feature! 😃

Request: if there could be an easy, native way to port existing local repos to S3 once cloud support is available, that would be really awesome and make it much easier to fully leverage this feature! @alberttorosyan
