cloud versioning: pull non-DVC cloud updates
In a project with a cloud-versioned remote, the remote may be updated manually or automatically outside of any DVC process. For example, a daily ETL process might upload new data directly to the cloud remote, or someone unfamiliar with DVC might upload new data by hand.
A DVC user should be able to pull not only the versions of data tracked in their project but also the latest version of the data available on the cloud. For example, dvc update could be used to get the latest data, the same way it does today for imports.
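A minimal sketch of what "get the latest data" means for a single tracked file, assuming the remote can report the latest version ID of each object. TrackedFile and update_file are hypothetical names, not DVC's API, and the bucket is modeled as a plain dict of object key to latest version ID. The missing-object case raises, mirroring the behavior described in the comments below for imports whose source was deleted.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class TrackedFile:
    """Hypothetical stand-in for what a .dvc file records about one file out."""

    path: str        # object key on the cloud-versioned remote
    version_id: str  # cloud version ID that DVC last recorded


def update_file(tracked: TrackedFile, remote_latest: dict[str, str]) -> TrackedFile:
    """Return an entry pointing at the newest version of one target.

    remote_latest maps object key -> latest version ID currently in the
    bucket, including versions uploaded outside DVC (e.g. by an ETL job).
    """
    latest = remote_latest.get(tracked.path)
    if latest is None:
        # The source was deleted on the remote: fail, consistent with how
        # updating an import whose source is gone fails.
        raise FileNotFoundError(f"{tracked.path} no longer exists on the remote")
    if latest == tracked.version_id:
        return tracked  # already up to date, nothing to pull
    return TrackedFile(tracked.path, latest)


if __name__ == "__main__":
    tracked = TrackedFile("data/daily.csv", "v1")
    remote = {"data/daily.csv": "v2"}  # a daily ETL job wrote v2 directly
    print(update_file(tracked, remote).version_id)  # v2
```

Requiring an explicit target keeps the operation scoped to one dataset at a time, which matches the initial scope discussed below.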
Top GitHub Comments
One thing to clarify that may simplify this: for now, it's fine to keep the current syntax for dvc update, where it requires a target. We may need to support dvc update without a target eventually, but for now being able to update one dataset at a time is enough.

If you import data with import/import-url, then delete the source and run dvc update, it will fail. Let's stay consistent with that for now.

After discussion with @dberenbaum, the scope for this issue in the initial release will be:
- dvc update will only be applicable in worktree = true scenarios (and not version_aware = true, worktree = false).
- update should pull the latest versions of outs we are already aware of (i.e. those we have .dvc files for locally).
  - For a file out, update would get the latest modifications to the file.
  - For a directory out, update would get the latest modifications to files in the directory, newly added files in the directory, and deletions of files within the directory.
- update will ignore files/dirs in the remote bucket that we are not already aware of (i.e. new files/dirs that are in the bucket but do not appear in any of our local .dvc/dvc.yaml files).
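The directory-out rules above can be sketched as a diff between what the local .dvc file records and the current bucket listing. This is a minimal illustration under stated assumptions, not DVC's implementation: DirChanges, ensure_worktree_remote, and diff_dir are hypothetical names, the remote config is a plain dict, and the bucket is modeled as a dict of object key to latest version ID rather than a real S3/GCS/Azure listing.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DirChanges:
    """What a directory update would apply (hypothetical structure)."""

    modified: dict[str, str] = field(default_factory=dict)  # relpath -> new version ID
    added: dict[str, str] = field(default_factory=dict)     # relpath -> version ID
    deleted: list[str] = field(default_factory=list)        # relpaths gone from the remote


def ensure_worktree_remote(remote_config: dict[str, bool]) -> None:
    """Initial scope: only remotes configured with worktree = true."""
    if not remote_config.get("worktree", False):
        raise ValueError("update is only supported for worktree = true remotes")


def diff_dir(prefix: str, tracked: dict[str, str], remote: dict[str, str]) -> DirChanges:
    """Compare a tracked directory out with the current bucket listing.

    prefix:  the directory's key prefix on the remote, ending in "/"
    tracked: relpath -> version ID, as recorded in the local .dvc file
    remote:  full object key -> latest version ID for the whole bucket
    """
    # Keep only keys strictly under this tracked directory (skipping a bare
    # "folder marker" key equal to the prefix). Everything else in the
    # bucket is unknown to the project and is ignored.
    current = {
        key[len(prefix):]: vid
        for key, vid in remote.items()
        if key.startswith(prefix) and len(key) > len(prefix)
    }
    changes = DirChanges()
    for rel, vid in sorted(current.items()):
        if rel not in tracked:
            changes.added[rel] = vid     # new file inside the tracked dir
        elif tracked[rel] != vid:
            changes.modified[rel] = vid  # file changed outside DVC
    changes.deleted = sorted(rel for rel in tracked if rel not in current)
    return changes


if __name__ == "__main__":
    ensure_worktree_remote({"worktree": True})
    tracked = {"a.png": "v1", "b.png": "v1"}
    remote = {
        "data/images/a.png": "v2",  # modified outside DVC
        "data/images/c.png": "v1",  # newly added to the tracked dir
        "data/other.csv": "v7",     # not in any .dvc file: ignored
    }
    changes = diff_dir("data/images/", tracked, remote)
    print(changes.modified)  # {'a.png': 'v2'}
    print(changes.added)     # {'c.png': 'v1'}
    print(changes.deleted)   # ['b.png']
```

Filtering by the tracked directory's prefix is what makes new top-level files or directories in the bucket invisible to update, while new files inside a tracked directory are still picked up.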