new command: put-url OR rsync/rclone
Summary
An upload equivalent of dvc get-url.
We currently use get-url as a cross-platform replacement for wget. However, together with get-url, put-url will turn DVC into a replacement for rsync/rclone.
Motivation
- we already have `get-url`, so adding `put-url` seems natural for the same reasons
- `put-url` will be used by
  - CML internally to sync data
  - LDB internally to sync data
  - the rest of the world
- uses existing functionality of DVC so should be fairly quick to expose
- cross-platform multi-cloud replacement for `rsync`/`rclone`. What’s not to love?
  - could even create a spin-off thin wrapper (or even abstract the functionality) in a separate Python package
Detailed Design
usage: dvc put-url [-h] [-q | -v] [-j <number>] url targets [targets ...]

Upload or copy files to URL.
Documentation: <https://man.dvc.org/put-url>

positional arguments:
  url                   Destination path to put data to.
                        See `dvc import-url -h` for full list of supported
                        URLs.
  targets               Files/directories to upload.

optional arguments:
  -h, --help            show this help message and exit
  -q, --quiet           Be quiet.
  -v, --verbose         Be verbose.
  -j <number>, --jobs <number>
                        Number of jobs to run simultaneously. The default
                        value is 4 * cpu_count(). For SSH remotes, the default
                        is 4.
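To make the proposed interface concrete, here is a minimal sketch of an `argparse` parser that accepts the usage line above. It is only an illustration of the argument shape, not DVC's actual CLI code; `build_parser` is a hypothetical name, and the SSH-specific default for `--jobs` is not modelled.

```python
import argparse
import os


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the proposed `dvc put-url` usage text (hypothetical)."""
    parser = argparse.ArgumentParser(
        prog="dvc put-url",
        description="Upload or copy files to URL.",
    )
    # -q and -v are mutually exclusive, matching `[-q | -v]` in the usage line.
    verbosity = parser.add_mutually_exclusive_group()
    verbosity.add_argument("-q", "--quiet", action="store_true", help="Be quiet.")
    verbosity.add_argument("-v", "--verbose", action="store_true", help="Be verbose.")
    parser.add_argument(
        "-j",
        "--jobs",
        type=int,
        metavar="<number>",
        # Assumption: the generic default from the help text, 4 * cpu_count().
        default=4 * (os.cpu_count() or 1),
        help="Number of jobs to run simultaneously.",
    )
    # Destination first, then one or more sources: `url targets [targets ...]`.
    parser.add_argument("url", help="Destination path to put data to.")
    parser.add_argument("targets", nargs="+", help="Files/directories to upload.")
    return parser


args = build_parser().parse_args(["-j", "2", "s3://bucket/models", "model.h5", "data/"])
print(args.url, args.targets, args.jobs)
```

Putting the destination before the targets lets the parser accept any number of sources without ambiguity, which is one argument in favour of the `url targets [targets...]` order.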
How We Teach This
- Name: `put-url` seems to be in line with the existing `get-url` (vis. HTTP `GET` & `PUT`)
- Idea presentation: continuation of existing DVC patterns
- Docs: simply add https://dvc.org/doc/command-reference/put-url largely based off the existing https://dvc.org/doc/command-reference/get-url
- Teaching: not required
Drawbacks
- can’t think of any
Alternatives
- would have to re-implement per-cloud sync options for CML & other products
Unresolved Questions
- minor implementation details
  - CLI naming (`put-url`)?
  - CLI argument order (`url targets [targets...]`)?
  - Python API (`dvc.api.put_url()`)?
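For the Python API question, a rough sketch of what a `put_url(url, targets, jobs)` call shape could look like is shown below. This is not DVC code: the function is hypothetical, it only handles local destination directories using the standard library, and a real implementation would presumably delegate to DVC's existing filesystem layer for cloud URLs.

```python
from __future__ import annotations

import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def put_url(url: str, targets: list[str], jobs: int = 4) -> list[Path]:
    """Copy each target into the directory `url` (local paths only in this sketch)."""
    dest = Path(url)
    dest.mkdir(parents=True, exist_ok=True)

    def copy_one(target: str) -> Path:
        src = Path(target)
        out = dest / src.name
        if src.is_dir():
            # Directories are copied recursively, merging into an existing one.
            shutil.copytree(src, out, dirs_exist_ok=True)
        else:
            shutil.copy2(src, out)
        return out

    # `jobs` bounds the number of concurrent copies, like `-j` on the CLI.
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        return list(pool.map(copy_one, targets))
```

A call such as `put_url("backup/models", ["model.h5", "data"], jobs=2)` would mirror the CLI form `dvc put-url -j 2 backup/models model.h5 data`.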
Please do assign me if happy with the proposal.
(dvc get-url + put-url = dvc rsync 😃)
Issue Analytics
- State:
- Created 2 years ago
- Reactions:5
- Comments:26 (26 by maintainers)

I’m trying to aggregate our discussions here and in person into action points:
- `dvc export`, which should upload a local file to a cloud and preserve a link (.dvc file) similar to the result of `dvc import-url`.
- `dvc put-url`. It is not a part of the use cases (see below), but something like this needs to work under the hood of `dvc export` anyway. And it might be handy for other scenarios.
- `dvc import-url --etags-only` (`--no-exec`, but it gets etags from the cloud) and/or `dvc update --etags-only`. This is needed to track file statuses when the file is not downloaded locally.
Important:
Below are user use cases that should help to understand the scenarios.
From local to Cloud/S3
A model `out/model.h5` is saved in a local directory: a local machine, cloud/TPI, or CML; it might be a DVC/Git repository or just a directory like `~/`. The model needs to be uploaded to a specified place/URL in a cloud/S3. The user needs to keep the pointer file (.dvc) for future use.
Why the user needs the pointer file:
- `dvc get` to download the file
Uploading
Note: this command is equivalent to `aws s3 cp file s3://path && dvc import-url s3://path file`. We can consider introducing a separate command to cover the copy part in a cross-cloud way: `dvc put-url`. However, the priority is not high in the context of this scenario.
Updating
A model file was changed (as a result of re-training), for example:
From cloud to workspace
Users write models/data to the cloud from their own code (or the file is already updated by an external tool). Saving a pointer to a model file might still be useful. Why:
- `dvc get` to download the file
Tracking a cloud file
After training is done and a file is saved to s3://mybucket/ml/prod/2022-03-07-model.h5:
Tracking a cloud file without a local copy
In some cases, the user writes a file to storage and does not need a copy in the workspace. `dvc import-url --no-exec` seems like a good option to cover this case.
Technically, the file will still have a virtual representation in the workspace as `my-model.h5`. However, it won’t be materialized until `dvc update my-model.h5.dvc` is called.
Pros/Cons:
`import-url` was called).
To cover the last con, we can consider introducing `dvc import-url --etags-only` (`--no-exec`, but it gets etags from the cloud) and/or `dvc update --etags-only`.
From local to Cloud/S3
In this scenario, the user has their own local `model.h5` file already. It may or may not be tracked by DVC. If it is tracked by DVC, it might be tracked in `model.h5.dvc` or within `dvc.lock` (if it’s generated by a DVC stage).
If they want to upload to the cloud and keep a pointer locally, `dvc export` can be equivalent to `dvc run --external -n upload_data -d model.h5 -o s3://testproject/model.h5 aws s3 cp model.h5 s3://testproject/model.h5`. This is the inverse of `import-url`, as shown in the example in https://dvc.org/doc/command-reference/import-url#description.
As @shcheklein noted, the workflow here assumes the user saves updates locally, so it makes sense for `update` to go in the upload direction and enforce a canonical workflow of save locally -> upload new version.
Similar to how `import-url` records the external path as a dependency and the local path as an output, `export` can record the local path as a dependency and the external path as an output. Since a `model.h5.dvc` file may already exist from a previous `dvc add` (with `model.h5` as an output), it might make more sense to save the export info with some other file extension, like `model.h5.export.dvc` (this avoids conflicts between the dependencies and outputs of each).
I’ll follow up on the other scenarios in another comment to keep this from being too convoluted 😅
Edit: On second thought, maybe it’s better to resolve this scenario first 😄 . The others might require a separate discussion.
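To illustrate the dependency/output direction discussed above, here is a small sketch of the information an export pointer such as `model.h5.export.dvc` might record. The field layout, `export_record`, and `export_pointer_name` are assumptions made for illustration only; they are not an existing DVC file format or API.

```python
import hashlib
from pathlib import Path


def file_md5(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large models need not fit in memory."""
    digest = hashlib.md5()
    with path.open("rb") as fobj:
        for chunk in iter(lambda: fobj.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def export_record(local_path: str, url: str) -> dict:
    """Hypothetical pointer content: local file as dependency, external URL as output."""
    return {
        "deps": [{"path": local_path, "md5": file_md5(Path(local_path))}],
        "outs": [{"path": url}],
    }


def export_pointer_name(local_path: str) -> str:
    """Separate extension so it cannot clash with an existing `<file>.dvc`."""
    return f"{local_path}.export.dvc"
```

With this shape, re-running the export after retraining would only need to compare the recorded hash of the local dependency with the current one to decide whether a new upload is required.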