support subfolders in the remote storage as the root storage for different projects

I would like to maintain a central Git repo that holds the data registry information for multiple projects. I host my data on Google Cloud Storage at gs://bucket/dvc-datastore, and I would like to have subfolders there for my projects, say gs://bucket/dvc-datastore/project1, gs://bucket/dvc-datastore/project2, etc.

To do this with the current DVC data registry, I first have the following data.dvc for each project:

├── project1
│   └── data.dvc
└── project2
    └── data.dvc

Then I need to add each project's data folder as a remote storage, so in my .dvc/config I have:

['remote "project1-storage"']
    url = gs://bucket/dvc-datastore/project1
['remote "project2-storage"']
    url = gs://bucket/dvc-datastore/project2
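
As a rough illustration of how this config grows with the number of projects, here is a minimal Python sketch that renders one remote section per project. The helper name render_remote_config is hypothetical and not part of DVC; it only uses the standard-library configparser to produce text in the same shape as the config above.

```python
import configparser
import io


def render_remote_config(base_url, projects):
    """Render one remote section per project as .dvc/config-style text.

    Hypothetical helper for illustration; DVC does not ship this function.
    """
    config = configparser.ConfigParser()
    for name in projects:
        # The section name keeps its quotes, matching ['remote "x"'] above.
        section = f"'remote \"{name}-storage\"'"
        config[section] = {"url": f"{base_url.rstrip('/')}/{name}"}
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


config_text = render_remote_config(
    "gs://bucket/dvc-datastore", ["project1", "project2"]
)
print(config_text)
```

Every new project adds another section, which is the bookkeeping this request is trying to avoid.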

Then, based on my understanding, when I need to pull data based on the data.dvc for each project, I need to specify the remote storage first, which is not so convenient. So I wonder if there could be a keyword like remote-subdir in the data.dvc for project1:

remote-subdir: project1
outs:
- md5: d751713988987e9331980363e24189ce.dir
  size: 1234
  nfiles: 123
  path: data

And for .dvc/config I only need the single remote:

['remote "project1-storage"']
    url = gs://bucket/dvc-datastore/project1

Now DVC knows that for project1 it needs to go to gs://bucket/dvc-datastore/project1 to get the data.
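
To make the intended lookup concrete, here is a minimal Python sketch of how a remote-subdir value could be combined with a remote URL. Note that remote-subdir is the keyword proposed in this issue, not an existing DVC option, and resolve_remote_url is a hypothetical helper. The sketch also assumes the single remote points at the shared root gs://bucket/dvc-datastore.

```python
def resolve_remote_url(remote_url, remote_subdir=None):
    """Join a remote's base URL with an optional per-project subdirectory.

    Hypothetical: models the proposed remote-subdir keyword, which DVC
    does not implement.
    """
    if not remote_subdir:
        # No subdirectory requested: use the remote root as-is.
        return remote_url
    return remote_url.rstrip("/") + "/" + remote_subdir.strip("/")


project1_url = resolve_remote_url("gs://bucket/dvc-datastore", "project1")
print(project1_url)
```

With this rule, project2 would only need remote-subdir: project2 in its data.dvc, and .dvc/config would stay unchanged.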

Did I miss anything? Is such a feature already offered by DVC?

Issue Analytics

  • State: open
  • Created a year ago
  • Comments: 8

Top GitHub Comments

1 reaction
shelper commented, Jun 4, 2022

@karajan1001 that makes sense, but I think it is still not as convenient, since you need multiple remote storages in .dvc/config. That list could grow and become hard to track as projects are deleted or modified. It seems to me that it would be easier to have something like remote-subdir, or the following, to bind this config to the project directly:

remote: storage
    subdir: /path/to/subdir

1 reaction
karajan1001 commented, Jun 4, 2022

You can have a remote field in your outs scope to set a specific default remote for that output. Something like:

outs:
- md5: d751713988987e9331980363e24189ce.dir
  size: 1234
  nfiles: 123
  path: data
  remote: project1-storage

It is in our docs but we haven’t provided a CLI command to set this parameter.
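
A minimal Python sketch of the selection rule this implies: each output uses its own remote field when present and falls back to the default remote otherwise. This is an illustration only, not DVC's internal code. The function name remotes_for_outputs and the default remote name shared-storage are hypothetical, and the dict stands in for the parsed data.dvc.

```python
def remotes_for_outputs(dvc_file, default_remote):
    """Map each output path to the remote it would be fetched from.

    Illustrative only: mirrors the per-output remote field described
    above, using a plain dict in place of a parsed data.dvc.
    """
    return {
        out["path"]: out.get("remote", default_remote)
        for out in dvc_file.get("outs", [])
    }


# Dict form of the data.dvc shown in the comment above.
project1_dvc = {
    "outs": [
        {
            "md5": "d751713988987e9331980363e24189ce.dir",
            "size": 1234,
            "nfiles": 123,
            "path": "data",
            "remote": "project1-storage",
        }
    ]
}

# "shared-storage" is a hypothetical default remote name.
selected = remotes_for_outputs(project1_dvc, default_remote="shared-storage")
print(selected)
```

Since there is no CLI command for this parameter, the remote field has to be added to data.dvc by hand.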
