dvc import/get not working with a git repository with two remotes

Bug Report

Description

I have a Git repository with two DVC remotes. My .dvc/config is as follows: one remote for the model and one for the data.

[core]
    autostage = true
    remote = modelstore
['remote "modelstore"']
    url = s3://models/classifier/dvcstore
    endpointurl = <endpoint URL>
['remote "datastore"']
    url = s3://data/classifier/dvcstore
    endpointurl = <endpoint URL>

Here, model.dvc and data.dvc are in the classifier/ directory.
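To double-check which remote is the default and which remotes are defined, the config above can be inspected with a short script. This is only a sketch: DVC uses its own config loader, and Python's standard configparser is a stand-in here. The helper name summarize_remotes is hypothetical, and the endpointurl lines are left out because their values are placeholders in the report.

```python
import configparser

# Copy of the .dvc/config from the report (endpointurl omitted: placeholder value).
CONFIG_TEXT = """\
[core]
    autostage = true
    remote = modelstore
['remote "modelstore"']
    url = s3://models/classifier/dvcstore
['remote "datastore"']
    url = s3://data/classifier/dvcstore
"""


def summarize_remotes(config_text):
    """Return (default_remote, {remote_name: url}) for a DVC-style config.

    Hypothetical helper: it relies on configparser, not on DVC's own parser.
    """
    parser = configparser.ConfigParser(interpolation=None)
    # Strip indentation so no line can be mistaken for a continuation line.
    parser.read_string(
        "\n".join(line.strip() for line in config_text.splitlines())
    )

    remotes = {}
    for section in parser.sections():
        # Remote sections look like: 'remote "modelstore"' (quotes included).
        label = section.strip("'")
        if label.startswith('remote "') and label.endswith('"'):
            name = label[len('remote "'):-1]
            remotes[name] = parser[section].get("url")

    default = parser.get("core", "remote", fallback=None)
    return default, remotes


default, remotes = summarize_remotes(CONFIG_TEXT)
print("default remote:", default)
for name, url in remotes.items():
    print(name, "->", url)
```

With this config, only modelstore is the default, so anything stored in datastore has to be addressed explicitly.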

I also have a second repository that consumes only the model from the artifact registry repo above. When I try to import the model, I get the following error:

dvc import <git repo> classifier/model --rev <tag>
ERROR: failed to import '<model path>' from '<git repo>'. - The path 'classifier/model' does not exist in the target repository '<git repo>' neither as a DVC output nor as a Git-tracked file.:

However, the model.dvc file exists in the repo as classifier/model.dvc. I have tried with and without a default remote and get the same error. With dvc get, I only get ERROR: unexpected error

Reproduce

Expected

The model artifacts are downloaded to the local repository.

Environment information

Output of dvc doctor on consumer repo:

$ dvc doctor
DVC version: 2.27.2 (pip)
---------------------------------
Platform: Python 3.9.7 on Linux-4.18.0-372.26.1.el8_6.x86_64-x86_64-with-glibc2.28
Subprojects:
        dvc_data = 0.10.0
        dvc_objects = 0.4.0
        dvc_render = 0.0.11
        dvc_task = 0.1.2
        dvclive = 0.11.0
        scmrepo = 0.1.1
Supports:
        http (aiohttp = 3.8.3, aiohttp-retry = 2.8.3),
        https (aiohttp = 3.8.3, aiohttp-retry = 2.8.3),
        s3 (s3fs = 2022.10.0, boto3 = 1.24.59)
Cache types: reflink, hardlink, symlink
Cache directory: xfs on /dev/mapper/vg_li1744707733-lv_root
Caches: local
Remotes: None
Workspace directory: xfs on /dev/mapper/vg_li1744707733-lv_root
Repo: dvc, git

Output of dvc doctor on artifact registry repo:

$ dvc doctor
DVC version: 2.27.2 (pip)
---------------------------------
Platform: Python 3.9.7 on Linux-4.18.0-372.26.1.el8_6.x86_64-x86_64-with-glibc2.28
Subprojects:
	dvc_data = 0.10.0
	dvc_objects = 0.4.0
	dvc_render = 0.0.11
	dvc_task = 0.1.2
	dvclive = 0.11.0
	scmrepo = 0.1.1
Supports:
	http (aiohttp = 3.8.3, aiohttp-retry = 2.8.3),
	https (aiohttp = 3.8.3, aiohttp-retry = 2.8.3),
	s3 (s3fs = 2022.10.0, boto3 = 1.24.59)
Cache types: reflink, hardlink, symlink
Cache directory: xfs on /dev/mapper/vg_li1744707733-lv_root
Caches: local
Remotes: s3, s3
Workspace directory: xfs on /dev/mapper/vg_li1744707733-lv_root
Repo: dvc, git

Issue Analytics

  • State: closed
  • Created 10 months ago
  • Comments: 13

Top GitHub Comments

1 reaction
moisesrc13 commented, Nov 12, 2022

@pmrowla thanks so much! Adding the default remote in model.dvc and data.dvc made it work 🎉 If this is not yet in your documentation, it would be really useful to have.

1 reaction
pmrowla commented, Nov 12, 2022

@moisesrc13 is it possible someone forgot to dvc push the latest data from your data registry?

Also, one thing you may want to consider is using the remote output option in your .dvc files, which enforces that the given file or dir should only be pushed to a specific remote.

So in classifier/model.dvc you would have:

outs:
  - md5: ...
    path: model
    ...
    remote: modelstore

and in classifier/data.dvc you would use remote: datastore

This way you can just run dvc push in your data registry (without manually specifying classifier/model or classifier/data and the relevant remote). This will also make dvc import use the correct remotes, without you needing to change the default remote in the data registry.

see: https://dvc.org/doc/user-guide/project-structure/dvc-files#output-entries
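If there are many .dvc files to update, the edit described above can be scripted. The following is a minimal sketch, not part of DVC: set_output_remote is a hypothetical helper that works on the file text line by line, and it assumes the simple layout written by dvc add (a top-level outs: list whose entries start with "- " and whose keys are indented two spaces past the dash). A real YAML library would be more robust for anything less regular.

```python
def set_output_remote(dvc_text: str, remote: str) -> str:
    """Return dvc_text with `remote: <remote>` set on every `outs:` entry.

    Hypothetical helper. Assumes the flat layout produced by `dvc add`;
    an entry whose first key (on the dash line) is `remote:` is not handled.
    """
    result = []
    in_outs = False     # inside the top-level `outs:` list
    dash_indent = None  # column of the "- " that starts each entry
    open_entry = False  # an entry is still waiting for its `remote:` line

    def finish_entry():
        nonlocal open_entry
        if open_entry:
            result.append(" " * (dash_indent + 2) + f"remote: {remote}")
            open_entry = False

    for line in dvc_text.splitlines():
        body = line.lstrip()
        indent = len(line) - len(body)

        if in_outs and body and indent == 0 and not body.startswith("- "):
            # A new top-level key ends the `outs:` list.
            finish_entry()
            in_outs = False

        if in_outs and body.startswith("- ") and dash_indent in (None, indent):
            # Start of the next output entry.
            finish_entry()
            dash_indent = indent
            open_entry = True
        elif (in_outs and open_entry and indent == dash_indent + 2
              and body.startswith("remote:")):
            # Drop a previous `remote:` value; it is re-added by finish_entry().
            continue

        if indent == 0 and line.strip() == "outs:":
            in_outs = True
            dash_indent = None
        result.append(line)

    finish_entry()
    return "\n".join(result) + "\n"


example = "outs:\n- md5: abc123\n  size: 10\n  path: model\n"
print(set_output_remote(example, "modelstore"))
```

The md5 and size values in the example are made up. After writing the result back to classifier/model.dvc and committing it, a plain dvc push in the registry should send that output to the named remote, as described above.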
