
dvc get: fails with gdrive remote


Bug Report

dvc get . $relative_path fails with a private organisational GitHub repo and a gdrive remote

Description

Thanks for the amazing package; I'm loving it.

I'm trying to download an individual file from a DVC Google Drive remote configured in a private organisational GitHub repository. Although dvc list works, both dvc get . $path and dvc import . $path fail. I'm attaching the error logs.

Reproduce

In the root of the local clone of the private organisational GitHub repository:

# works
dvc list . $relative_path

# doesn't work -> error.log.1
dvc get -v . $relative_path 2>&1 | tee error.log.1.txt

# doesn't work -> error.log.2
dvc get -v $github_url $relative_path 2>&1 | tee error.log.2.txt  

Error logs (attached to the original issue): error.log.1.txt, error.log.2.txt
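A side note on capturing these logs: in a pipeline such as dvc get -v . $relative_path 2>&1 | tee error.log.1.txt, the exit status the shell reports is that of tee, not of dvc get, so a script cannot tell from $? that the download failed (unless bash's pipefail option is set). Below is a minimal POSIX shell sketch that keeps the command's own status. The helper name run_logged is hypothetical and not part of DVC; unlike tee, it prints the output only after the command has finished.

```shell
#!/bin/sh
# Hypothetical helper (not part of DVC): run a command, save its combined
# stdout/stderr to a log file, print the log, and return the command's own
# exit status instead of the status of a downstream tee.
run_logged() {
  rl_log="$1"
  shift
  "$@" >"$rl_log" 2>&1
  rl_status=$?
  cat "$rl_log"
  return "$rl_status"
}

# Usage, mirroring the reproduction steps above:
#   run_logged error.log.1.txt dvc get -v . "$relative_path" || echo "dvc get failed"
```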

Downgrading to dvc==2.10.2 solves the issue.

Expected

I expect the requested data file to be downloaded from Google Drive.

Output of dvc doctor:

DVC version: 2.11.0 (pip)
---------------------------------
Platform: Python 3.8.13 on Linux-5.13.0-52-generic-x86_64-with-glibc2.17
Supports:
	gdrive (pydrive2 = 1.10.1),
	webhdfs (fsspec = 2022.5.0),
	http (aiohttp = 3.8.1, aiohttp-retry = 2.4.8),
	https (aiohttp = 3.8.1, aiohttp-retry = 2.4.8)
Cache types: <https://error.dvc.org/no-dvc-cache>
Caches: local
Remotes: gdrive
Workspace directory: ext4 on /dev/sda2
Repo: dvc, git

Issue Analytics

  • State: closed
  • Created: a year ago
  • Reactions: 1
  • Comments: 8 (7 by maintainers)

Top GitHub Comments

1 reaction
efiop commented, Jul 19, 2022

@ursueugen The next dvc release should have this fixed (in the meantime, feel free to install from upstream to try it out). Thank you for reporting the issue! 🙏

1 reaction
shcheklein commented, Jul 8, 2022

There are two things here, and it looks like both can be fixed more or less quickly. I'm happy to help someone with this, since I'm not sure I'll have enough focus to get to it soon 😦.

  1. Bug. This should be a really quick fix and can unblock users. The assert is not needed (or at least not always needed), since in some cases below we don't use tmp_dir at all. The assert also blocks possible workarounds (ones you'd want to use anyway when using dvc get with gdrive), namely overriding gdrive_user_credentials_file globally.

  2. Improvement to the way we write (cache) the credentials files in different scenarios. This is a bit more involved, but it seems reasonable in scope.

  • When we read credentials from an env var, we use tmp_fname(repo.tmp_dir). In all of those places in that function we can, and should, use a global tmp location instead, since we don't persist these files between sessions. We need to be careful and check that such a location exists at all. This could potentially also be improved in PyDrive itself: we could let clients pass these values as-is, straight from the env variables, or let PyDrive read those env vars itself (this may be a bit involved).
  • https://github.com/iterative/dvc-objects/blob/main/src/dvc_objects/fs/implementations/gdrive.py#L80: in this case we cache the credentials and need the file again next time. We could reconsider the default and use an appdir instead of repo.tmp_dir here. There will be some implications (projects will now see each other, and we need to think about backward compatibility), but overall it should be fine. Open questions: does the appdir always exist, and should users be able to write to repo/.dvc if they need it?
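Point 2 above can be illustrated outside DVC with two small shell helpers: credentials supplied through an environment variable go to a per-session file in the system temp directory rather than the repository's .dvc/tmp (repo.tmp_dir), while cached user credentials go to a per-user application directory that survives between sessions. This is a minimal POSIX shell sketch of the idea, not DVC's implementation. It assumes GDRIVE_CREDENTIALS_DATA is the environment variable the comment refers to; the function names, the dvc/gdrive subdirectory and the XDG_CACHE_HOME fallback are hypothetical choices.

```shell
#!/bin/sh
# Hypothetical sketch of the proposed credential-file locations; not DVC code.

# Session-scoped file: write credentials passed via an environment variable
# into the system temp dir instead of the repository's tmp dir.
# Prints the path of the new file, or fails if the variable is unset/empty.
write_env_credentials() {
  if [ -z "${GDRIVE_CREDENTIALS_DATA:-}" ]; then
    echo "GDRIVE_CREDENTIALS_DATA is not set" >&2
    return 1
  fi
  # mktemp creates a uniquely named file readable only by its owner.
  cred_file=$(mktemp "${TMPDIR:-/tmp}/gdrive-creds.XXXXXX") || return 1
  printf '%s' "$GDRIVE_CREDENTIALS_DATA" >"$cred_file"
  echo "$cred_file"
}

# Persistent file: cached user credentials in a per-user app dir, so they are
# reused across sessions (and therefore shared across projects).
cached_credentials_path() {
  app_dir="${XDG_CACHE_HOME:-$HOME/.cache}/dvc/gdrive"
  # The comment asks whether the app dir always exists; create it if needed.
  mkdir -p "$app_dir" || return 1
  echo "$app_dir/user-credentials.json"
}
```

The caller owns the session file and should delete it when done, for example with trap 'rm -f "$f"' EXIT. Keying the cache directory only by user is what makes projects "see each other", which is the backward-compatibility trade-off raised in the comment.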