_DVCFileSystem returns extra part on relative path. Impact on dvc.api.get_url on r2.34.0

Bug Report

Description

I’ve updated from release 2.8.3 to 2.34.0 and dvc.api.get_url crashes.

Reproduce

Repo structure:

repo_root_dir
  |_ common
      |_ other_deps
          |_ other.bin
  |_ module_a
      |_ dependencies
          |_ dep_1.bin
          |_ dep_2.bin
      |_ my_app.py
      |_ setup.py
  |_ module_b
  |
  .
  .
  . 
  |_ module_n

During the build, which runs from the module_a folder, I resolve the URLs of some binary dependencies from their paths using

dvc.api.get_url('module_a/dependencies/dep_1.bin')

Under release 2.8.3 this worked, but with release 2.34.0 I get the following error:

>>> FileNotFoundError: [Errno 2] No such file or directory: '/Users/myuser/projects/repo_root_dir/module_a/module_a/dependencies/dep_1.bin'

Notice that the module_a part is repeated in the path shown in the error. Following the code, I found that:

$ pwd
/Users/myuser/projects/repo_root_dir/module_a/
$ python
>>> from dvc.fs.dvc import _DVCFileSystem

>>> path = 'module_a/dependencies/dep_1.bin'

>>> dvc_file_system = _DVCFileSystem()
>>> dvc_file_system.path.relparts(path, '/')

Out: 
( 'module_a', 
  'module_a', 
  'dependencies', 
  'dep_1.bin' )

The same but in the repo root directory

$ pwd
/Users/myuser/projects/repo_root_dir/
$ python
>>> from dvc.fs.dvc import _DVCFileSystem

>>> path = 'module_a/dependencies/dep_1.bin'

>>> dvc_file_system = _DVCFileSystem()
>>> dvc_file_system.path.relparts(path, '/')

Out: 
( 'module_a', 
  'dependencies', 
  'dep_1.bin' )
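The two sessions above are consistent with the relative path first being resolved against the current working directory and only then being split relative to the repo root. The following is a minimal sketch of that behavior using only the standard library; it is not DVC's actual code, and `relparts_from_cwd` is a hypothetical helper introduced for illustration.

```python
import posixpath


def relparts_from_cwd(path, cwd, repo_root):
    """Hypothetical helper (not part of DVC): resolve `path` against `cwd`,
    then return its components relative to `repo_root`."""
    absolute = posixpath.normpath(posixpath.join(cwd, path))
    relative = posixpath.relpath(absolute, repo_root)
    return tuple(relative.split("/"))


REPO_ROOT = "/Users/myuser/projects/repo_root_dir"
PATH = "module_a/dependencies/dep_1.bin"

# Working directory is module_a: the component appears twice.
from_module = relparts_from_cwd(PATH, REPO_ROOT + "/module_a", REPO_ROOT)
# Working directory is the repo root: no duplication.
from_root = relparts_from_cwd(PATH, REPO_ROOT, REPO_ROOT)

print(from_module)
print(from_root)
```

Under this assumption, a path written relative to the repo root only resolves correctly when the process is started from the repo root.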

Looking at the code

At this line, https://github.com/iterative/dvc/blob/a6bfc7db4ed3a86c58b4bfac3b22f5ad0b08e338/dvc/fs/dvc.py#L165, it looks like the comparison is not removing the extra part. os.curdir is the current directory ("."), if I tested it right.

Hope this helps.

Thanks and great job!

Environment information

Output of dvc doctor:

$ dvc doctor
DVC version: 2.34.0 (pip)
---------------------------------
Platform: Python 3.8.5 on macOS-11.6.5-x86_64-i386-64bit
Subprojects:
        dvc_data = 0.25.3
        dvc_objects = 0.12.2
        dvc_render = 0.0.12
        dvc_task = 0.1.4
        dvclive = 1.0.1
        scmrepo = 0.1.3
Supports:
        azure (adlfs = 2022.10.0, knack = 0.9.0, azure-identity = 1.7.1),
        http (aiohttp = 3.7.4.post0, aiohttp-retry = 2.8.3),
        https (aiohttp = 3.7.4.post0, aiohttp-retry = 2.8.3)
Cache types: reflink, hardlink, symlink
Cache directory: apfs on /dev/disk1s5s1
Caches: local
Remotes: azure
Workspace directory: apfs on /dev/disk1s5s1
Repo: dvc, git

Issue Analytics

  • State: closed
  • Created: 10 months ago
  • Reactions: 1
  • Comments: 7 (2 by maintainers)

Top GitHub Comments

2 reactions
giannipablo commented, Nov 17, 2022

@dberenbaum PR submitted! Please take a look at the fix just to be sure. I tested it, and dvc.api.get_url now resolves the URL properly.

1 reaction
dberenbaum commented, Nov 16, 2022

Quoting the report: "Looks like the comparison is not removing the extra part. os.curdir is the local folder . if i tested right."

By the way, I think this is intentional (sometimes parts[0] == "."), so we should not get rid of this logic, but instead add another check for something like parts[0] == self.path.split(os.getcwd())[-1]. cc @skshetry
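One possible reading of this suggestion, sketched outside of DVC: keep the existing check and additionally drop a leading run of components that merely repeats where the working directory sits inside the repo. `strip_cwd_prefix` is a hypothetical helper, not the fix that was merged, and it assumes the caller's path was meant to be relative to the repo root, as in the report.

```python
import posixpath


def strip_cwd_prefix(parts, cwd, repo_root):
    """Hypothetical helper (not part of DVC): drop leading components that
    only repeat the position of `cwd` inside `repo_root`."""
    cwd_rel = posixpath.relpath(cwd, repo_root)
    if cwd_rel == posixpath.curdir:
        # cwd is the repo root, so there is nothing to strip.
        return tuple(parts)
    prefix = tuple(cwd_rel.split("/"))
    if tuple(parts[: len(prefix)]) == prefix:
        return tuple(parts[len(prefix):])
    return tuple(parts)


REPO_ROOT = "/Users/myuser/projects/repo_root_dir"

# Parts as reported when running from module_a.
duplicated = ("module_a", "module_a", "dependencies", "dep_1.bin")
fixed = strip_cwd_prefix(duplicated, REPO_ROOT + "/module_a", REPO_ROOT)

# Parts as reported when running from the repo root stay untouched.
unchanged = strip_cwd_prefix(
    ("module_a", "dependencies", "dep_1.bin"), REPO_ROOT, REPO_ROOT
)

print(fixed)
print(unchanged)
```

Note the ambiguity in this approach: if a caller in module_a passes a path that really is relative to module_a, the same stripping would remove a component that belongs there, so a real fix has to decide which base directory relative paths refer to.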
