_DVCFileSystem returns extra part on relative path. Impact on dvc.api.get_url on r2.34.0
Bug Report
Description
I’ve updated from release 2.8.3 to 2.34.0 and dvc.api.get_url crashes.
Reproduce
Repo structure:
repo_root_dir
|_ common
|  |_ other_deps
|     |_ other.bin
|_ module_a
|  |_ dependencies
|  |  |_ dep_1.bin
|  |  |_ dep_2.bin
|  |_ my_app.py
|  |_ setup.py
|_ module_b
|
.
.
.
|_ module_n
During the build, which runs from inside the module_a folder, I resolve the URLs of some binary dependencies from their paths using
dvc.api.get_url('module_a/dependencies/dep_1.bin')
Under release 2.8.3 this worked, but with release 2.34.0 I get the following error:
>>> FileNotFoundError: [Errno 2] No such file or directory: '/Users/myuser/projects/repo_root_dir/module_a/module_a/dependencies/dep_1.bin'
Notice that the module_a part is repeated in the path shown in the error. Following the code, I found that:
$ pwd
/Users/myuser/projects/repo_root_dir/module_a/
$ python
>>> from dvc.fs.dvc import _DVCFileSystem
>>> path = 'module_a/dependencies/dep_1.bin'
>>> dvc_file_system = _DVCFileSystem()
>>> dvc_file_system.path.relparts(path, '/')
Out:
( 'module_a',
'module_a',
'dependencies',
'dep_1.bin' )
The same, but run from the repo root directory:
$ pwd
/Users/myuser/projects/repo_root_dir/
$ python
>>> from dvc.fs.dvc import _DVCFileSystem
>>> path = 'module_a/dependencies/dep_1.bin'
>>> dvc_file_system = _DVCFileSystem()
>>> dvc_file_system.path.relparts(path, '/')
Out:
( 'module_a',
'dependencies',
'dep_1.bin' )
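The difference between the two sessions is consistent with a repo-root-relative path being resolved against the current working directory. The sketch below reproduces the doubled component with posixpath only; it is not DVC's code, and resolve_against_cwd is a hypothetical helper used for illustration.

```python
import posixpath


def resolve_against_cwd(path, cwd):
    """Hypothetical helper: resolve a relative path against cwd."""
    return posixpath.normpath(posixpath.join(cwd, path))


root = "/Users/myuser/projects/repo_root_dir"
path = "module_a/dependencies/dep_1.bin"

# From the repo root the path resolves as intended.
from_root = resolve_against_cwd(path, root)

# From inside module_a the same string picks up a second "module_a".
from_module = resolve_against_cwd(path, root + "/module_a/")

print(from_root)
print(from_module)
```

The second printed path matches the one in the FileNotFoundError above.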
Looking at the code, at this line:
https://github.com/iterative/dvc/blob/a6bfc7db4ed3a86c58b4bfac3b22f5ad0b08e338/dvc/fs/dvc.py#L165
It looks like the comparison is not removing the extra part. os.curdir is the local folder ".", if I tested right.
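To illustrate the point: a check against os.curdir can only strip a leading "." component, so a leading folder name coming from the working directory passes through untouched. This is a minimal sketch under that assumption, not the code at the linked line; strip_leading_curdir is a hypothetical helper.

```python
import os


def strip_leading_curdir(parts):
    """Hypothetical helper: drop a leading "." component, if present."""
    if parts and parts[0] == os.curdir:
        return parts[1:]
    return parts


# A leading "." is removed.
dotted = strip_leading_curdir((".", "dependencies", "dep_1.bin"))

# A leading "module_a" is not equal to ".", so nothing is removed.
doubled = strip_leading_curdir(
    ("module_a", "module_a", "dependencies", "dep_1.bin")
)

print(dotted)
print(doubled)
```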
Hope this helps.
Thanks and great job!
Environment information
Output of dvc doctor:
$ dvc doctor
DVC version: 2.34.0 (pip)
---------------------------------
Platform: Python 3.8.5 on macOS-11.6.5-x86_64-i386-64bit
Subprojects:
dvc_data = 0.25.3
dvc_objects = 0.12.2
dvc_render = 0.0.12
dvc_task = 0.1.4
dvclive = 1.0.1
scmrepo = 0.1.3
Supports:
azure (adlfs = 2022.10.0, knack = 0.9.0, azure-identity = 1.7.1),
http (aiohttp = 3.7.4.post0, aiohttp-retry = 2.8.3),
https (aiohttp = 3.7.4.post0, aiohttp-retry = 2.8.3)
Cache types: reflink, hardlink, symlink
Cache directory: apfs on /dev/disk1s5s1
Caches: local
Remotes: azure
Workspace directory: apfs on /dev/disk1s5s1
Repo: dvc, git
Issue Analytics
- State:
- Created 10 months ago
- Reactions:1
- Comments:7 (2 by maintainers)
Top GitHub Comments
@dberenbaum PR submitted! Please take a look at the fix just to be sure. I tested it, and dvc.api.get_url resolved the URL properly.
By the way, I think this is intentional (sometimes parts[0] == "."), so we should not get rid of this logic, but instead add another check for something like parts[0] == self.path.split(os.getcwd())[-1]. cc @skshetry
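One possible reading of that suggestion, as a sketch only: keep the existing "." check and additionally drop a leading component that equals the name of the current working directory. This assumes self.path.split behaves like os.path.split, so that [-1] is the last component of the cwd; strip_curdir_or_cwd_name is a hypothetical helper, not the change made in the PR.

```python
import os
import posixpath


def strip_curdir_or_cwd_name(parts, cwd):
    """Hypothetical helper: drop a leading "." or a leading cwd folder name."""
    # Last component of cwd, ignoring any trailing slash.
    cwd_name = posixpath.basename(cwd.rstrip("/"))
    if parts and parts[0] in (os.curdir, cwd_name):
        return parts[1:]
    return parts


parts = ("module_a", "module_a", "dependencies", "dep_1.bin")

# Inside module_a the duplicated leading component is removed.
fixed = strip_curdir_or_cwd_name(
    parts, "/Users/myuser/projects/repo_root_dir/module_a/"
)

# At the repo root nothing matches, so the parts are returned unchanged.
untouched = strip_curdir_or_cwd_name(
    parts[1:], "/Users/myuser/projects/repo_root_dir/"
)

print(fixed)
print(untouched)
```

A check like this would also strip a legitimate first component that happens to share the working directory's name, so it would need a guard in real code.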