`dvc list -R`: listing contents of data registry fails, when using recursive flag

Bug Report

I set up a sample data registry that contains the data-generating code as a dvc.yaml pipeline. When I list the registry’s content, dvc list works as intended and shows the top-level files and directories of the repo, but dvc list -R fails with a TreeError. This seems similar to issue #7871, and a comment regarding TreeError can be found in the Discord channel as well.

Description

dvc list works as intended

$ dvc list -vv https://github.com/hfrechen/data-registry-test
2022-08-09 13:29:13,399 TRACE: Namespace(cprofile=False, yappi=False, viztracer=False, viztracer_depth=None, cprofile_dump=None, pdb=False, instrument=False, instrument_open=False, quiet=0, verbose=2, version=None, cd='.', cmd='list', url='https://github.com/hfrechen/data-registry-test', recursive=False, dvc_only=False, json=False, rev=None, path=None, func=<class 'dvc.commands.ls.CmdList'>, parser=DvcParser(prog='dvc', usage=None, description='Data Version Control', formatter_class=<class 'argparse.RawTextHelpFormatter'>, conflict_handler='error', add_help=False))
2022-08-09 13:29:13,651 DEBUG: Creating external repo https://github.com/hfrechen/data-registry-test@None
2022-08-09 13:29:13,651 DEBUG: erepo: git clone 'https://github.com/hfrechen/data-registry-test' to a temporary dir
2022-08-09 13:29:14,895 TRACE: Context during resolution of stage data_import:               
{'data': {'path': {'raw': './data/raw', 'interim': './data/interim'}}}
2022-08-09 13:29:14,898 TRACE:    50.41 ms in collecting stages from /
2022-08-09 13:29:14,898 TRACE:     2.26 mks in collecting stages from /.dvc
2022-08-09 13:29:14,899 TRACE:     6.07 mks in collecting stages from /data
2022-08-09 13:29:14,899 TRACE:     4.54 mks in collecting stages from /data/interim
2022-08-09 13:29:14,899 TRACE:     3.82 mks in collecting stages from /data/raw
2022-08-09 13:29:14,899 TRACE:     3.96 mks in collecting stages from /src
.dvcignore
.gitignore
README.md
data
dvc.lock
dvc.yaml
params.yaml
src
2022-08-09 13:29:14,905 DEBUG: Analytics is enabled.
2022-08-09 13:29:14,958 DEBUG: Trying to spawn '['daemon', '-q', 'analytics', '/tmp/tmpai6phy17']'
2022-08-09 13:29:14,960 DEBUG: Spawned '['daemon', '-q', 'analytics', '/tmp/tmpai6phy17']'

dvc list -R fails with TreeError

$ dvc list -R -vv https://github.com/hfrechen/data-registry-test
2022-08-09 13:29:37,067 TRACE: Namespace(cprofile=False, yappi=False, viztracer=False, viztracer_depth=None, cprofile_dump=None, pdb=False, instrument=False, instrument_open=False, quiet=0, verbose=2, version=None, cd='.', cmd='list', url='https://github.com/hfrechen/data-registry-test', recursive=True, dvc_only=False, json=False, rev=None, path=None, func=<class 'dvc.commands.ls.CmdList'>, parser=DvcParser(prog='dvc', usage=None, description='Data Version Control', formatter_class=<class 'argparse.RawTextHelpFormatter'>, conflict_handler='error', add_help=False))
2022-08-09 13:29:37,180 DEBUG: Creating external repo https://github.com/hfrechen/data-registry-test@None
2022-08-09 13:29:37,181 DEBUG: erepo: git clone 'https://github.com/hfrechen/data-registry-test' to a temporary dir
2022-08-09 13:29:38,134 TRACE: Context during resolution of stage data_import:               
{'data': {'path': {'raw': './data/raw', 'interim': './data/interim'}}}
2022-08-09 13:29:38,137 TRACE:    32.35 ms in collecting stages from /
2022-08-09 13:29:38,137 TRACE:     2.09 mks in collecting stages from /.dvc
2022-08-09 13:29:38,137 TRACE:     6.68 mks in collecting stages from /data
2022-08-09 13:29:38,138 TRACE:     4.53 mks in collecting stages from /data/interim
2022-08-09 13:29:38,138 TRACE:     3.66 mks in collecting stages from /data/raw
2022-08-09 13:29:38,138 TRACE:     3.51 mks in collecting stages from /src
2022-08-09 13:29:38,156 ERROR: unexpected error
------------------------------------------------------------
Traceback (most recent call last):
  File "/opt/conda/envs/feature-store/lib/python3.9/site-packages/dvc/cli/__init__.py", line 185, in main
    ret = cmd.do_run()
  File "/opt/conda/envs/feature-store/lib/python3.9/site-packages/dvc/cli/command.py", line 36, in do_run
    return self.run()
  File "/opt/conda/envs/feature-store/lib/python3.9/site-packages/dvc/commands/ls/__init__.py", line 31, in run
    entries = Repo.ls(
  File "/opt/conda/envs/feature-store/lib/python3.9/site-packages/dvc/repo/ls.py", line 46, in ls
    ret = _ls(repo, path, recursive, dvc_only)
  File "/opt/conda/envs/feature-store/lib/python3.9/site-packages/dvc/repo/ls.py", line 68, in _ls
    for root, dirs, files in fs.walk(
  File "/opt/conda/envs/feature-store/lib/python3.9/site-packages/fsspec/spec.py", line 421, in walk
    yield from self.walk(d, maxdepth=maxdepth, detail=detail, **kwargs)
  File "/opt/conda/envs/feature-store/lib/python3.9/site-packages/fsspec/spec.py", line 421, in walk
    yield from self.walk(d, maxdepth=maxdepth, detail=detail, **kwargs)
  File "/opt/conda/envs/feature-store/lib/python3.9/site-packages/fsspec/spec.py", line 389, in walk
    listing = self.ls(path, detail=True, **kwargs)
  File "/opt/conda/envs/feature-store/lib/python3.9/site-packages/dvc/fs/dvc.py", line 335, in ls
    for entry in dvc_fs.ls(dvc_path, detail=False):
  File "/opt/conda/envs/feature-store/lib/python3.9/site-packages/dvc_objects/fs/base.py", line 318, in ls
    return self.fs.ls(path, detail=detail)
  File "/opt/conda/envs/feature-store/lib/python3.9/site-packages/dvc_data/fs.py", line 82, in ls
    for name in self.index.ls(prefix=root_key)
  File "/opt/conda/envs/feature-store/lib/python3.9/site-packages/dvc_data/index.py", line 130, in ls
    raise TreeError
dvc_data.objects.tree.TreeError
------------------------------------------------------------
2022-08-09 13:29:38,528 DEBUG: Version info for developers:
DVC version: 2.17.0 (conda)
---------------------------------
Platform: Python 3.9.13 on Linux-3.10.0-1160.45.1.el7.x86_64-x86_64-with-glibc2.31
Supports:
        hdfs (fsspec = 2022.7.1, pyarrow = 8.0.1),
        webhdfs (fsspec = 2022.7.1),
        http (aiohttp = 3.8.1, aiohttp-retry = 2.7.0),
        https (aiohttp = 3.8.1, aiohttp-retry = 2.7.0)
Cache types: <https://error.dvc.org/no-dvc-cache>
Caches: local
Remotes: None
Workspace directory: xfs on /dev/mapper/centos-root
Repo: dvc, git

Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!
2022-08-09 13:29:38,529 DEBUG: Analytics is enabled.
2022-08-09 13:29:38,591 DEBUG: Trying to spawn '['daemon', '-q', 'analytics', '/tmp/tmp59k200wt']'
2022-08-09 13:29:38,595 DEBUG: Spawned '['daemon', '-q', 'analytics', '/tmp/tmp59k200wt']'

This seems to affect dvc import as well:

$ dvc import -vv https://github.com/hfrechen/data-registry-test -o data/interim data/interim    
2022-08-09 13:30:18,116 TRACE: Namespace(cprofile=False, yappi=False, viztracer=False, viztracer_depth=None, cprofile_dump=None, pdb=False, instrument=False, instrument_open=False, quiet=0, verbose=2, version=None, cd='.', cmd='import', url='https://github.com/hfrechen/data-registry-test', path='data/interim', out='data/interim', rev=None, file=None, no_exec=False, desc=None, jobs=None, func=<class 'dvc.commands.imp.CmdImport'>, parser=DvcParser(prog='dvc', usage=None, description='Data Version Control', formatter_class=<class 'argparse.RawTextHelpFormatter'>, conflict_handler='error', add_help=False))
2022-08-09 13:30:18,402 TRACE:    66.82 mks in collecting stages from /home/dev/projects/data-consumer
2022-08-09 13:30:18,402 TRACE:    73.89 mks in collecting stages from /home/dev/projects/data-consumer/data
2022-08-09 13:30:18,402 TRACE:     1.97 mks in collecting stages from /home/dev/projects/data-consumer/data/interim
2022-08-09 13:30:18,677 DEBUG: Removing output 'data/interim/interim' of stage: 'data/interim/interim.dvc'.
2022-08-09 13:30:18,677 DEBUG: Removing '/home/dev/projects/data-consumer/data/interim/interim'
Importing 'data/interim (https://github.com/hfrechen/data-registry-test)' -> 'data/interim/interim'
2022-08-09 13:30:18,678 DEBUG: Computed stage: 'data/interim/interim.dvc' md5: '1e3c54ddb027f31952ea5d2c65f3ed8e'
2022-08-09 13:30:18,678 DEBUG: 'md5' of stage: 'data/interim/interim.dvc' changed.
2022-08-09 13:30:18,679 DEBUG: Creating external repo https://github.com/hfrechen/data-registry-test@None
2022-08-09 13:30:18,679 DEBUG: erepo: git clone 'https://github.com/hfrechen/data-registry-test' to a temporary dir
2022-08-09 13:30:19,651 DEBUG: Checking if stage '/data/interim' is in 'dvc.yaml'            
2022-08-09 13:30:19,687 TRACE: Context during resolution of stage data_import:
{'data': {'path': {'raw': './data/raw', 'interim': './data/interim'}}}
2022-08-09 13:30:19,690 TRACE:    37.28 ms in collecting stages from /
2022-08-09 13:30:19,690 TRACE:     2.45 mks in collecting stages from /.dvc
2022-08-09 13:30:19,690 TRACE:     7.00 mks in collecting stages from /data
2022-08-09 13:30:19,690 TRACE:     5.64 mks in collecting stages from /data/interim
2022-08-09 13:30:19,690 TRACE:     3.54 mks in collecting stages from /data/raw
2022-08-09 13:30:19,691 TRACE:     3.58 mks in collecting stages from /src
2022-08-09 13:30:19,697 ERROR: failed to import 'data/interim' from 'https://github.com/hfrechen/data-registry-test'. - The path 'data/interim' does not exist in the target repository 'https://github.com/hfrechen/data-registry-test' neither as a DVC output nor as a Git-tracked file.: 
------------------------------------------------------------
Traceback (most recent call last):
  File "/opt/conda/envs/feature-store/lib/python3.9/site-packages/dvc/dependency/repo.py", line 134, in _get_used_and_obj
    object_store, _, obj = build(
  File "/opt/conda/envs/feature-store/lib/python3.9/site-packages/dvc_data/build.py", line 245, in build
    meta, obj = _build_tree(
  File "/opt/conda/envs/feature-store/lib/python3.9/site-packages/dvc_data/build.py", line 123, in _build_tree
    for root, _, fnames in walk_iter:
  File "/opt/conda/envs/feature-store/lib/python3.9/site-packages/fsspec/spec.py", line 389, in walk
    listing = self.ls(path, detail=True, **kwargs)
  File "/opt/conda/envs/feature-store/lib/python3.9/site-packages/dvc/fs/dvc.py", line 335, in ls
    for entry in dvc_fs.ls(dvc_path, detail=False):
  File "/opt/conda/envs/feature-store/lib/python3.9/site-packages/dvc_objects/fs/base.py", line 318, in ls
    return self.fs.ls(path, detail=detail)
  File "/opt/conda/envs/feature-store/lib/python3.9/site-packages/dvc_data/fs.py", line 82, in ls
    for name in self.index.ls(prefix=root_key)
  File "/opt/conda/envs/feature-store/lib/python3.9/site-packages/dvc_data/index.py", line 130, in ls
    raise TreeError
dvc_data.objects.tree.TreeError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/conda/envs/feature-store/lib/python3.9/site-packages/dvc/commands/imp.py", line 15, in run
    self.repo.imp(
  File "/opt/conda/envs/feature-store/lib/python3.9/site-packages/dvc/repo/imp.py", line 6, in imp
    return self.imp_url(
  File "/opt/conda/envs/feature-store/lib/python3.9/site-packages/dvc/repo/__init__.py", line 48, in wrapper
    return f(repo, *args, **kwargs)
  File "/opt/conda/envs/feature-store/lib/python3.9/site-packages/dvc/repo/scm_context.py", line 156, in run
    return method(repo, *args, **kw)
  File "/opt/conda/envs/feature-store/lib/python3.9/site-packages/dvc/repo/imp_url.py", line 83, in imp_url
    stage.run(jobs=jobs)
  File "/opt/conda/envs/feature-store/lib/python3.9/site-packages/funcy/decorators.py", line 45, in wrapper
    return deco(call, *dargs, **dkwargs)
  File "/opt/conda/envs/feature-store/lib/python3.9/site-packages/dvc/stage/decorators.py", line 38, in rwlocked
    return call()
  File "/opt/conda/envs/feature-store/lib/python3.9/site-packages/funcy/decorators.py", line 66, in __call__
    return self._func(*self._args, **self._kwargs)
  File "/opt/conda/envs/feature-store/lib/python3.9/site-packages/dvc/stage/__init__.py", line 535, in run
    self._sync_import(dry, force, jobs)
  File "/opt/conda/envs/feature-store/lib/python3.9/site-packages/funcy/decorators.py", line 45, in wrapper
    return deco(call, *dargs, **dkwargs)
  File "/opt/conda/envs/feature-store/lib/python3.9/site-packages/dvc/stage/decorators.py", line 38, in rwlocked
    return call()
  File "/opt/conda/envs/feature-store/lib/python3.9/site-packages/funcy/decorators.py", line 66, in __call__
    return self._func(*self._args, **self._kwargs)
  File "/opt/conda/envs/feature-store/lib/python3.9/site-packages/dvc/stage/__init__.py", line 559, in _sync_import
    sync_import(self, dry, force, jobs)
  File "/opt/conda/envs/feature-store/lib/python3.9/site-packages/dvc/stage/imports.py", line 43, in sync_import
    stage.deps[0].download(stage.outs[0], jobs=jobs)
  File "/opt/conda/envs/feature-store/lib/python3.9/site-packages/dvc/dependency/repo.py", line 68, in download
    for odb, objs in self.get_used_objs().items():
  File "/opt/conda/envs/feature-store/lib/python3.9/site-packages/dvc/dependency/repo.py", line 97, in get_used_objs
    used, _ = self._get_used_and_obj(**kwargs)
  File "/opt/conda/envs/feature-store/lib/python3.9/site-packages/dvc/dependency/repo.py", line 141, in _get_used_and_obj
    raise PathMissingError(
dvc.exceptions.PathMissingError: The path 'data/interim' does not exist in the target repository 'https://github.com/hfrechen/data-registry-test' neither as a DVC output nor as a Git-tracked file.
------------------------------------------------------------
2022-08-09 13:30:19,706 DEBUG: Analytics is enabled.
2022-08-09 13:30:19,914 DEBUG: Trying to spawn '['daemon', '-q', 'analytics', '/tmp/tmp3p8liv51']'
2022-08-09 13:30:19,917 DEBUG: Spawned '['daemon', '-q', 'analytics', '/tmp/tmp3p8liv51']'

Reproduce

I created a sample repo, https://github.com/hfrechen/data-registry-test, to try it out. Commands 2 and 3 below fail for me on different combinations of Ubuntu and Windows clients with DVC 2.15, 2.16 and 2.17.

  1. dvc list -vv https://github.com/hfrechen/data-registry-test
  2. dvc list -R -vv https://github.com/hfrechen/data-registry-test
  3. dvc import https://github.com/hfrechen/data-registry-test -o data/interim data/interim -vv
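The three steps above can be scripted for quicker retesting across DVC versions. This is only a sketch: the helper names build_commands and run_all are made up for illustration, the command lines are copied from the steps above, and the import step assumes the script is run from inside an existing DVC repository.

```python
from __future__ import annotations

import subprocess

# URL taken from the report above.
REGISTRY_URL = "https://github.com/hfrechen/data-registry-test"


def build_commands(url: str) -> list[list[str]]:
    """Return the three reproduction commands as argument lists."""
    return [
        ["dvc", "list", "-vv", url],
        ["dvc", "list", "-R", "-vv", url],
        ["dvc", "import", url, "-o", "data/interim", "data/interim", "-vv"],
    ]


def run_all(url: str) -> dict[str, int]:
    """Run each command in turn and map its command line to the exit code.

    Requires the dvc executable on PATH and network access to the registry.
    """
    results = {}
    for cmd in build_commands(url):
        proc = subprocess.run(cmd, capture_output=True, text=True)
        results[" ".join(cmd)] = proc.returncode
    return results
```

Calling run_all(REGISTRY_URL) returns one exit code per command, which makes it easy to compare behaviour between environments.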

Expected

dvc list -R should list all subdirectories and files contained in the data registry.

Environment information

I could reproduce this in a clean conda environment created with just conda create -n dvc -c conda-forge dvc

Output of dvc doctor:

$ dvc doctor
DVC version: 2.17.0 (conda)
---------------------------------
Platform: Python 3.9.13 on Linux-3.10.0-1160.45.1.el7.x86_64-x86_64-with-glibc2.31
Supports:
        hdfs (fsspec = 2022.7.1, pyarrow = 8.0.1),
        webhdfs (fsspec = 2022.7.1),
        http (aiohttp = 3.8.1, aiohttp-retry = 2.7.0),
        https (aiohttp = 3.8.1, aiohttp-retry = 2.7.0)
Cache types: <https://error.dvc.org/no-dvc-cache>
Caches: local
Remotes: None
Workspace directory: xfs on /dev/mapper/centos-root
Repo: dvc, git

Additional Information (if any):

I tried to debug and tracked it down to this line: https://github.com/iterative/dvc-data/blob/main/src/dvc_data/index.py#L138. Looking at the state of the local objects, entry.obj seems to be None:

(Pdb) prefix
('data', 'interim')
(Pdb) self._trie
Trie([(('data', 'interim'), DataIndexEntry(meta=<dvc_data.hashfile.meta.Meta object at 0x7f7e7b30a840>, obj=None, hash_info=<dvc_data.hashfile.hash_info.HashInfo object at 0x7f7e7b23c0c0>, odb=<dvc_data.db.local.LocalHashFileDB object at 0x7f7e7b2cf3a0>, remote=None, loaded=None))])
(Pdb) entry
DataIndexEntry(meta=<dvc_data.hashfile.meta.Meta object at 0x7f7e7b30a840>, obj=None, hash_info=<dvc_data.hashfile.hash_info.HashInfo object at 0x7f7e7b23c0c0>, odb=<dvc_data.db.local.LocalHashFileDB object at 0x7f7e7b2cf3a0>, remote=None, loaded=None)
(Pdb) entry.obj
(Pdb) entry.obj is None
True
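To make the debugger output easier to reason about, here is a simplified toy model of the situation it suggests: an index entry that knows the hash of a directory but has no loaded listing (obj is None), so there is nothing to enumerate. This is not dvc-data's code; ToyIndex, ToyEntry, the local TreeError class and the hash strings are all made up for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Key = Tuple[str, ...]


class TreeError(Exception):
    """Local stand-in for dvc_data.objects.tree.TreeError."""


@dataclass
class ToyEntry:
    """Toy counterpart of DataIndexEntry with only the fields needed here."""

    hash_value: str
    obj: Optional[Dict[str, str]] = None  # directory listing, if loaded


class ToyIndex:
    """Maps path keys such as ('data', 'interim') to entries."""

    def __init__(self) -> None:
        self._entries: Dict[Key, ToyEntry] = {}

    def add(self, key: Key, entry: ToyEntry) -> None:
        self._entries[key] = entry

    def ls(self, prefix: Key) -> List[Key]:
        entry = self._entries[prefix]
        if entry.obj is None:
            # The directory hash is known, but its listing was never
            # loaded, so the children cannot be enumerated.
            raise TreeError
        return [prefix + (name,) for name in sorted(entry.obj)]


index = ToyIndex()
# Same shape as the entry in the pdb session: a hash, but obj=None.
index.add(("data", "interim"), ToyEntry(hash_value="abc123.dir"))
# A directory whose listing is available, for contrast.
index.add(
    ("data", "raw"),
    ToyEntry(hash_value="def456.dir", obj={"b.csv": "h2", "a.csv": "h1"}),
)

print(index.ls(("data", "raw")))
try:
    index.ls(("data", "interim"))
except TreeError:
    print("TreeError: ('data', 'interim') has obj=None")
```

In this model the bare TreeError carries no message at all, which matches how uninformative the real traceback is.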

Issue Analytics

  • State: open
  • Created a year ago
  • Comments: 5 (1 by maintainers)

Top GitHub Comments

2 reactions
dberenbaum commented, Aug 12, 2022

It doesn’t look like there is any default remote set in https://github.com/hfrechen/data-registry-test, and without one DVC can’t granularly list the contents of directories it’s tracking or import data. In DVC<=2.9.4, I get the error dvc.exceptions.NoRemoteInExternalRepoError: No DVC remote is specified in target repository 'https://github.com/hfrechen/data-registry-test'. It’s probably best to keep showing an error like this.
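A quick way to check a registry for this condition is to look for a core.remote entry in its .dvc/config. The sketch below uses Python's standard configparser rather than DVC's own config loader, so it only assumes the file is in the usual INI-like layout; the function name default_remote, the remote name storage and the bucket URL are made up for illustration.

```python
import configparser
from typing import Optional, Tuple

# Hypothetical config contents in the layout DVC writes to .dvc/config.
WITH_REMOTE = """\
[core]
    remote = storage
['remote "storage"']
    url = s3://example-bucket/registry
"""

WITHOUT_REMOTE = """\
[core]
    analytics = false
"""


def default_remote(config_text: str) -> Optional[Tuple[str, Optional[str]]]:
    """Return (name, url) of the default remote, or None if none is set."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    name = parser.get("core", "remote", fallback=None)
    if name is None:
        return None
    # Remote sections are written as ['remote "<name>"'], so the section
    # name seen by configparser includes the single quotes.
    section = f"'remote \"{name}\"'"
    return name, parser.get(section, "url", fallback=None)


print(default_remote(WITH_REMOTE))
print(default_remote(WITHOUT_REMOTE))
```

In practice the text would come from the cloned registry's .dvc/config file. A None result corresponds to the situation described in this comment.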

1 reaction
hfrechen commented, Aug 23, 2022

Just for your information: the TreeError appeared another time for me, again with a not very meaningful error message:

2022-08-23 16:06:53,279 ERROR: unexpected error
------------------------------------------------------------
Traceback (most recent call last):
  File "/opt/conda/envs/feature-store/lib/python3.9/site-packages/dvc/cli/__init__.py", line 185, in main
    ret = cmd.do_run()
  File "/opt/conda/envs/feature-store/lib/python3.9/site-packages/dvc/cli/command.py", line 36, in do_run
    return self.run()
  File "/opt/conda/envs/feature-store/lib/python3.9/site-packages/dvc/commands/ls/__init__.py", line 31, in run
    entries = Repo.ls(
  File "/opt/conda/envs/feature-store/lib/python3.9/site-packages/dvc/repo/ls.py", line 46, in ls
    ret = _ls(repo, path, recursive, dvc_only)
  File "/opt/conda/envs/feature-store/lib/python3.9/site-packages/dvc/repo/ls.py", line 68, in _ls
    for root, dirs, files in fs.walk(
  File "/opt/conda/envs/feature-store/lib/python3.9/site-packages/fsspec/spec.py", line 421, in walk
    yield from self.walk(d, maxdepth=maxdepth, detail=detail, **kwargs)
  File "/opt/conda/envs/feature-store/lib/python3.9/site-packages/fsspec/spec.py", line 421, in walk
    yield from self.walk(d, maxdepth=maxdepth, detail=detail, **kwargs)
  File "/opt/conda/envs/feature-store/lib/python3.9/site-packages/fsspec/spec.py", line 421, in walk
    yield from self.walk(d, maxdepth=maxdepth, detail=detail, **kwargs)
  File "/opt/conda/envs/feature-store/lib/python3.9/site-packages/fsspec/spec.py", line 389, in walk
    listing = self.ls(path, detail=True, **kwargs)
  File "/opt/conda/envs/feature-store/lib/python3.9/site-packages/dvc/fs/dvc.py", line 335, in ls
    for entry in dvc_fs.ls(dvc_path, detail=False):
  File "/opt/conda/envs/feature-store/lib/python3.9/site-packages/dvc_objects/fs/base.py", line 318, in ls
    return self.fs.ls(path, detail=detail)
  File "/opt/conda/envs/feature-store/lib/python3.9/site-packages/dvc_data/fs.py", line 82, in ls
    for name in self.index.ls(prefix=root_key)
  File "/opt/conda/envs/feature-store/lib/python3.9/site-packages/dvc_data/index.py", line 138, in ls
    raise TreeError
dvc_data.objects.tree.TreeError

It took me a while to figure out what the reason was, because the default remote was set.

  1. I created another stage in my pipeline with new dependencies and new outputs.
  2. I ran dvc repro, which executed successfully.
  3. I committed and pushed the changes to the Git repo.
  4. I forgot to dvc push and tried to import the new stage of the data registry in another repo. dvc list -R failed with TreeError because the outputs of the new stage did not exist in the DVC remote.

Maybe you could improve the error messages here as well. Something like this?

  • Issue 1: “Default remote not set. Please configure in .dvc/config”
  • Issue 2: “Files tracked in your registry could not be found on remote”
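As a rough illustration of the suggestion, the sketch below wraps a listing call and turns a bare TreeError into one of the two proposed messages. It is not how DVC handles this: the TreeError class is a local stand-in, and RegistryListingError, explain_tree_error and list_with_context are hypothetical names.

```python
from typing import Callable, List


class TreeError(Exception):
    """Local stand-in for dvc_data.objects.tree.TreeError."""


class RegistryListingError(Exception):
    """Hypothetical user-facing error carrying a readable message."""


def explain_tree_error(default_remote_set: bool) -> str:
    """Pick one of the two messages proposed in the comment above."""
    if not default_remote_set:
        return "Default remote not set. Please configure in .dvc/config"
    return "Files tracked in your registry could not be found on remote"


def list_with_context(
    list_fn: Callable[[], List[str]], default_remote_set: bool
) -> List[str]:
    """Call list_fn and re-raise a bare TreeError with an explanation."""
    try:
        return list_fn()
    except TreeError as exc:
        raise RegistryListingError(explain_tree_error(default_remote_set)) from exc


def failing_listing() -> List[str]:
    # Simulates the recursive listing that fails in the report.
    raise TreeError


try:
    list_with_context(failing_listing, default_remote_set=True)
except RegistryListingError as err:
    print(f"ERROR: {err}")
```

Chaining with "from exc" keeps the original traceback available in verbose output while the user sees the readable message first.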
