pull --glob: never matches

Bug Report

Description

Despite having matching stages, dvc pull with the --glob option never finds any matches.

Reproduce

I cannot share my code, and I don’t think preparing a toy example is needed here.

Expected

Pull all the outputs of stages matching the expression.

Environment information

Output of dvc doctor:

DVC version: 2.20.0 (pip)
---------------------------------
Platform: Python 3.9.5 on Linux-5.4.0-124-generic-x86_64-with-glibc2.31
Supports:
	azure (adlfs = 2022.4.0, knack = 0.9.0, azure-identity = 1.10.0),
	http (aiohttp = 3.8.1, aiohttp-retry = 2.5.1),
	https (aiohttp = 3.8.1, aiohttp-retry = 2.5.1),
	s3 (s3fs = 2022.5.0, boto3 = 1.21.21)
Cache types: hardlink, symlink
Cache directory: ext4 on /dev/mapper/ubuntu--vg-home--vg
Caches: local
Remotes: s3
Workspace directory: ext4 on /dev/mapper/ubuntu--vg-home--vg
Repo: dvc, git

Additional Information (if any): I am not sure if dvc doctor outputs correct information about cache types. In my config I have "reflink,copy", while above we can see Cache types: hardlink, symlink. This looks to me like another bug.

Kind regards, macio232

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 6 (3 by maintainers)

Top GitHub Comments

jorgeorpinel commented, Sep 20, 2022 (2 reactions)

_Answer from https://github.com/iterative/dvc.org/pull/3933#issuecomment-1247903567_:

for the other three commands their target is all local… I don’t think dvc add --glob (for example) would miss any file that didn’t exist in the workspace.

karajan1001 commented, Aug 26, 2022 (1 reaction)

I guess this is the same problem mentioned in https://github.com/iterative/dvc/issues/6671#issuecomment-925468495

The issue is just that the existing implementation of --glob is very naive - it can only apply glob patterns to files which already exist in the local workspace. It does not support globbing against outputs within the repo tree (that do not already exist in the workspace).

Basically --glob is currently only useful for updating some subset of previously checked out or pulled data, or for pushing some subset of the existing data in your workspace.
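
The behaviour described above can be illustrated with a small sketch. This is not DVC's implementation; the helper names glob_workspace and match_tracked and the list of tracked paths are hypothetical, chosen only to contrast expanding a pattern against files on disk with matching it against a list of known output paths.

```python
import fnmatch
import glob
import os
import tempfile


def glob_workspace(pattern, root):
    """Expand the pattern against files that actually exist under root."""
    matches = glob.glob(os.path.join(root, pattern), recursive=True)
    return sorted(os.path.relpath(m, root) for m in matches)


def match_tracked(pattern, tracked_paths):
    """Match the pattern against path strings, without touching the disk."""
    return sorted(p for p in tracked_paths if fnmatch.fnmatch(p, pattern))


# Hypothetical outputs recorded in the repo (assumed for illustration).
tracked = ["data/a.txt", "data/b.txt", "models/m.pkl"]

with tempfile.TemporaryDirectory() as root:
    # Fresh clone: nothing has been pulled yet, so disk globbing finds nothing.
    fresh = glob_workspace("data/*.txt", root)

    # Partially populated workspace: only data/a.txt exists locally.
    os.makedirs(os.path.join(root, "data"))
    open(os.path.join(root, "data", "a.txt"), "w").close()
    on_disk = glob_workspace("data/*.txt", root)

    # Matching against the tracked paths sees both outputs.
    by_name = match_tracked("data/*.txt", tracked)

print(fresh, on_disk, by_name)
```

In the fresh workspace the disk-based expansion returns nothing, which corresponds to the "never matches" symptom in the report, while matching against the tracked paths finds both data files.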
