dvc and git do not behave the same with "!" and "**"

Consider the following project structure

  • data
    • data1
      • file1
      • file1.dvc
    • data2
      • file2
      • file2.dvc
  • .gitignore

.gitignore is as follows:

data/**
!data/*/
!*.dvc

git status gives: image

while dvc push gives: image

I expect git and dvc to behave the same way with respect to .gitignore.

  • dvc: 2.8.3
  • python: 3.7
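
To make the report easier to reproduce, here is a minimal sketch that rebuilds the layout and the .gitignore above in a temporary directory. The file contents are placeholders and an assumption of this sketch: real .dvc files are produced by dvc add, and the helper name build_project is hypothetical.

```python
import tempfile
from pathlib import Path

# The three rules from the issue, in the same order.
GITIGNORE = "data/**\n!data/*/\n!*.dvc\n"


def build_project(root):
    """Create the issue's layout under root and return the relative file paths."""
    for sub, name in (("data1", "file1"), ("data2", "file2")):
        folder = root / "data" / sub
        folder.mkdir(parents=True)
        (folder / name).write_text("payload\n")
        # Placeholder only: a real .dvc file would be written by `dvc add`.
        (folder / (name + ".dvc")).write_text("# placeholder\n")
    (root / ".gitignore").write_text(GITIGNORE)
    return sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()
    )


with tempfile.TemporaryDirectory() as tmp:
    created = build_project(Path(tmp))
    print("\n".join(created))
```

Running git init followed by git status, or the dvc commands from the report, inside such a directory should then exercise the same three rules.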

Issue Analytics

  • State: open
  • Created 2 years ago
  • Comments: 16

Top GitHub Comments

2 reactions
Danial-Alh commented, Nov 20, 2021

Whether it was a bug or a bug fix, some commits were reverted and a test case was added 😃

By the way, I think there is a separate issue with dvc.

In our test case, if we re-include the 'data1' directory with !data/*/, dvc ignores the .dvc files inside data1; but if we do it with !data/**/, dvc behaves as expected.

In either case, the .dvc files inside the data1 directory are not ignored by git, and the check-ignore output is as follows:

$ git check-ignore -v data/data1/file1.dvc
.gitignore:3:!/data/**/*.dvc   data/data1/file1.dvc

I used another git version, 2.17.1.
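
For anyone reading the check-ignore -v line above: git prints the source file, the line number, the matching pattern, a tab, and then the pathname. A pattern that starts with "!" means the last matching rule re-includes the path, so the file is not ignored. Below is a small sketch of a parser for that format; the function name is hypothetical, and it assumes the source path contains no colon.

```python
def parse_check_ignore_verbose(line):
    """Split one `git check-ignore -v` line into its fields."""
    # The rule description and the pathname are separated by a tab.
    rule, _, pathname = line.rstrip("\n").partition("\t")
    # Assumption: the source path has no colon; the pattern itself may.
    source, linenum, pattern = rule.split(":", 2)
    return {
        "source": source,
        "linenum": int(linenum),
        "pattern": pattern,
        "pathname": pathname,
        # A leading "!" marks a negated (re-including) pattern.
        "negated": pattern.startswith("!"),
    }


sample = ".gitignore:3:!/data/**/*.dvc\tdata/data1/file1.dvc"
info = parse_check_ignore_verbose(sample)
print(info)
```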

1 reaction
karajan1001 commented, Nov 25, 2021

Sorry for the late reply.

"in our testcase, if we re-include 'data1' directory by !data/*/, dvc ignores .dvc files inside data1"

I added some debug code to dvc and tried two examples. In dvc push:

$ dvc push
/Users/gao/Code/test/ignore/.dvc/config.local ignore status is True
/Users/gao/Code/test/ignore/.dvc/tmp ignore status is True
/Users/gao/Code/test/ignore/.dvc/cache ignore status is True
/Users/gao/Code/test/ignore/data/ ignore status is True
Everything is up to date.

While in dvc add data/data2/b:

$ dvc add data/data2/b
/Users/gao/Code/test/ignore/.dvc/config.local ignore status is True
/Users/gao/Code/test/ignore/.dvc/tmp ignore status is True
/Users/gao/Code/test/ignore/.dvc/cache ignore status is True
/Users/gao/Code/test/ignore/data/data2/b.dvc ignore status is False
/Users/gao/Code/test/ignore/data/ ignore status is True
Adding...
/Users/gao/Code/test/ignore/data/data2/b ignore status is True
/Users/gao/Code/test/ignore/data/data2/b.dvc ignore status is False
100% Adding...|████████████████████████████████████████|1/1 [00:00, 31.69file/s]

To track the changes with git, run:

	git add data/data2/b.dvc

To enable auto staging, run:

	dvc config core.autostage true

And if we change !data/*/ to !data/**/:

$ dvc push
/Users/gao/Code/test/ignore/.dvc/config.local ignore status is True
/Users/gao/Code/test/ignore/.dvc/tmp ignore status is True
/Users/gao/Code/test/ignore/.dvc/cache ignore status is True
/Users/gao/Code/test/ignore/data/ ignore status is False
/Users/gao/Code/test/ignore/data/data1/ ignore status is False
/Users/gao/Code/test/ignore/data/data2/ ignore status is False
/Users/gao/Code/test/ignore/data/data1/c.dvc ignore status is False
/Users/gao/Code/test/ignore/data/data1/c.dvc ignore status is False
/Users/gao/Code/test/ignore/data/data1/c.dvc ignore status is False
/Users/gao/Code/test/ignore/data/data1/a.dvc ignore status is False
/Users/gao/Code/test/ignore/data/data1/a.dvc ignore status is False
/Users/gao/Code/test/ignore/data/data1/a.dvc ignore status is False
/Users/gao/Code/test/ignore/data/data2/b.dvc ignore status is False
/Users/gao/Code/test/ignore/data/data2/b.dvc ignore status is False
/Users/gao/Code/test/ignore/data/data2/b.dvc ignore status is False
3 files pushed

So I guess there are two problems:

  1. Our backend, Dulwich, gives different results for the two patterns:
# with `!data/*/`
$ dulwich check-ignore data/
data/
# with `!data/**/`
$ dulwich check-ignore data/

While for Git:

# with `!data/*/`
$ git check-ignore data/
$
# with `!data/**/`
$ git check-ignore data/
$

They give the same result.

  2. DVC uses different logic in different commands (add works properly, while push and commit do not)

As for the logic of gitignore, I think the following explanation from the thread is quite clear:

  • Git opens and reads the working tree directory. For each file or directory that is actually present here, Git checks it against the ignore rules. Some rules match only directories and others match both directories and files. Some rules say "do ignore" and some say "do not ignore".

  • The last applicable rule wins.

  • If this is a file and the file is ignored, it's ignored. Unless, that is, it's in the index already, because then it's tracked and can't be ignored.

  • If this is a directory and the directory is ignored, it's not even opened and read. It's not in the index because directories are never in the index (at least nominally). If it is opened and read, the entire set of rules here apply recursively.
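
The rules above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not Git's, Dulwich's, or DVC's actual implementation: the matcher handles just "*", "?", "**", a leading "!" and a trailing "/", it ignores the index entirely, and all names are hypothetical.

```python
import re


def _segment_to_regex(segment):
    """Translate one path segment; wildcards never cross a '/'."""
    out = []
    for ch in segment:
        if ch == "*":
            out.append("[^/]*")
        elif ch == "?":
            out.append("[^/]")
        else:
            out.append(re.escape(ch))
    return "".join(out)


def compile_rule(line):
    """Parse one .gitignore line into (regex, negated, dir_only)."""
    negated = line.startswith("!")
    body = line[1:] if negated else line
    dir_only = body.endswith("/")
    body = body.rstrip("/")
    anchored = "/" in body  # a slash in the pattern ties it to the root
    segments = body.lstrip("/").split("/")
    parts = []
    for i, seg in enumerate(segments):
        last = i == len(segments) - 1
        if seg == "**":
            # Trailing "**": everything inside; otherwise zero or more dirs.
            parts.append(".+" if last else "(?:.*/)?")
        else:
            parts.append(_segment_to_regex(seg) + ("" if last else "/"))
    prefix = "" if anchored else "(?:.*/)?"
    return re.compile(prefix + "".join(parts) + r"\Z"), negated, dir_only


def is_ignored(path, is_dir, rules):
    """The last applicable rule wins."""
    ignored = False
    for regex, negated, dir_only in rules:
        if dir_only and not is_dir:
            continue
        if regex.match(path):
            ignored = not negated
    return ignored


def visible_files(tree, rules, prefix=""):
    """Walk a nested dict (dict = directory, None = file)."""
    found = []
    for name, child in sorted(tree.items()):
        path = prefix + name
        is_dir = isinstance(child, dict)
        if is_ignored(path, is_dir, rules):
            continue  # an ignored directory is never opened
        if is_dir:
            found.extend(visible_files(child, rules, path + "/"))
        else:
            found.append(path)
    return found


TREE = {
    "data": {
        "data1": {"file1": None, "file1.dvc": None},
        "data2": {"file2": None, "file2.dvc": None},
    },
    ".gitignore": None,
}

issue_rules = [compile_rule(p) for p in ("data/**", "!data/*/", "!*.dvc")]
pruned_rules = [compile_rule(p) for p in ("data/", "!*.dvc")]

print(visible_files(TREE, issue_rules))
print(visible_files(TREE, pruned_rules))
```

With the rules from the issue, data/data1/ is first ignored by data/** and then re-included by !data/*/, so the walk opens it and the .dvc files survive through !*.dvc. With the second rule set, data/ itself is ignored and never opened, so !*.dvc cannot bring anything back. The second case mirrors what the debug output shows when data/ is reported as ignored during dvc push.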
