question-mark
Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging 3rd party libraries. ItΒ collects links to all the places you might be looking at while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

`add`: `.gitignore` is modified when directory-only pattern is already present

See original GitHub issue

Bug Report

Description

dvc add modifies .gitignore when run on a directory, even when .gitignore already contains a directory-only pattern matching said directory.

Reproduce

  1. mkdir bug && cd bug
  2. git init && dvc init
  3. mkdir data
  4. echo '/data/' 1>.gitignore
  5. dvc add data/
  6. cat .gitignore

Output is:

/data/
/data

Expected

If .gitignore already contains a pattern ignoring the directory, dvc add should not modify the file. This works fine for files, but not for directory-only patterns in .gitignore.

Environment information

Installed via the deb file on Ubuntu 18.04.

$ dvc doctor
DVC version: 2.10.2 (deb)
---------------------------------
Platform: Python 3.8.3 on Linux-5.4.0-110-generic-x86_64-with-glibc2.14
Supports:
        azure (adlfs = 2022.4.0, knack = 0.9.0, azure-identity = 1.10.0),
        gdrive (pydrive2 = 1.10.1),
        gs (gcsfs = 2022.3.0),
        hdfs (fsspec = 2022.3.0, pyarrow = 7.0.0),
        webhdfs (fsspec = 2022.3.0),
        http (aiohttp = 3.8.1, aiohttp-retry = 2.4.6),
        https (aiohttp = 3.8.1, aiohttp-retry = 2.4.6),
        s3 (s3fs = 2022.3.0, boto3 = 1.21.21),
        ssh (sshfs = 2022.3.1),
        oss (ossfs = 2021.8.0),
        webdav (webdav4 = 0.9.7),
        webdavs (webdav4 = 0.9.7)
Cache types: hardlink, symlink
Cache directory: ext4 on /dev/sda2
Caches: local
Remotes: None
Workspace directory: ext4 on /dev/sda2
Repo: dvc, git
$ dvc add --verbose data/
2022-05-26 11:58:19,640 WARNING: 'data' is empty.
2022-05-26 11:58:19,641 DEBUG: Added '/home/user/bug/data' to gitignore file.
2022-05-26 11:58:19,644 DEBUG: staged tree 'object md5: d751713988987e9331980363e24189ce.dir'
2022-05-26 11:58:19,645 DEBUG: state save (1082445, 99914b932bd37a50b983c5e7c90ae93b, 0) d751713988987e9331980363e24189ce.dir
2022-05-26 11:58:19,648 DEBUG: Computed stage: 'data.dvc' md5: 'None'
2022-05-26 11:58:19,650 DEBUG: Preparing to transfer data from 'memory://dvc-staging/1630ef59ac6c2487dd30e7be446f9be77f70435117f7c2da7b0995baff079d5d' to '/home/user/bug/.dvc/cache'
2022-05-26 11:58:19,650 DEBUG: Preparing to collect status from '/home/user/bug/.dvc/cache'
2022-05-26 11:58:19,651 DEBUG: Collecting status from '/home/user/bug/.dvc/cache'
2022-05-26 11:58:19,651 DEBUG: Preparing to collect status from 'memory://dvc-staging/1630ef59ac6c2487dd30e7be446f9be77f70435117f7c2da7b0995baff079d5d'
2022-05-26 11:58:19,654 DEBUG: state save (1080120, 1653577099652015360, 2) d751713988987e9331980363e24189ce.dir
2022-05-26 11:58:19,655 DEBUG: [Errno 95] no more link types left to try out: [Errno 95] 'reflink' is not supported by <class 'dvc.fs.local.LocalFileSystem'>: [Errno 95] Operation not supported
------------------------------------------------------------
Traceback (most recent call last):
  File "dvc/fs/utils.py", line 28, in _link
  File "dvc/fs/base.py", line 263, in reflink
  File "dvc/fs/local.py", line 156, in reflink
  File "dvc/system.py", line 112, in reflink
  File "dvc/system.py", line 96, in _reflink_linux
OSError: [Errno 95] Operation not supported

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "dvc/fs/utils.py", line 69, in _try_links
  File "dvc/fs/utils.py", line 32, in _link
OSError: [Errno 95] 'reflink' is not supported by <class 'dvc.fs.local.LocalFileSystem'>

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "dvc/fs/utils.py", line 124, in _test_link
  File "dvc/fs/utils.py", line 77, in _try_links
OSError: [Errno 95] no more link types left to try out
------------------------------------------------------------
2022-05-26 11:58:19,656 DEBUG: Removing '/home/user/bug/.fEAYfjTwkgEVFTNGw7Y5iD.tmp'
2022-05-26 11:58:19,656 DEBUG: Uploading '/home/user/bug/.dvc/cache/.DTYbyq7XpBk7uzKbcTakmS.tmp' to '/home/user/bug/.fEAYfjTwkgEVFTNGw7Y5iD.tmp'
2022-05-26 11:58:19,657 DEBUG: Removing '/home/user/bug/.fEAYfjTwkgEVFTNGw7Y5iD.tmp'
2022-05-26 11:58:19,657 DEBUG: Removing '/home/user/bug/.dvc/cache/.DTYbyq7XpBk7uzKbcTakmS.tmp'
2022-05-26 11:58:19,664 DEBUG: state save (1082445, 99914b932bd37a50b983c5e7c90ae93b, 0) d751713988987e9331980363e24189ce.dir
2022-05-26 11:58:19,665 DEBUG: Saving information to 'data.dvc'.
100% Adding...|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ|1/1 [00:00, 28.56file/s]

To track the changes with git, run:

    git add data.dvc .gitignore

To enable auto staging, run:

        dvc config core.autostage true
2022-05-26 11:58:19,680 DEBUG: Analytics is enabled.
2022-05-26 11:58:19,680 DEBUG: Trying to spawn '['daemon', '-q', 'analytics', '/tmp/tmp354ecba1']'
2022-05-26 11:58:19,681 DEBUG: Spawned '['daemon', '-q', 'analytics', '/tmp/tmp354ecba1']'

Additional Information (if any):

By reading the source of this function, I believe this is either a bug in the git backend, or a problem in the way the path argument is formatted.

Issue Analytics

  • State:open
  • Created a year ago
  • Reactions:1
  • Comments:5 (4 by maintainers)

github_iconTop GitHub Comments

2reactions
thinetycommented, May 27, 2022

@dberenbaum Usually, data/ is inside another directory in the project (dir/ for example), and is ignored by the root .gitignore. When running dvc add dir/data/, another .gitignore is created inside dir/, because the root .gitignore contains /dir/data/ and not /dir/data. I need to delete this new .gitignore on every run of dvc add, which is more of an incovenience than a problem. The root file could also be modified, but in general I prefer the explicitness of the pattern /dir/data/ matching only directories.

1reaction
dtrifirocommented, May 27, 2022

I can reproduce with the following code:

import os
from tempfile import TemporaryDirectory

from dulwich.ignore import IgnoreFilter

with TemporaryDirectory() as tmp_dir:
    os.chdir(tmp_dir)

    for name in ["/absolute", "/absolutetrailing/", "relative", "relativetrailing/"]:

        with open(".gitignore", "w") as fh:
            fh.write(name + "\n")

        print(".gitignore contents:")
        with open(".gitignore") as fh:
            print(fh.read(), end="")

        ignore_filter = IgnoreFilter.from_path(".gitignore")
        name = name.removeprefix("/").removesuffix("/")
        is_ignored = ignore_filter.is_ignored(name.encode("UTF-8"))
        print(f"{name=}, {is_ignored=}\n")


.gitignore contents:
/absolute
name='absolute', is_ignored=True

 .gitignore contents:
/absolutetrailing/
name='absolutetrailing', is_ignored=None

 .gitignore contents:
relative
name='relative', is_ignored=True

 .gitignore contents:
relativetrailing/
name='relativetrailing', is_ignored=None
Read more comments on GitHub >

github_iconTop Results From Across the Web

git still shows files as modified after adding to .gitignore
Your .gitignore is working, but it still tracks the files because they were already in the index. To stop this you have to...
Read more >
.gitignore file - ignoring files in Git | Atlassian Git Tutorial
Git ignore patterns are used to exclude certain files in your working directory from your Git history. They can be local, global, or...
Read more >
Mariot Tsitoara - Beginning Git and GitHub A Comprehensive ...
The main takeaways are the three states that a file can be: β€’ Modified: You modified ... git add .gitignore $ git commit...
Read more >
gitignore Documentation - Git
Files already tracked by Git are not affected; see the NOTES below for details. Each line in a gitignore file specifies a pattern....
Read more >
How can I ignore a file that has already been committed to a ...
That's why, when you create a new repository, you should also create a .gitignore file with all the file patterns you want to...
Read more >

github_iconTop Related Medium Post

No results found

github_iconTop Related StackOverflow Question

No results found

github_iconTroubleshoot Live Code

Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start Free

github_iconTop Related Reddit Thread

No results found

github_iconTop Related Hackernoon Post

No results found

github_iconTop Related Tweet

No results found

github_iconTop Related Dev.to Post

No results found

github_iconTop Related Hashnode Post

No results found