`add`: `.gitignore` is modified when directory-only pattern is already present
Bug Report
Description
dvc add modifies .gitignore when run on a directory, even when .gitignore already contains a directory-only pattern matching that directory.
Reproduce
mkdir bug && cd bug
git init && dvc init
mkdir data
echo '/data/' 1>.gitignore
dvc add data/
cat .gitignore
Output is:
/data/
/data
Expected
If .gitignore already contains a pattern ignoring the directory, dvc add should not modify the file. This works fine for files, but not for directory-only patterns in .gitignore.
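The expected behavior can be sketched in Python as below. This is only an illustration under stated assumptions, not DVC's implementation: the function name add_ignore_entry and its is_dir parameter are hypothetical, and only exact literal patterns are compared (no wildcards or negation).

```python
from typing import List


def add_ignore_entry(lines: List[str], entry: str, is_dir: bool) -> List[str]:
    """Return the .gitignore lines, appending `entry` only if needed.

    Hypothetical helper, not part of DVC. `entry` is an anchored pattern
    without a trailing slash, e.g. '/data'.
    """
    result = list(lines)
    for line in lines:
        pattern = line.strip()
        # Skip blank lines and comments.
        if not pattern or pattern.startswith("#"):
            continue
        # An identical pattern already covers the target.
        if pattern == entry:
            return result
        # A directory-only pattern ('/data/') covers the target
        # only when the target is a directory.
        if is_dir and pattern == entry + "/":
            return result
    result.append(entry)
    return result


# The case from this report: '/data/' is already present and data is a directory.
print(add_ignore_entry(["/data/"], "/data", is_dir=True))
```

With is_dir=False the entry is still appended, since a directory-only pattern does not ignore a regular file of the same name.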
Environment information
Installed via the deb file on Ubuntu 18.04.
$ dvc doctor
DVC version: 2.10.2 (deb)
---------------------------------
Platform: Python 3.8.3 on Linux-5.4.0-110-generic-x86_64-with-glibc2.14
Supports:
azure (adlfs = 2022.4.0, knack = 0.9.0, azure-identity = 1.10.0),
gdrive (pydrive2 = 1.10.1),
gs (gcsfs = 2022.3.0),
hdfs (fsspec = 2022.3.0, pyarrow = 7.0.0),
webhdfs (fsspec = 2022.3.0),
http (aiohttp = 3.8.1, aiohttp-retry = 2.4.6),
https (aiohttp = 3.8.1, aiohttp-retry = 2.4.6),
s3 (s3fs = 2022.3.0, boto3 = 1.21.21),
ssh (sshfs = 2022.3.1),
oss (ossfs = 2021.8.0),
webdav (webdav4 = 0.9.7),
webdavs (webdav4 = 0.9.7)
Cache types: hardlink, symlink
Cache directory: ext4 on /dev/sda2
Caches: local
Remotes: None
Workspace directory: ext4 on /dev/sda2
Repo: dvc, git
$ dvc add --verbose data/
2022-05-26 11:58:19,640 WARNING: 'data' is empty.
2022-05-26 11:58:19,641 DEBUG: Added '/home/user/bug/data' to gitignore file.
2022-05-26 11:58:19,644 DEBUG: staged tree 'object md5: d751713988987e9331980363e24189ce.dir'
2022-05-26 11:58:19,645 DEBUG: state save (1082445, 99914b932bd37a50b983c5e7c90ae93b, 0) d751713988987e9331980363e24189ce.dir
2022-05-26 11:58:19,648 DEBUG: Computed stage: 'data.dvc' md5: 'None'
2022-05-26 11:58:19,650 DEBUG: Preparing to transfer data from 'memory://dvc-staging/1630ef59ac6c2487dd30e7be446f9be77f70435117f7c2da7b0995baff079d5d' to '/home/user/bug/.dvc/cache'
2022-05-26 11:58:19,650 DEBUG: Preparing to collect status from '/home/user/bug/.dvc/cache'
2022-05-26 11:58:19,651 DEBUG: Collecting status from '/home/user/bug/.dvc/cache'
2022-05-26 11:58:19,651 DEBUG: Preparing to collect status from 'memory://dvc-staging/1630ef59ac6c2487dd30e7be446f9be77f70435117f7c2da7b0995baff079d5d'
2022-05-26 11:58:19,654 DEBUG: state save (1080120, 1653577099652015360, 2) d751713988987e9331980363e24189ce.dir
2022-05-26 11:58:19,655 DEBUG: [Errno 95] no more link types left to try out: [Errno 95] 'reflink' is not supported by <class 'dvc.fs.local.LocalFileSystem'>: [Errno 95] Operation not supported
------------------------------------------------------------
Traceback (most recent call last):
File "dvc/fs/utils.py", line 28, in _link
File "dvc/fs/base.py", line 263, in reflink
File "dvc/fs/local.py", line 156, in reflink
File "dvc/system.py", line 112, in reflink
File "dvc/system.py", line 96, in _reflink_linux
OSError: [Errno 95] Operation not supported
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "dvc/fs/utils.py", line 69, in _try_links
File "dvc/fs/utils.py", line 32, in _link
OSError: [Errno 95] 'reflink' is not supported by <class 'dvc.fs.local.LocalFileSystem'>
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "dvc/fs/utils.py", line 124, in _test_link
File "dvc/fs/utils.py", line 77, in _try_links
OSError: [Errno 95] no more link types left to try out
------------------------------------------------------------
2022-05-26 11:58:19,656 DEBUG: Removing '/home/user/bug/.fEAYfjTwkgEVFTNGw7Y5iD.tmp'
2022-05-26 11:58:19,656 DEBUG: Uploading '/home/user/bug/.dvc/cache/.DTYbyq7XpBk7uzKbcTakmS.tmp' to '/home/user/bug/.fEAYfjTwkgEVFTNGw7Y5iD.tmp'
2022-05-26 11:58:19,657 DEBUG: Removing '/home/user/bug/.fEAYfjTwkgEVFTNGw7Y5iD.tmp'
2022-05-26 11:58:19,657 DEBUG: Removing '/home/user/bug/.dvc/cache/.DTYbyq7XpBk7uzKbcTakmS.tmp'
2022-05-26 11:58:19,664 DEBUG: state save (1082445, 99914b932bd37a50b983c5e7c90ae93b, 0) d751713988987e9331980363e24189ce.dir
2022-05-26 11:58:19,665 DEBUG: Saving information to 'data.dvc'.
100% Adding...|████████████████████████████████████████████████████████████████████████████████|1/1 [00:00, 28.56file/s]
To track the changes with git, run:
git add data.dvc .gitignore
To enable auto staging, run:
dvc config core.autostage true
2022-05-26 11:58:19,680 DEBUG: Analytics is enabled.
2022-05-26 11:58:19,680 DEBUG: Trying to spawn '['daemon', '-q', 'analytics', '/tmp/tmp354ecba1']'
2022-05-26 11:58:19,681 DEBUG: Spawned '['daemon', '-q', 'analytics', '/tmp/tmp354ecba1']'
Additional Information (if any):
By reading the source of this function, I believe this is either a bug in the git backend, or a problem in the way the path argument is formatted.
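To illustrate the second hypothesis, here is a deliberately simplified Python model of a gitignore-style matcher that infers "is a directory" from a trailing slash on the path it is given. It is an assumption made for illustration, not the code of DVC or of its git backend, and both function names are hypothetical. Under that assumption, passing data instead of data/ makes the directory-only pattern /data/ appear not to match.

```python
def literal_pattern_matches(pattern: str, path: str) -> bool:
    """Simplified match for anchored literal patterns (no wildcards).

    `path` is relative to the repository root; a trailing '/' marks it
    as a directory. Hypothetical model, not a real library API.
    """
    dir_only = pattern.endswith("/")
    is_dir = path.endswith("/")
    # A directory-only pattern never matches a path not marked as a directory.
    if dir_only and not is_dir:
        return False
    return pattern.strip("/") == path.strip("/")


def as_match_path(relpath: str, is_dir: bool) -> str:
    """Format a path for matching, marking directories with a trailing '/'."""
    relpath = relpath.rstrip("/")
    return relpath + "/" if is_dir else relpath


# Path passed without the trailing slash: the pattern appears not to match.
print(literal_pattern_matches("/data/", "data"))
# Path formatted as a directory: the pattern matches.
print(literal_pattern_matches("/data/", as_match_path("data", is_dir=True)))
```

If the real matcher behaves like this model, formatting directory paths with a trailing slash before the ignore check would avoid the duplicate entry.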
Issue Analytics
- State:
- Created a year ago
- Reactions:1
- Comments:5 (4 by maintainers)
Top GitHub Comments
@dberenbaum Usually, data/ is inside another directory in the project (dir/, for example) and is ignored by the root .gitignore. When running dvc add dir/data/, another .gitignore is created inside dir/, because the root .gitignore contains /dir/data/ and not /dir/data. I need to delete this new .gitignore on every run of dvc add, which is more of an inconvenience than a problem. The root file could also be modified, but in general I prefer the explicitness of the pattern /dir/data/ matching only directories.
I can reproduce with the following code: