`exp run`: git ignored files committed to experiments
Bug Report
Description
I was working on a new command in the extension where you can “share an experiment”. I got some surprising results in that the experiment’s input data that is tracked by DVC and listed in .gitignore files was committed into Git. I discovered this when I sent the branch to the remote and opened a PR.
Reproduce
- Run several checkpoint experiments in the workspace.
- Pick one of the experiments.
- `dvc exp branch [exp-name] [branch-name]`
- `dvc exp apply [exp-name]`
- `dvc push`
- `git push origin [branch-name]`
- Create a PR for the branch pushed to origin. Tracked data will be shown in the PR, e.g. https://github.com/iterative/vscode-dvc/pull/2156
Expected
`exp branch` does not commit DVC-tracked/Git-ignored data.
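One way to check this expectation on the exported branch is to ask Git for files that are in the index but also match an ignore rule. The sketch below wraps `git ls-files --cached --ignored --exclude-standard`; the helper names `split_nul_paths` and `tracked_but_ignored` are made up for illustration and are not part of DVC or scmrepo. It assumes `git` is on the `PATH` and that the branch of interest is checked out.

```python
import subprocess
from typing import List


def split_nul_paths(raw: bytes) -> List[str]:
    """Split NUL-separated `git ls-files -z` output into a list of paths."""
    return [p.decode("utf-8", "surrogateescape") for p in raw.split(b"\0") if p]


def tracked_but_ignored(repo_dir: str = ".") -> List[str]:
    """Return files that are in the Git index yet match an ignore rule.

    Hypothetical helper: it only shells out to git and parses the result.
    """
    result = subprocess.run(
        ["git", "ls-files", "--cached", "--ignored", "--exclude-standard", "-z"],
        cwd=repo_dir,
        check=True,           # raise if git fails (e.g. not inside a repository)
        capture_output=True,  # collect stdout as bytes for parsing
    )
    return split_nul_paths(result.stdout)
```

Calling `tracked_but_ignored()` from the repository root after checking out the experiment branch should return an empty list if no Git-ignored data was committed.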
Environment information
Output of `dvc doctor`:
$ dvc doctor
DVC version: 2.16.0 (pip)
---------------------------------
Platform: Python 3.8.9 on macOS-12.5-arm64-arm-64bit
Supports:
webhdfs (fsspec = 2022.5.0),
http (aiohttp = 3.8.1, aiohttp-retry = 2.5.1),
https (aiohttp = 3.8.1, aiohttp-retry = 2.5.1),
s3 (s3fs = 2022.5.0, boto3 = 1.21.21)
Cache types: reflink, hardlink, symlink
Cache directory: apfs on /dev/disk3s1s1
Caches: local
Remotes: https
Workspace directory: apfs on /dev/disk3s1s1
Repo: dvc (subdir), git
Additional Information (if any):
Could definitely be related to the way a branch is created for checkpoint experiments. I have not tested to see if the same problem occurs with non-checkpoint experiments.
Top GitHub Comments
I can reproduce it. The issue here is that the demo stage dependency is [...], but the DVC (data) output is `data/MNIST/raw`, so we still end up explicitly git-adding `data/MNIST` without doing any further gitignore checks. If I change the pipeline stage to use [...], I get the expected behavior, with no DVC-tracked data being added in Git.
I think for the demo project the pipeline stage dependency should be `data/MNIST/raw`, since the dependency is supposed to be on that data directory (`data/MNIST/raw/`) and not the data directory plus anything else in `data/MNIST` (which currently includes `raw.dvc` and `.gitignore`).
We do still have a bug in how exps handle combined Git + DVC directory deps, but it probably needs to be addressed in scmrepo (and I don't think it needs to be p0). The issue is that `scm.add()` has always been `git add --force`, but we are definitely at the point where we need a proper distinction between `scm.add(force=True)` and `force=False`.
Edit: opened https://github.com/iterative/scmrepo/issues/123
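The distinction between a forced and a non-forced add can be sketched as follows. This is a toy model, not scmrepo's implementation: `select_paths_to_stage`, the `ignored` predicate, and the file name `example-file` are all hypothetical, and the predicate stands in for real gitignore matching, which is richer.

```python
from typing import Callable, Iterable, List


def select_paths_to_stage(
    paths: Iterable[str],
    is_ignored: Callable[[str], bool],
    force: bool = False,
) -> List[str]:
    """Pick the paths a hypothetical add() would stage.

    force=True mimics `git add --force`: ignore rules are bypassed.
    force=False mimics adding a directory with plain `git add`:
    ignored paths inside it are left out.
    """
    if force:
        return list(paths)
    return [p for p in paths if not is_ignored(p)]


def ignored(path: str) -> bool:
    """Toy ignore rule standing in for a .gitignore entry covering data/MNIST/raw."""
    return path == "data/MNIST/raw" or path.startswith("data/MNIST/raw/")


# Contents of data/MNIST as described in the comment above; example-file is made up.
candidates = [
    "data/MNIST/raw.dvc",
    "data/MNIST/.gitignore",
    "data/MNIST/raw/example-file",
]

forced = select_paths_to_stage(candidates, ignored, force=True)
unforced = select_paths_to_stage(candidates, ignored, force=False)
```

Under this model, the forced call stages the DVC-tracked file along with `raw.dvc` and `.gitignore`, while the non-forced call stages only the two Git-tracked files, which is the behavior the issue expects.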
This issue was also noted by a Discord user, but it was unclear how to reproduce it, and it was never followed up with a separate bug report: https://discord.com/channels/485586884165107732/485596304961962003/1003959217826300025