`exp run`: git ignored files committed to experiments

Bug Report

Description

I was working on a new command in the extension that lets you “share an experiment”. The result surprised me: the experiment’s input data, which is tracked by DVC and listed in .gitignore files, was committed to Git. I discovered this when I pushed the branch to the remote and opened a PR.

Reproduce

  1. Run several checkpoint experiments in the workspace.
  2. Pick one of the experiments.
  3. dvc exp branch [exp-name] [branch-name]
  4. dvc exp apply [exp-name]
  5. dvc push
  6. git push origin [branch-name]
  7. Create a PR for the branch pushed to origin. The DVC-tracked data will be shown in the PR, e.g. https://github.com/iterative/vscode-dvc/pull/2156

Expected

exp branch does not commit DVC tracked/git ignored data.
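
One way to check this expectation on a checked-out branch is to ask Git for files that are in the index even though they match an ignore rule. The snippet below is a minimal sketch, not part of DVC or the extension: the function names `tracked_but_ignored` and `parse_ls_files` are made up for illustration, and it assumes `git` is on the PATH.

```python
import subprocess
from typing import List


def parse_ls_files(output: str) -> List[str]:
    """Split NUL-separated `git ls-files -z` output into a list of paths."""
    return [path for path in output.split("\0") if path]


def tracked_but_ignored(repo_dir: str = ".") -> List[str]:
    """Return files that are in the Git index yet match an ignore rule.

    Hypothetical helper for illustration; it shells out to
    `git ls-files --cached --ignored --exclude-standard`.
    """
    result = subprocess.run(
        ["git", "ls-files", "--cached", "--ignored", "--exclude-standard", "-z"],
        cwd=repo_dir,
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_ls_files(result.stdout)
```

Calling `tracked_but_ignored()` on the experiment branch should return an empty list if no git-ignored data was committed.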

Environment information

Output of dvc doctor:

$ dvc doctor
DVC version: 2.16.0 (pip)
---------------------------------
Platform: Python 3.8.9 on macOS-12.5-arm64-arm-64bit
Supports:
        webhdfs (fsspec = 2022.5.0),
        http (aiohttp = 3.8.1, aiohttp-retry = 2.5.1),
        https (aiohttp = 3.8.1, aiohttp-retry = 2.5.1),
        s3 (s3fs = 2022.5.0, boto3 = 1.21.21)
Cache types: reflink, hardlink, symlink
Cache directory: apfs on /dev/disk3s1s1
Caches: local
Remotes: https
Workspace directory: apfs on /dev/disk3s1s1
Repo: dvc (subdir), git

Additional Information (if any):

This could be related to the way a branch is created for checkpoint experiments. I have not tested whether the same problem occurs with non-checkpoint experiments.

Issue Analytics

  • State: open
  • Created a year ago
  • Reactions: 2
  • Comments: 9 (3 by maintainers)

Top GitHub Comments

4 reactions
pmrowla commented, Aug 18, 2022

I can reproduce it. The issue here is that the demo stage dependency is

    deps:
    - data/MNIST

but the DVC (data) output is data/MNIST/raw, so we still end up explicitly git-adding data/MNIST without doing any further gitignore checks.

If I change the pipeline stage to use

    deps:
    - data/MNIST/raw

I get the expected behavior, with no DVC-tracked data being added in git.


I think for the demo project the pipeline stage dependency should be data/MNIST/raw, since the dependency is supposed to be on that data directory (data/MNIST/raw/) and not the data directory plus anything else in data/MNIST (which currently includes raw.dvc and .gitignore).
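
The overlap described above (a stage dependency that is a parent directory of a DVC output) can be detected with a simple path check. This is only an illustrative sketch: `outputs_inside_dep` is a hypothetical helper, not a DVC function, and it compares plain POSIX-style paths without reading dvc.yaml.

```python
from pathlib import PurePosixPath
from typing import Iterable, List


def outputs_inside_dep(dep: str, outs: Iterable[str]) -> List[str]:
    """Return the DVC outputs that live strictly below a dependency path."""
    dep_path = PurePosixPath(dep)
    return [out for out in outs if dep_path in PurePosixPath(out).parents]


# The demo project's current dependency contains the DVC output:
print(outputs_inside_dep("data/MNIST", ["data/MNIST/raw"]))      # ['data/MNIST/raw']
# With the suggested dependency there is no nested output:
print(outputs_inside_dep("data/MNIST/raw", ["data/MNIST/raw"]))  # []
```

A non-empty result means the dependency mixes Git-tracked files with DVC-tracked data, which is the case that triggers the problem here.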

We do still have a bug in how exps handle combined git + DVC directory deps, but it probably needs to be addressed in scmrepo (and I don’t think it needs to be p0). The issue is that scm.add() has always been git add --force, but we are definitely at the point where we need a proper distinction between scm.add(force=True) and force=False.

Edit: opened https://github.com/iterative/scmrepo/issues/123
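
To make the force=True / force=False distinction concrete, here is a rough sketch of how the two modes could map onto `git add` command lines. `git_add_command` is a hypothetical helper written for this illustration; it is not scmrepo's API and does not reflect how scmrepo is implemented.

```python
from typing import List, Sequence


def git_add_command(paths: Sequence[str], force: bool = False) -> List[str]:
    """Build a `git add` command line (hypothetical helper, not scmrepo's API)."""
    cmd = ["git", "add"]
    if force:
        # --force stages paths even when they match a .gitignore rule.
        cmd.append("--force")
    # "--" separates options from the paths that follow.
    return cmd + ["--"] + list(paths)


print(git_add_command(["data/MNIST"]))
# ['git', 'add', '--', 'data/MNIST']
print(git_add_command(["data/MNIST"], force=True))
# ['git', 'add', '--force', '--', 'data/MNIST']
```

Without `--force`, `git add` on a directory skips the ignored files inside it, which is the behavior the non-forced mode would rely on to keep data/MNIST/raw out of the commit.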

2 reactions
pmrowla commented, Aug 9, 2022

This issue was also noted by a Discord user, but it was unclear how to reproduce it, and it was not followed up with a separate bug report:

https://discord.com/channels/485586884165107732/485596304961962003/1003959217826300025
