exp run: python: can't open file 'src/train.py'

Bug Report

Description

I added a couple of experiments with different parameters to the queue. I wanted to run them sequentially with dvc exp run --run-all, but I got some unexpected results:

  1. I realized that dvc checkout is run automatically for each experiment. I think this is bad, since I have to wait for all my images and annotations to be checked out once per experiment. I did not know why this was happening until I read this line in the source code: https://github.com/iterative/dvc/blob/master/dvc/command/experiments.py#L1293 It implies that --temp means "each exp is run in a separate temporary directory instead of your workspace". I guess this is related to parallelizing different runs, but I think it should be possible to avoid, since it does not scale to large datasets.

  2. The error in the issue title, python: can't open file 'src/train.py', seems to appear because --temp requires src/train.py to be committed in both DVC and Git so that the file is present later in the tmp copy (this is only my current belief; I have not been able to confirm it yet). Everything works fine with dvc repro.

Reproduce

  1. dvc init
  2. import and add dataset
  3. create src/train.py script
  4. dvc exp run --queue
  5. dvc exp run --queue -S optimizer.lr=0.1
  6. dvc exp run --run-all

Expected

I expected my experiments to run in sequential order without duplicating my data in another tmp directory.

Environment information

Output of dvc doctor:

$ dvc doctor

DVC version: 2.5.4 (conda)
---------------------------------
Platform: Python 3.8.10 on Linux-4.15.0-96-generic-x86_64-with-glibc2.10
Supports:
        gdrive (pydrive2 = 1.9.1),
        http (requests = 2.26.0),
        https (requests = 2.26.0)
Cache types: hardlink, symlink
Cache directory: ext4 on /dev/sdb1
Caches: local
Remotes: None
Workspace directory: ext4 on /dev/sdb1
Repo: dvc, git

Additional Information (if any):

$ dvc exp show 
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ Experiment   ┃ Created      ┃ test_split.split ┃ optimizer.lr ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━┩
│ workspace    │ -            │ -                │ -            │
│ main         │ Aug 19, 2021 │ -                │ -            │
│ ├── *098b5bc │ 01:07 PM     │ 0.1              │ 0.1          │
│ └── *6980253 │ 01:06 PM     │ -                │ -            │
└──────────────┴──────────────┴──────────────────┴──────────────┘

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 8 (4 by maintainers)

Top GitHub Comments

daavoo commented, Aug 25, 2021

pmrowla commented, Aug 23, 2021

This is expected behavior since src/train.py is not tracked by git - from the exp run docs (https://dvc.org/doc/command-reference/exp/run#queueing-and-parallel-execution):

⚠️ Note that only tracked files and directories will be included in --queue/temp experiments. To include untracked files, stage them with git add first (before dvc exp run).
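
One way to catch this before queueing is to list the files Git does not track yet. The sketch below is only an illustration: list_untracked is a hypothetical helper name, not part of DVC or Git. It relies on `git status --porcelain` marking untracked entries with a leading "?? ", and it does not handle paths that Git prints in quotes because of special characters.

```shell
# Hypothetical helper (not part of DVC or Git): print the untracked paths
# found in `git status --porcelain` output read from stdin.
# Untracked entries are the lines that start with "?? ".
list_untracked() {
  sed -n 's/^?? //p'
}

# Intended use inside the repository, before `dvc exp run --queue`:
#   git status --porcelain | list_untracked
```

If src/train.py shows up in that list, staging it with git add src/train.py before running dvc exp run --queue should make it available in the temporary directories, as the note from the docs describes.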

As you noted, the --queue/--run-all functionality is limited right now, and only supports running experiments in their own separate tempdir workspaces. This behavior is also documented:

Use dvc exp run --run-all to process the queue. This is done outside your workspace (in temporary dirs in .dvc/tmp/exps) to preserve any changes between/after queueing runs.

To run an experiment in your main repo working directory, currently you cannot use the --queue/--run-all functionality. But if you just do

dvc exp run -S optimizer.lr=0.1

it will run in your workspace the same way as dvc repro (and it will work properly even with an untracked src/train.py file).
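
If several parameter values need to run one after another without the queue, a small loop around the command above may be enough. This is a minimal sketch, not an official DVC workflow: run_lr_sweep and the DVC variable are hypothetical names introduced here, and it assumes it is acceptable for each run to execute in (and modify) your workspace in turn.

```shell
# Hypothetical helper: run one experiment per learning rate, sequentially,
# in the current workspace (no --queue / --run-all involved).
# DVC is an assumed override for the executable name; it defaults to `dvc`.
run_lr_sweep() {
  for lr in "$@"; do
    # Stop at the first failing run so later runs do not start.
    "${DVC:-dvc}" exp run -S "optimizer.lr=${lr}" || return 1
  done
}

# Intended use:
#   run_lr_sweep 0.1 0.01
```

Setting DVC=echo prints the commands instead of running them, which is a cheap way to check the sweep before starting it.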
