
exp apply: not working for failed experiments

Bug Report

Description

When an experiment fails and I want to change something and re-run it, I first need to apply it, make my changes, and add the new version to the queue. Unfortunately, running dvc exp apply <hash> on a failed experiment produces the following result:

2022-08-25 09:47:00,888 ERROR: '20ea06d' does not appear to be an experiment commit.: Experiment derived from 'celeryf', expected '3b0d8e3'.
------------------------------------------------------------
Traceback (most recent call last):
  File "/home/maciej/venvs/motor-generative-modelling/lib/python3.9/site-packages/dvc/repo/experiments/apply.py", line 38, in apply
    exps.check_baseline(exp_rev)
  File "/home/maciej/venvs/motor-generative-modelling/lib/python3.9/site-packages/dvc/repo/experiments/__init__.py", line 378, in check_baseline
    raise BaselineMismatchError(exp_baseline, baseline_sha)
dvc.repo.experiments.exceptions.BaselineMismatchError: Experiment derived from 'celeryf', expected '3b0d8e3'.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/maciej/venvs/motor-generative-modelling/lib/python3.9/site-packages/dvc/cli/__init__.py", line 185, in main
    ret = cmd.do_run()
  File "/home/maciej/venvs/motor-generative-modelling/lib/python3.9/site-packages/dvc/cli/command.py", line 22, in do_run
    return self.run()
  File "/home/maciej/venvs/motor-generative-modelling/lib/python3.9/site-packages/dvc/commands/experiments/apply.py", line 14, in run
    self.repo.experiments.apply(
  File "/home/maciej/venvs/motor-generative-modelling/lib/python3.9/site-packages/dvc/repo/experiments/__init__.py", line 499, in apply
    return apply(self.repo, *args, **kwargs)
  File "/home/maciej/venvs/motor-generative-modelling/lib/python3.9/site-packages/dvc/repo/__init__.py", line 48, in wrapper
    return f(repo, *args, **kwargs)
  File "/home/maciej/venvs/motor-generative-modelling/lib/python3.9/site-packages/dvc/repo/scm_context.py", line 156, in run
    return method(repo, *args, **kw)
  File "/home/maciej/venvs/motor-generative-modelling/lib/python3.9/site-packages/dvc/repo/experiments/apply.py", line 40, in apply
    raise InvalidExpRevError(rev) from exc
dvc.repo.experiments.exceptions.InvalidExpRevError: '20ea06d' does not appear to be an experiment commit.
------------------------------------------------------------
2022-08-25 09:47:00,891 DEBUG: Analytics is enabled.
2022-08-25 09:47:00,917 DEBUG: Trying to spawn '['daemon', '-q', 'analytics', '/tmp/tmpmqf25q63']'
2022-08-25 09:47:00,919 DEBUG: Spawned '['daemon', '-q', 'analytics', '/tmp/tmpmqf25q63']

Reproduce

I cannot share my code, and I don't think a toy example is needed here.

Expected

The changes to code/configuration files are applied to the workspace in the state they were in when the experiment was scheduled for execution.

Environment information

Output of dvc doctor:

DVC version: 2.18.1 (pip)
---------------------------------
Platform: Python 3.9.5 on Linux-5.4.0-124-generic-x86_64-with-glibc2.31
Supports:
        azure (adlfs = 2022.4.0, knack = 0.9.0, azure-identity = 1.10.0),
        http (aiohttp = 3.8.1, aiohttp-retry = 2.5.1),
        https (aiohttp = 3.8.1, aiohttp-retry = 2.5.1),
        s3 (s3fs = 2022.5.0, boto3 = 1.21.21),
        webhdfs (fsspec = 2022.5.0)
Cache types: hardlink, symlink
Cache directory: ext4 on /dev/mapper/ubuntu--vg-home--vg
Caches: local
Remotes: s3
Workspace directory: ext4 on /dev/mapper/ubuntu--vg-home--vg
Repo: dvc, git

Kind regards, macio232

Issue Analytics

  • State: open
  • Created a year ago
  • Reactions: 1
  • Comments: 5

Top GitHub Comments

4 reactions
karajan1001 commented, Aug 25, 2022

For now, there is a somewhat hacky way to check it out. You can try:

git checkout $(cat .git/refs/exps/celery/failed)
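The workaround above reads the SHA straight out of the loose ref file. As a hedged sketch, assuming a DVC version with the celery queue (such as the 2.18.1 reported here), the Python helper below performs the same lookup but also falls back to Git's packed-refs file, because Git can move loose refs there (for example after git gc). The function name resolve_ref is hypothetical and not part of DVC. Another assumption: like refs/stash, this ref most likely points only at the tip of the failed stash, that is, the most recently failed experiment.

```python
from pathlib import Path
from typing import Optional

# Ref name taken from the workaround above. Assumption: it points at the tip
# of DVC's failed-experiment stash (DVC versions that use the celery queue).
FAILED_REF = "refs/exps/celery/failed"


def resolve_ref(git_dir: Path, ref: str = FAILED_REF) -> Optional[str]:
    """Return the SHA stored for `ref`, or None if the ref does not exist.

    Hypothetical helper: mirrors `cat .git/<ref>`, but also consults
    `packed-refs`, where Git may store refs after `git gc` / `git pack-refs`.
    """
    # A loose ref is a file containing the SHA followed by a newline.
    loose = git_dir / ref
    if loose.is_file():
        return loose.read_text().strip()

    # Packed refs are stored as "<sha> <refname>" lines in a single file.
    packed = git_dir / "packed-refs"
    if packed.is_file():
        for line in packed.read_text().splitlines():
            # Skip blank lines, the header comment and peeled-tag lines ("^<sha>").
            if not line or line.startswith(("#", "^")):
                continue
            sha, _, name = line.partition(" ")
            if name == ref:
                return sha
    return None
```

Usage from the repository root: print(resolve_ref(Path(".git"))), then pass the printed SHA to git checkout. Running git rev-parse refs/exps/celery/failed gives the same answer with Git doing the lookup itself.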

2 reactions
pmrowla commented, Aug 25, 2022

exp apply now needs to check the failed-refs stash as well. This usage of apply worked before the celery changes (since failed exps were simply re-added to the regular queue), so this should be considered a regression (and it's a simple fix).

https://github.com/iterative/dvc/blob/063eb6904dc79c2e5be9e1b57f7ecaa781eded8b/dvc/repo/experiments/apply.py#L42

This just needs to be something like:

stash_rev = exp_rev in exps.stash_revs or exp_rev in exps.celery_queue.failed_stash.stash_revs

(apply doesn't pop from the stash; we just need to check that the Git SHA exists in one of our stashes.)
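The proposed change boils down to a membership test across more than one stash. The sketch below models that idea in isolation, assuming a hypothetical StashStub class as a stand-in for DVC's stash objects; only the stash_revs mapping named in the comment is mirrored, and the real DVC types differ. The helper is_stashed_rev (also hypothetical) checks a revision against every stash without popping anything.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable


@dataclass
class StashStub:
    """Hypothetical stand-in for a DVC experiment stash: maps stash SHA -> entry."""

    stash_revs: Dict[str, str] = field(default_factory=dict)


def is_stashed_rev(exp_rev: str, stashes: Iterable[StashStub]) -> bool:
    """Return True if `exp_rev` is a stash commit in any of the given stashes.

    Like the proposed fix, this is a read-only membership check:
    no entry is popped or modified.
    """
    return any(exp_rev in stash.stash_revs for stash in stashes)


# Illustrative placeholder SHAs, not real experiment revisions.
queued = StashStub({"1111111": "queued entry"})
failed = StashStub({"20ea06d": "failed entry"})

print(is_stashed_rev("20ea06d", [queued]))          # only the regular queue
print(is_stashed_rev("20ea06d", [queued, failed]))  # regular queue + failed stash
```

With only the regular queue's stash consulted, the failed experiment's SHA is not recognised (the situation that led to the error in the report); once the failed stash is included in the check, it is accepted.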
