
`dvc queue`: unexpected behaviour

See original GitHub issue

Bug Report

Description

Whilst checking out the new `dvc queue` command I have run into some unexpected behaviour. I won't duplicate the steps to reproduce here, but after queueing and running experiments I have run into two different issues.

  • VS Code demo project: `dvc queue status` returning `ERROR: Invalid experiment '{entry.stash_rev[:7]}'.` (produced when running with the extension)
  • example-get-started: `dvc queue status` returning

Task     Name    Created    Status
f3d69ee          02:17 PM   Success
08ccb05          02:17 PM   Success

ERROR: unexpected error - Extra data: line 1 column 56 (char 55)

(produced without having the extension involved).
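The `Extra data` message above is what Python's `json` module raises when it is handed more than one JSON document in a single string — consistent with two queue workers writing to the same file concurrently. A minimal reproduction (the payload is illustrative, not DVC's actual file format):

```python
import json

# Two complete JSON objects concatenated on one line -- e.g. two
# writers appending to the same status file without coordination.
payload = '{"rev": "f3d69ee", "status": "Success"}{"rev": "08ccb05"}'

try:
    json.loads(payload)
except json.JSONDecodeError as exc:
    # The parser stops at the first character after the first
    # complete document and reports it as "Extra data".
    print(exc.msg, exc.pos)  # Extra data 39
```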

In both instances this resulted in the HEAD baseline entry being dropped from the exp show data:

example-get-started example
❯ dvc exp show --show-json
{
  "workspace": {
    "baseline": {
      "data": {
        "timestamp": null,
        "params": {
          "params.yaml": {
            "data": {
              "prepare": {
                "split": 0.21,
                "seed": 20170428
              },
              "featurize": {
                "max_features": 200,
                "ngrams": 2
              },
              "train": {
                "seed": 20170428,
                "n_est": 50,
                "min_split": 0.01
              }
            }
          }
        },
        "deps": {
          "data/data.xml": {
            "hash": "22a1a2931c8370d3aeedd7183606fd7f",
            "size": 14445097,
            "nfiles": null
          },
          "src/prepare.py": {
            "hash": "f09ea0c15980b43010257ccb9f0055e2",
            "size": 1576,
            "nfiles": null
          },
          "data/prepared": {
            "hash": "153aad06d376b6595932470e459ef42a.dir",
            "size": 8437363,
            "nfiles": 2
          },
          "src/featurization.py": {
            "hash": "e0265fc22f056a4b86d85c3056bc2894",
            "size": 2490,
            "nfiles": null
          },
          "data/features": {
            "hash": "f35d4cc2c552ac959ae602162b8543f3.dir",
            "size": 2232588,
            "nfiles": 2
          },
          "src/train.py": {
            "hash": "c3961d777cfbd7727f9fde4851896006",
            "size": 967,
            "nfiles": null
          },
          "model.pkl": {
            "hash": "46865edbf3d62fc5c039dd9d2b0567a4",
            "size": 1763725,
            "nfiles": null
          },
          "src/evaluate.py": {
            "hash": "44e714021a65edf881b1716e791d7f59",
            "size": 2346,
            "nfiles": null
          }
        },
        "outs": {
          "data/prepared": {
            "hash": "153aad06d376b6595932470e459ef42a.dir",
            "size": 8437363,
            "nfiles": 2,
            "use_cache": true,
            "is_data_source": false
          },
          "data/features": {
            "hash": "f35d4cc2c552ac959ae602162b8543f3.dir",
            "size": 2232588,
            "nfiles": 2,
            "use_cache": true,
            "is_data_source": false
          },
          "model.pkl": {
            "hash": "46865edbf3d62fc5c039dd9d2b0567a4",
            "size": 1763725,
            "nfiles": null,
            "use_cache": true,
            "is_data_source": false
          },
          "data/data.xml": {
            "hash": "22a1a2931c8370d3aeedd7183606fd7f",
            "size": 14445097,
            "nfiles": null,
            "use_cache": true,
            "is_data_source": true
          }
        },
        "queued": false,
        "running": false,
        "executor": null,
        "metrics": {
          "evaluation.json": {
            "data": {
              "avg_prec": 0.9249974999612706,
              "roc_auc": 0.9460213440787918
            }
          }
        }
      }
    }
  },
  "f3d69eedda6b1c051b115523cf5c6c210490d0ea": {
    "baseline": {
      "data": {
        "timestamp": "2022-07-13T14:17:20",
        "params": {
          "params.yaml": {
            "data": {
              "prepare": {
                "split": 0.21,
                "seed": 20170428
              },
              "featurize": {
                "max_features": 200,
                "ngrams": 2
              },
              "train": {
                "seed": 20170428,
                "n_est": 50,
                "min_split": 0.01
              }
            }
          }
        },
        "deps": {
          "data/data.xml": {
            "hash": "22a1a2931c8370d3aeedd7183606fd7f",
            "size": 14445097,
            "nfiles": null
          },
          "src/prepare.py": {
            "hash": "f09ea0c15980b43010257ccb9f0055e2",
            "size": 1576,
            "nfiles": null
          },
          "data/prepared": {
            "hash": "153aad06d376b6595932470e459ef42a.dir",
            "size": 8437363,
            "nfiles": 2
          },
          "src/featurization.py": {
            "hash": "e0265fc22f056a4b86d85c3056bc2894",
            "size": 2490,
            "nfiles": null
          },
          "data/features": {
            "hash": "f35d4cc2c552ac959ae602162b8543f3.dir",
            "size": 2232588,
            "nfiles": 2
          },
          "src/train.py": {
            "hash": "c3961d777cfbd7727f9fde4851896006",
            "size": 967,
            "nfiles": null
          },
          "model.pkl": {
            "hash": "46865edbf3d62fc5c039dd9d2b0567a4",
            "size": 1763725,
            "nfiles": null
          },
          "src/evaluate.py": {
            "hash": "44e714021a65edf881b1716e791d7f59",
            "size": 2346,
            "nfiles": null
          }
        },
        "outs": {
          "data/prepared": {
            "hash": "153aad06d376b6595932470e459ef42a.dir",
            "size": 8437363,
            "nfiles": 2,
            "use_cache": true,
            "is_data_source": false
          },
          "data/features": {
            "hash": "f35d4cc2c552ac959ae602162b8543f3.dir",
            "size": 2232588,
            "nfiles": 2,
            "use_cache": true,
            "is_data_source": false
          },
          "model.pkl": {
            "hash": "46865edbf3d62fc5c039dd9d2b0567a4",
            "size": 1763725,
            "nfiles": null,
            "use_cache": true,
            "is_data_source": false
          },
          "data/data.xml": {
            "hash": "22a1a2931c8370d3aeedd7183606fd7f",
            "size": 14445097,
            "nfiles": null,
            "use_cache": true,
            "is_data_source": true
          }
        },
        "queued": false,
        "running": false,
        "executor": null,
        "metrics": {
          "evaluation.json": {
            "data": {
              "avg_prec": 0.9249974999612706,
              "roc_auc": 0.9460213440787918
            }
          }
        }
      }
    }
  }
}
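One way the dropped entry can be detected is by comparing the top-level revision keys of the `--show-json` output against the repo's HEAD commit. A minimal sketch, assuming a trimmed-down stand-in for the output above (`baseline_revs` and the `head` value are hypothetical, not part of DVC's API):

```python
import json

def baseline_revs(exp_show_json: str) -> set:
    """Top-level revision keys in `dvc exp show --show-json` output."""
    return set(json.loads(exp_show_json))

# Trimmed-down stand-in for the output shown above: only `workspace`
# and one commit baseline are present.
output = '{"workspace": {}, "f3d69eedda6b1c051b115523cf5c6c210490d0ea": {}}'

revs = baseline_revs(output)
# In a real repo the HEAD sha would come from `git rev-parse HEAD`;
# a placeholder is used here to show the check.
head = "aaaaaaa0"  # hypothetical HEAD sha
print("workspace" in revs)                        # True
print(any(r.startswith(head[:7]) for r in revs))  # False: HEAD baseline missing
```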

Reproduce

  1. clone example-get-started
  2. add git+https://github.com/iterative/dvc to src/requirements.txt
  3. create venv, source activate script and install requirements
  4. dvc pull
  5. change params.yaml and queue x2 with dvc exp run --queue
  6. dvc queue start -j 2
  7. dvc exp show
  8. dvc queue status
  9. dvc exp show

When recreating this I can see that both experiments were successful in dvc queue status but the second one has not made it into the table. Final results:

❯ dvc queue status 
Task     Name    Created    Status
9d22751          02:50 PM   Success
962c834          02:50 PM   Success

Worker status: 0 active, 0 idle

First column of exp show:

  workspace
  bigrams-experiment
  └── 65584bd [exp-c88e8]

and the shas don’t match?

Expected

It should be possible to run `dvc exp show` and `dvc queue status` in parallel with the execution of tasks from the queue.

Environment information

Output of dvc doctor:

$ dvc doctor
DVC version: 2.13.1.dev87+gc2668110 
---------------------------------
Platform: Python 3.8.9 on macOS-12.2.1-arm64-arm-64bit
Supports:
        webhdfs (fsspec = 2022.5.0),
        http (aiohttp = 3.8.1, aiohttp-retry = 2.5.1),
        https (aiohttp = 3.8.1, aiohttp-retry = 2.5.1)
Cache types: reflink, hardlink, symlink
Cache directory: apfs on /dev/disk3s1s1
Caches: local
Remotes: https
Workspace directory: apfs on /dev/disk3s1s1
Repo: dvc, git

Additional Information (if any):

Please let me know if you need anything else from me. Thank you.

Issue Analytics

  • State: closed
  • Created: a year ago
  • Comments: 18 (12 by maintainers)

Top GitHub Comments

2 reactions
karajan1001 commented, Jul 28, 2022

Sounds like it's related to https://github.com/iterative/dvc-task/issues/73. I tried several times but didn't hit this. I guess it is not related to experiments from old versions, but rather to 1. concurrency and 2. checkpoints. I can fix the error message `'{entry.stash_rev[:7]}'` first to see what the stash_rev value is.
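The literal `{entry.stash_rev[:7]}` appearing verbatim in the user-facing error suggests the message template was never interpolated — the behaviour you get when the `f` prefix is missing from an f-string. A minimal illustration (the `Entry` class is hypothetical, not DVC's actual type):

```python
from dataclasses import dataclass

@dataclass
class Entry:
    stash_rev: str

entry = Entry(stash_rev="f3d69eedda6b1c051b115523cf5c6c210490d0ea")

# Missing the `f` prefix: the placeholder is emitted verbatim.
broken = "Invalid experiment '{entry.stash_rev[:7]}'."
# With the prefix, the short rev is interpolated as intended.
fixed = f"Invalid experiment '{entry.stash_rev[:7]}'."

print(broken)  # Invalid experiment '{entry.stash_rev[:7]}'.
print(fixed)   # Invalid experiment 'f3d69ee'.
```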

1 reaction
mattseddon commented, Jul 27, 2022

Tl;dr - I can recreate the issue by using dvc queue start -j 2. As j > 1 is currently experimental we can probably close this.

> I was unable to reproduce this one, and it’s unclear whether it should be a priority.

I can definitely recreate it. I just ran into it again:

[screenshot showing the error]

When trying to clean up experiments after getting that warning:

❯ dvc exp gc -f --all-tags 
WARNING: This will remove all experiments except those derived from the workspace and all git tags of the current repo. Run queued experiments will be removed.
ERROR: Invalid experiment '{entry.stash_rev[:7]}'.

This will be an issue in the extension because errors generate a popup that the user sees.

Deleting .dvc/tmp/exps gets rid of the error altogether.

Repro steps:

  1. Using checkpoint-based experiments with 2.11.0
  2. Run an experiment in the workspace.
  3. Upgrade to 2.15.0.
  4. Queue two experiments with different params.
  5. dvc queue start -j 2
  6. run `dvc exp show --show-json` almost immediately after starting the queue (as the extension does).
  7. One experiment will run, the other will disappear.
  8. dvc queue status returns ERROR: Invalid experiment '{entry.stash_rev[:7]}'.

Even these repro steps are a bit hit or miss: in 2 of 3 attempts I hit the error along with a missing experiment.

I can also recreate just by using steps 4-8 (no upgrade needed).

The error is probably caused by steps 5 and 6 together. As `-j` > 1 is a known issue we can probably close this.
