
`dvc queue`: unexpected behaviour

See original GitHub issue

Bug Report

Description

Whilst checking out the new `dvc queue` command I have run into some unexpected behaviour. I won't duplicate the steps to reproduce here, but after queueing and running experiments I have run into two different issues.

  • VS Code demo project: `dvc queue status` returning `ERROR: Invalid experiment '{entry.stash_rev[:7]}'.` (produced when running with the extension)
  • example-get-started: `dvc queue status` returning

Task     Name    Created    Status
f3d69ee          02:17 PM   Success
08ccb05          02:17 PM   Success

ERROR: unexpected error - Extra data: line 1 column 56 (char 55)

(produced without having the extension involved).
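The `Extra data` message above is what Python's `json` module raises when it is handed more than one JSON document in a single string — consistent with two queue workers writing to the same file concurrently. A minimal reproduction (the payload is illustrative, not DVC's actual file format):

```python
import json

# Two complete JSON objects concatenated on one line -- e.g. two
# writers appending to the same status file without coordination.
payload = '{"rev": "f3d69ee", "status": "Success"}{"rev": "08ccb05"}'

try:
    json.loads(payload)
except json.JSONDecodeError as exc:
    # The parser stops at the first character after the first
    # complete document and reports it as "Extra data".
    print(exc.msg, exc.pos)  # Extra data 39
```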

In both instances this resulted in the HEAD baseline entry being dropped from the exp show data:

example-get-started example
❯ dvc exp show --show-json
{
  "workspace": {
    "baseline": {
      "data": {
        "timestamp": null,
        "params": {
          "params.yaml": {
            "data": {
              "prepare": {
                "split": 0.21,
                "seed": 20170428
              },
              "featurize": {
                "max_features": 200,
                "ngrams": 2
              },
              "train": {
                "seed": 20170428,
                "n_est": 50,
                "min_split": 0.01
              }
            }
          }
        },
        "deps": {
          "data/data.xml": {
            "hash": "22a1a2931c8370d3aeedd7183606fd7f",
            "size": 14445097,
            "nfiles": null
          },
          "src/prepare.py": {
            "hash": "f09ea0c15980b43010257ccb9f0055e2",
            "size": 1576,
            "nfiles": null
          },
          "data/prepared": {
            "hash": "153aad06d376b6595932470e459ef42a.dir",
            "size": 8437363,
            "nfiles": 2
          },
          "src/featurization.py": {
            "hash": "e0265fc22f056a4b86d85c3056bc2894",
            "size": 2490,
            "nfiles": null
          },
          "data/features": {
            "hash": "f35d4cc2c552ac959ae602162b8543f3.dir",
            "size": 2232588,
            "nfiles": 2
          },
          "src/train.py": {
            "hash": "c3961d777cfbd7727f9fde4851896006",
            "size": 967,
            "nfiles": null
          },
          "model.pkl": {
            "hash": "46865edbf3d62fc5c039dd9d2b0567a4",
            "size": 1763725,
            "nfiles": null
          },
          "src/evaluate.py": {
            "hash": "44e714021a65edf881b1716e791d7f59",
            "size": 2346,
            "nfiles": null
          }
        },
        "outs": {
          "data/prepared": {
            "hash": "153aad06d376b6595932470e459ef42a.dir",
            "size": 8437363,
            "nfiles": 2,
            "use_cache": true,
            "is_data_source": false
          },
          "data/features": {
            "hash": "f35d4cc2c552ac959ae602162b8543f3.dir",
            "size": 2232588,
            "nfiles": 2,
            "use_cache": true,
            "is_data_source": false
          },
          "model.pkl": {
            "hash": "46865edbf3d62fc5c039dd9d2b0567a4",
            "size": 1763725,
            "nfiles": null,
            "use_cache": true,
            "is_data_source": false
          },
          "data/data.xml": {
            "hash": "22a1a2931c8370d3aeedd7183606fd7f",
            "size": 14445097,
            "nfiles": null,
            "use_cache": true,
            "is_data_source": true
          }
        },
        "queued": false,
        "running": false,
        "executor": null,
        "metrics": {
          "evaluation.json": {
            "data": {
              "avg_prec": 0.9249974999612706,
              "roc_auc": 0.9460213440787918
            }
          }
        }
      }
    }
  },
  "f3d69eedda6b1c051b115523cf5c6c210490d0ea": {
    "baseline": {
      "data": {
        "timestamp": "2022-07-13T14:17:20",
        "params": {
          "params.yaml": {
            "data": {
              "prepare": {
                "split": 0.21,
                "seed": 20170428
              },
              "featurize": {
                "max_features": 200,
                "ngrams": 2
              },
              "train": {
                "seed": 20170428,
                "n_est": 50,
                "min_split": 0.01
              }
            }
          }
        },
        "deps": {
          "data/data.xml": {
            "hash": "22a1a2931c8370d3aeedd7183606fd7f",
            "size": 14445097,
            "nfiles": null
          },
          "src/prepare.py": {
            "hash": "f09ea0c15980b43010257ccb9f0055e2",
            "size": 1576,
            "nfiles": null
          },
          "data/prepared": {
            "hash": "153aad06d376b6595932470e459ef42a.dir",
            "size": 8437363,
            "nfiles": 2
          },
          "src/featurization.py": {
            "hash": "e0265fc22f056a4b86d85c3056bc2894",
            "size": 2490,
            "nfiles": null
          },
          "data/features": {
            "hash": "f35d4cc2c552ac959ae602162b8543f3.dir",
            "size": 2232588,
            "nfiles": 2
          },
          "src/train.py": {
            "hash": "c3961d777cfbd7727f9fde4851896006",
            "size": 967,
            "nfiles": null
          },
          "model.pkl": {
            "hash": "46865edbf3d62fc5c039dd9d2b0567a4",
            "size": 1763725,
            "nfiles": null
          },
          "src/evaluate.py": {
            "hash": "44e714021a65edf881b1716e791d7f59",
            "size": 2346,
            "nfiles": null
          }
        },
        "outs": {
          "data/prepared": {
            "hash": "153aad06d376b6595932470e459ef42a.dir",
            "size": 8437363,
            "nfiles": 2,
            "use_cache": true,
            "is_data_source": false
          },
          "data/features": {
            "hash": "f35d4cc2c552ac959ae602162b8543f3.dir",
            "size": 2232588,
            "nfiles": 2,
            "use_cache": true,
            "is_data_source": false
          },
          "model.pkl": {
            "hash": "46865edbf3d62fc5c039dd9d2b0567a4",
            "size": 1763725,
            "nfiles": null,
            "use_cache": true,
            "is_data_source": false
          },
          "data/data.xml": {
            "hash": "22a1a2931c8370d3aeedd7183606fd7f",
            "size": 14445097,
            "nfiles": null,
            "use_cache": true,
            "is_data_source": true
          }
        },
        "queued": false,
        "running": false,
        "executor": null,
        "metrics": {
          "evaluation.json": {
            "data": {
              "avg_prec": 0.9249974999612706,
              "roc_auc": 0.9460213440787918
            }
          }
        }
      }
    }
  }
}
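One way the dropped entry can be detected is by comparing the top-level revision keys of the `--show-json` output against the repo's HEAD commit. A minimal sketch, assuming a trimmed-down stand-in for the output above (`baseline_revs` and the `head` value are hypothetical, not part of DVC's API):

```python
import json

def baseline_revs(exp_show_json: str) -> set:
    """Top-level revision keys in `dvc exp show --show-json` output."""
    return set(json.loads(exp_show_json))

# Trimmed-down stand-in for the output shown above: only `workspace`
# and one commit baseline are present.
output = '{"workspace": {}, "f3d69eedda6b1c051b115523cf5c6c210490d0ea": {}}'

revs = baseline_revs(output)
# In a real repo the HEAD sha would come from `git rev-parse HEAD`;
# a placeholder is used here to show the check.
head = "aaaaaaa0"  # hypothetical HEAD sha
print("workspace" in revs)                        # True
print(any(r.startswith(head[:7]) for r in revs))  # False: HEAD baseline missing
```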

Reproduce

  1. clone example-get-started
  2. add git+https://github.com/iterative/dvc to src/requirements.txt
  3. create venv, source activate script and install requirements
  4. dvc pull
  5. change params.yaml and queue x2 with dvc exp run --queue
  6. dvc queue start -j 2
  7. dvc exp show
  8. dvc queue status
  9. dvc exp show

When recreating this I can see that both experiments were successful in dvc queue status but the second one has not made it into the table. Final results:

❯ dvc queue status 
Task     Name    Created    Status
9d22751          02:50 PM   Success
962c834          02:50 PM   Success

Worker status: 0 active, 0 idle

First column of exp show:

  workspace
  bigrams-experiment
  └── 65584bd [exp-c88e8]

and the shas don’t match?

Expected

It should be possible to run `dvc exp show` and `dvc queue status` in parallel with the execution of tasks from the queue.

Environment information

Output of dvc doctor:

$ dvc doctor
DVC version: 2.13.1.dev87+gc2668110 
---------------------------------
Platform: Python 3.8.9 on macOS-12.2.1-arm64-arm-64bit
Supports:
        webhdfs (fsspec = 2022.5.0),
        http (aiohttp = 3.8.1, aiohttp-retry = 2.5.1),
        https (aiohttp = 3.8.1, aiohttp-retry = 2.5.1)
Cache types: reflink, hardlink, symlink
Cache directory: apfs on /dev/disk3s1s1
Caches: local
Remotes: https
Workspace directory: apfs on /dev/disk3s1s1
Repo: dvc, git

Additional Information (if any):

Please let me know if you need anything else from me. Thank you.

Issue Analytics

  • State: closed
  • Created: a year ago
  • Comments: 18 (12 by maintainers)

Top GitHub Comments

2 reactions
karajan1001 commented, Jul 28, 2022

Sounds like it's related to https://github.com/iterative/dvc-task/issues/73. I tried several times but didn't hit this. I guess it is not related to experiments from old versions, but rather to 1. concurrency and 2. checkpoints. I can fix the error message `'{entry.stash_rev[:7]}'` first to see what the stash_rev value is.
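The literal `{entry.stash_rev[:7]}` appearing verbatim in the user-facing error suggests the message template was never interpolated — the behaviour you get when the `f` prefix is missing from an f-string. A minimal illustration (the `Entry` class is hypothetical, not DVC's actual type):

```python
from dataclasses import dataclass

@dataclass
class Entry:
    stash_rev: str

entry = Entry(stash_rev="f3d69eedda6b1c051b115523cf5c6c210490d0ea")

# Missing the `f` prefix: the placeholder is emitted verbatim.
broken = "Invalid experiment '{entry.stash_rev[:7]}'."
# With the prefix, the short rev is interpolated as intended.
fixed = f"Invalid experiment '{entry.stash_rev[:7]}'."

print(broken)  # Invalid experiment '{entry.stash_rev[:7]}'.
print(fixed)   # Invalid experiment 'f3d69ee'.
```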

1 reaction
mattseddon commented, Jul 27, 2022

Tl;dr - I can recreate the issue by using dvc queue start -j 2. As j > 1 is currently experimental we can probably close this.

> I was unable to reproduce this one, and it’s unclear whether it should be a priority.

I can definitely recreate it. I just ran into it again:

[screenshot showing the error]

When trying to clean up experiments after getting that warning:

❯ dvc exp gc -f --all-tags 
WARNING: This will remove all experiments except those derived from the workspace and all git tags of the current repo. Run queued experiments will be removed.
ERROR: Invalid experiment '{entry.stash_rev[:7]}'.

This will be an issue in the extension because errors generate a popup that the user sees.

Deleting .dvc/tmp/exps gets rid of the error altogether.

Repro steps:

  1. Using checkpoint-based experiments with 2.11.0
  2. Run an experiment in the workspace.
  3. Upgrade to 2.15.0.
  4. Queue two experiments with different params.
  5. dvc queue start -j 2
  6. run `dvc exp show --show-json` almost immediately after starting the queue (as the extension does).
  7. One experiment will run, the other will disappear.
  8. dvc queue status returns ERROR: Invalid experiment '{entry.stash_rev[:7]}'.

Even these repro steps are a bit hit or miss: in 2 of 3 attempts I hit the error along with a missing experiment.

I can also recreate just by using steps 4-8 (no upgrade needed).

The error is probably caused by steps 5 and 6 together. As `-j` > 1 is a known issue we can probably close this.
