
Make queued experiments initialization process run in parallel


Currently, when we run queued experiments we lock the whole SCM for the entire initialization process.

https://github.com/iterative/dvc/blob/29b3dc1f9ec96fbd4116f8adabaeb17c9633afb2/dvc/repo/experiments/queue/base.py#L566-L568

This makes the initialization process run sequentially. We can apply more fine-grained control over this process:

  1. We should lock only some of the exp refs, or an exp reference namespace, instead of the whole SCM.
  2. We should hold the lock for only a few steps instead of the whole process.
  3. We do not need to set the EXEC_* branches before pushing them to the remote.
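The idea in point 2 can be sketched in Python. This is a minimal illustration only: the names (scm_lock, pop_stash, checkout, push_exec_refs) are hypothetical stand-ins, not DVC's real internals, and the stubs just simulate the work.

```python
import threading
import time

scm_lock = threading.Lock()  # stands in for the SCM lock

def pop_stash(entry):
    # Touches shared git stash state: this step genuinely needs the lock.
    return f"stash-for-{entry}"

def checkout(stash):
    time.sleep(0.01)  # simulate slow, lock-free work
    return f"workspace-from-{stash}"

def push_exec_refs(workspace):
    time.sleep(0.01)  # simulate pushing EXEC_* refs, no lock needed
    return workspace

def init_coarse(entry):
    # Today (simplified): everything runs under one SCM lock,
    # so concurrent workers serialize on the whole initialization.
    with scm_lock:
        stash = pop_stash(entry)
        return push_exec_refs(checkout(stash))

def init_fine(entry):
    # Proposal: hold the lock only for the step that mutates shared
    # stash state, then let the slow steps overlap across workers.
    with scm_lock:
        stash = pop_stash(entry)
    return push_exec_refs(checkout(stash))
```

Both variants produce the same result for a single caller; the difference only shows up when several workers initialize experiments concurrently.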

Issue Analytics

  • State: closed
  • Created: a year ago
  • Comments: 8 (4 by maintainers)

Top GitHub Comments

1 reaction
karajan1001 commented, Nov 23, 2022

> The queue stashes are different for different workers
>
> Is this the way it’s supposed to work? I would think they would consume tasks from the same queue.

The queue itself is thread-safe, but no one can guarantee that the stash queue is too. For example, I might run git stash list first to find the desired stash and then want to git stash pop stash@{n} it, but in the interval some other process may have operated on the stash queue, so the index n is stale by the time I use it.
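The race described above can be simulated with a plain Python list standing in for the git stash queue (indices behave like stash@{n}); the entry names are made up for illustration.

```python
# Newest-first, like the output of `git stash list`.
stash_queue = ["exp-c", "exp-b", "exp-a"]

# Worker 1: find the desired stash by index (the "git stash list" step).
n = stash_queue.index("exp-a")   # n == 2

# Worker 2 pops an entry in the meantime.
stash_queue.pop(0)

# Worker 1 now tries "git stash pop stash@{n}" -- but index 2 no longer
# exists (or, with other interleavings, points at a different entry).
try:
    popped = stash_queue.pop(n)
except IndexError:
    popped = None

# popped is None: the index computed before the interleaving is stale.
```

This is why the stash lookup and the pop must happen atomically under a lock, even though the task queue itself is thread-safe.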

1 reaction
karajan1001 commented, Nov 22, 2022

> I’m starting from the command line using that repo. Are you seeing slowdown during exp run --queue or during queue start?

It is not about slowdown but about the experiments running sequentially even when I start several workers. The reason is that we take an SCM lock during the initialization process. Because (1) the workers only read their own stashes, and (2) the queue stashes are different for different workers, we can make the necessary modifications, release the lock early, and let the workers run in parallel.
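The claim that workers can overlap once the lock covers only the stash read can be sketched with threads. All names here are hypothetical, not DVC internals; the sleep simulates the slow, lock-free part of initialization.

```python
import threading
import time

lock = threading.Lock()                          # stands in for the SCM lock
stashes = {"w1": "s1", "w2": "s2", "w3": "s3"}   # per-worker stash entries
results = {}

def worker(name):
    # Each worker holds the lock only long enough to read its own stash...
    with lock:
        stash = stashes[name]
    # ...then the slow initialization work overlaps across workers.
    time.sleep(0.05)
    results[name] = f"initialized-{stash}"

start = time.monotonic()
threads = [threading.Thread(target=worker, args=(n,)) for n in stashes]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.monotonic() - start

# The three 0.05 s initializations overlap, so total wall time stays well
# under the 0.15 s a fully serialized run would take.
```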
