[Launcher: Local parallel sweep]

🚀 Feature Request

Is it possible to run sweeps locally in parallel (for example, with a variable like ntasks_per_node in config.yaml)? It would also be useful to be able to specify a list of GPU indices and run the sweep in parallel on those, relying on the CUDA_VISIBLE_DEVICES environment variable.
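
To make the request concrete, here is a minimal sketch of the behaviour described above, written with the Python standard library only. It is not a Hydra launcher and does not use any Hydra API; `run_sweep_on_gpus` and the probe jobs are hypothetical names chosen for illustration. Each job runs in its own process, with at most one job per GPU index at a time, and the index is passed through CUDA_VISIBLE_DEVICES.

```python
import os
import queue
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_sweep_on_gpus(commands, gpu_ids):
    """Run each command in its own process, at most one per GPU at a time.

    Hypothetical helper: returns (gpu, returncode, stdout) per command,
    in the same order as `commands`.
    """
    free_gpus = queue.Queue()
    for gpu in gpu_ids:
        free_gpus.put(gpu)

    def run_one(cmd):
        gpu = free_gpus.get()  # blocks until a GPU index is free
        try:
            # The child process only sees the GPU it was assigned.
            env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu))
            result = subprocess.run(cmd, env=env, capture_output=True, text=True)
            return gpu, result.returncode, result.stdout.strip()
        finally:
            free_gpus.put(gpu)  # hand the GPU back for the next job

    # One worker thread per GPU; the threads only wait on subprocesses.
    with ThreadPoolExecutor(max_workers=len(gpu_ids)) as pool:
        return list(pool.map(run_one, commands))


if __name__ == "__main__":
    # Stand-in jobs: each one just reports the GPU index it was given.
    probe = "import os, sys; print(sys.argv[1], os.environ['CUDA_VISIBLE_DEVICES'])"
    jobs = [[sys.executable, "-c", probe, f"lr={lr}"] for lr in (0.1, 0.01, 0.001)]
    for gpu, code, out in run_sweep_on_gpus(jobs, gpu_ids=[0, 1]):
        print(gpu, code, out)
```

In a real sweep, each entry of `jobs` would be the application command with one set of overrides. A queue of free GPU indices is used rather than a fixed job-to-GPU mapping so that a GPU is reused as soon as its job finishes.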

Motivation

Is your feature request related to a problem? Please describe.

Without a Slurm system configured, a simple parallel launcher would make computations easier and faster than running the same command several times.

Pitch

Describe the solution you’d like

A launcher object that runs a sweep in parallel.

Describe alternatives you’ve considered

I tried adapting the BasicLauncher using joblib but have not succeeded so far. One issue is that I get an Invalid plugin error because my launcher class starts with neither hydra_plugins nor hydra._internal.core_plugins.

Additional context

Thanks for the great library! 😃

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Reactions: 1
  • Comments: 6 (5 by maintainers)

Top GitHub Comments

2 reactions
omry commented, Feb 12, 2020

The new plugin is published. It supports Hydra 1.0.0, which is not yet released. You can try it by checking out Hydra from master and installing the plugin with pip install.

Plugin website page.

1 reaction
emilemathieutmp commented, Jan 8, 2020

Thanks @omry for reaching back 😃

I eventually succeeded in implementing a parallel Launcher based on joblib (can be found here)

I still have a few issues:

  • How can I avoid the Invalid plugin error that occurs because my launcher class starts with neither hydra_plugins nor hydra._internal.core_plugins?
  • I don’t know why, but at some point OmegaConf can no longer resolve an inter_type, which I worked around with a hard-coded fix along the lines of value = datetime.datetime.now().strftime(inter_key).

If you have any hint that’d be great! 😃
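
For readers hitting the same symptom, the hard-coded fix mentioned in the second point can be sketched as a small standalone helper. This assumes the unresolved interpolation was a timestamp one of the form `${now:FORMAT}`, which the comment does not state explicitly; `expand_now` is a hypothetical name and not part of Hydra or OmegaConf.

```python
import datetime
import re

# Matches timestamp interpolations such as ${now:%Y-%m-%d}
_NOW_PATTERN = re.compile(r"\$\{now:([^}]*)\}")


def expand_now(text, when=None):
    """Replace every ${now:FORMAT} in text with a formatted timestamp.

    Hypothetical helper: applies strftime by hand instead of relying on
    a resolver being registered in the worker process.
    """
    when = when or datetime.datetime.now()
    return _NOW_PATTERN.sub(lambda m: when.strftime(m.group(1)), text)
```

Passing an explicit `when` keeps all jobs of a sweep under the same timestamp, for example `expand_now("outputs/${now:%Y-%m-%d}", when=start_time)` with a `start_time` captured once before the jobs are dispatched.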
