
Environments and multi-task persistent state

See original GitHub issue

I frequently use GPU clusters, and for these workloads I find that I need to run some relatively expensive setup or teardown operations to establish a valid GPU computing context on each worker before I start the distributed processing of my data. There are various ways to “trick” distributed workers into maintaining a stateful execution environment across multiple jobs, probably the best of which is to stash state in a global variable in each worker’s Python interpreter. However, this solution (and the others I’m aware of) does not provide particularly fine-grained control over the distributed execution environment. It would be very useful if distributed provided some way to explicitly manage execution “environments” in which jobs could be run.
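For concreteness, a minimal sketch of that global-variable trick (every name here is illustrative, and the expensive GPU setup is stood in for by a plain dictionary):

_context = None  # module-level state lives in the worker's interpreter

def get_context():
    # Create the expensive per-worker context on first use, then reuse it
    # for every later task that runs in the same worker process.
    global _context
    if _context is None:
        _context = {"device": "gpu0"}  # stand-in for real, expensive GPU setup
    return _context

def some_function_using_the_gpu(chunk):
    ctx = get_context()  # later tasks on this worker skip the setup cost
    return (ctx["device"], chunk * 2)

This works, but there is no teardown hook and no way to scope the state to a particular group of jobs, which is exactly the gap an explicit environment mechanism would fill.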

I’m imagining something along these lines. I would call

my_gpu_env = executor.environment(my_setup_function, my_teardown_function)

which would cause each worker to run my_setup_function() and then return a reference to its copy of the new environment. The scheduler would record these and send back a single handle to the collection of per-worker environments (my_gpu_env in this example). Subsequent calls to run distributed functions could specify the environment in which they would like to be run:

executor.map(some_function_using_the_gpu, data, environment=my_gpu_env)

The scheduler could keep track of the setup() and teardown() functions associated with each environment. Then, if a new worker comes online and is asked to run a function in an environment that it has not yet set up, it could request the necessary initialization routine from the scheduler and run that first before running any jobs.
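A sketch of the bookkeeping I have in mind, with every name hypothetical (none of this exists in distributed today):

class Environment:
    # Hypothetical record the scheduler would keep for each environment.
    def __init__(self, setup, teardown):
        self.setup = setup
        self.teardown = teardown
        self.ready_workers = set()  # workers that have already run setup

def ensure_environment(env, worker_id):
    # Run before dispatching a task that names this environment; a worker
    # that joined after the environment was created gets initialized here.
    if worker_id not in env.ready_workers:
        env.setup()  # e.g. my_setup_function, shipped to the new worker
        env.ready_workers.add(worker_id)

Teardown would be the mirror image, run when an environment is released or a worker retires.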

This is a somewhat rough sketch of what would be desirable here, and I’m curious to start a discussion to see if there are other users out there who might also want a feature like this. In particular, are there others using distributed to manage a cluster of GPU nodes? How do you manage a cluster-wide execution context?

Issue Analytics

  • State: open
  • Created: 8 years ago
  • Comments: 28 (19 by maintainers)

Top GitHub Comments

1 reaction
mrocklin commented, Apr 16, 2016

If you want to use graphs explicitly, you would need to modify your graph to point to the "data" key rather than to the object itself.

Before

dask = {
   "data": data,
   "task1": (fn, data, arg1),
   "task2": (fn, data, arg2),
   ...
   "result": ["task1", "task2", ...]
}

After

dask = {
   "data": data,
   "task1": (fn, "data", arg1),
   "task2": (fn, "data", arg2),
   ...
   "result": ["task1", "task2", ...]
}
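To make the difference concrete, here is a self-contained toy version of the "After" graph (fn, dsk, and the values are all illustrative) that runs with the synchronous local scheduler:

import dask

def fn(data, arg):
    return data + arg

dsk = {
    "data": 10,
    "task1": (fn, "data", 1),  # "data" names the key above, not the value
    "task2": (fn, "data", 2),
    "result": ["task1", "task2"],
}

print(dask.get(dsk, "result"))  # [11, 12]

Because the tasks reference the key, the scheduler can ship the underlying value to each worker once instead of serializing it into every task.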

But really I would just do the following:

[data_future] = e.scatter([data])  # ship data to the cluster once
tasks = [e.submit(fn, data_future, arg) for arg in args]  # tasks reuse the scattered copy
results = e.gather(tasks)  # collect the results back to the client

0 reactions
olly-writes-code commented, Aug 15, 2019

@thompson42 - how did you solve this for your use case?

Read more comments on GitHub >

