
Environments and multi-task persistent state


I frequently use GPU clusters, and for these workloads I need to do some relatively expensive setup and teardown operations to establish a valid GPU computing context on each worker before I start the distributed processing of my data. There are various ways to "trick" distributed workers into maintaining a stateful execution environment across multiple jobs, probably the best of which is to use a global variable in the Python interpreter itself. However, this solution (and the others I'm aware of) does not provide particularly fine-grained control over the distributed execution environment. It would be very useful if distributed provided some way to explicitly manage execution "environments" in which jobs could be run.

I’m imagining something along these lines. I would call

my_gpu_env = executor.environment(my_setup_function, my_teardown_function)

which would cause each worker to run my_setup_function() and then return a reference to its copy of the new environment. The scheduler would record these references and send back a single reference to the collection of environments on the workers (my_gpu_env in this example). Subsequent calls to run distributed functions could then specify the environment in which they should run:

executor.map(some_function_using_the_gpu, data, environment=my_gpu_env)

The scheduler could keep track of the setup() and teardown() functions associated with each environment. Then, if a new worker comes online and is asked to run a function in an environment that it has not yet set up, it could request the necessary initialization routine from the scheduler and run that first before running any jobs.
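To make the proposal concrete, here is an in-process simulation of the environment idea. Everything in it is hypothetical: `Scheduler`, `Worker` and `Environment` are illustrative stand-ins rather than part of distributed's API, the "workers" are plain objects in one process, and the `name` argument used as a registry key is an added assumption (the proposal passes only the two functions). The scheduler records each environment's setup and teardown functions, and a worker runs setup lazily the first time it receives a task for an environment it has not yet initialized, which covers the case of a worker that comes online later.

```python
from dataclasses import dataclass, field
from itertools import cycle
from typing import Any, Callable, Dict, List


@dataclass(frozen=True)
class Environment:
    """Hypothetical handle the scheduler hands back to the user."""
    name: str  # assumption: environments are identified by a name
    setup: Callable[[], Any]
    teardown: Callable[[Any], None]


@dataclass
class Worker:
    """In-process stand-in for a distributed worker."""
    ident: str
    states: Dict[str, Any] = field(default_factory=dict)

    def run(self, fn: Callable[[Any, Any], Any], arg: Any, env: Environment) -> Any:
        # First task for this environment on this worker: run setup once.
        if env.name not in self.states:
            self.states[env.name] = env.setup()
        return fn(self.states[env.name], arg)

    def close(self, envs: Dict[str, Environment]) -> None:
        # Tear down every environment this worker initialized.
        for name, state in self.states.items():
            envs[name].teardown(state)
        self.states.clear()


class Scheduler:
    """Records setup/teardown so any worker can initialize on demand."""

    def __init__(self, workers: List[Worker]):
        self.workers = list(workers)
        self.environments: Dict[str, Environment] = {}

    def environment(self, name: str, setup, teardown) -> Environment:
        env = Environment(name, setup, teardown)
        self.environments[name] = env
        return env

    def map(self, fn, data, environment: Environment) -> list:
        # Naive round-robin placement, good enough for the illustration.
        pairs = zip(cycle(self.workers), data)
        return [worker.run(fn, x, environment) for worker, x in pairs]

    def close(self) -> None:
        for worker in self.workers:
            worker.close(self.environments)


# Usage with hypothetical setup/teardown functions that count their calls.
calls = {"setup": 0, "teardown": 0}


def gpu_setup():
    calls["setup"] += 1
    return {"device": 0}


def gpu_teardown(ctx):
    calls["teardown"] += 1


def scale(ctx, x):
    return x * 10 + ctx["device"]


sched = Scheduler([Worker("a"), Worker("b")])
gpu_env = sched.environment("gpu", gpu_setup, gpu_teardown)
first = sched.map(scale, [1, 2, 3], environment=gpu_env)
print(first)  # [10, 20, 30]
sched.workers.append(Worker("c"))  # a new worker comes online
sched.map(scale, [4, 5, 6], environment=gpu_env)
print(calls["setup"])  # 3: workers a, b and c each ran setup once
sched.close()
print(calls["teardown"])  # 3
```

Keeping setup lazy on the worker side means the scheduler never has to broadcast initialization; a worker that joins late pays the setup cost only when it first receives work in that environment.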

This is a somewhat rough sketch of what would be desirable, and I'd like to start a discussion to see whether other users might also want a feature like this. In particular, are others using distributed to manage a cluster of GPU nodes? How do you manage a cluster-wide execution context?

Issue Analytics

Created 8 years ago
Comments: 28 (19 by maintainers)

Top GitHub Comments

1 reaction

mrocklin commented, Apr 16, 2016

If you want to use graphs explicitly, you would need to modify your graph to point to the "data" key rather than to the object itself.

Before

dask = {
   "data": data,
   # the data object itself is embedded in every task
   "task1": (fn, data, arg1),
   "task2": (fn, data, arg2),
   # ...
   "result": ["task1", "task2", ...]
}

After

dask = {
   "data": data,
   # tasks refer to the "data" key instead of the object
   "task1": (fn, "data", arg1),
   "task2": (fn, "data", arg2),
   # ...
   "result": ["task1", "task2", ...]
}
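The difference between the two graphs comes down to how key references are resolved. Below is a toy evaluator, not dask's actual scheduler, that follows the dask graph convention: a task is a tuple whose first element is callable, lists are traversed, and an argument that matches a key in the graph is replaced by that key's computed value. `fn` and the sample data are hypothetical. Both graph styles compute the same values; what changes is that in the "After" graph each task holds only the string "data", so the object is stored once under its key and computed once, instead of being embedded in every task.

```python
def is_task(obj):
    """Dask graph convention: a task is a tuple whose first element is callable."""
    return isinstance(obj, tuple) and len(obj) > 0 and callable(obj[0])


def get(graph, key, cache=None):
    """Toy recursive evaluator for a dict-based graph (not dask's scheduler)."""
    if cache is None:
        cache = {}
    if key not in cache:
        cache[key] = _evaluate(graph, graph[key], cache)
    return cache[key]


def _evaluate(graph, expr, cache):
    if is_task(expr):
        fn, *args = expr
        return fn(*[_evaluate(graph, a, cache) for a in args])
    if isinstance(expr, list):
        return [_evaluate(graph, e, cache) for e in expr]
    try:
        if expr in graph:  # a key reference such as "data"
            return get(graph, expr, cache)
    except TypeError:  # unhashable literal, e.g. a dict, cannot be a key
        pass
    return expr  # any other literal is passed through unchanged


def fn(data, arg):  # hypothetical task function
    return sum(data) * arg


data = [0, 1, 2, 3, 4]
graph = {
    "data": data,
    "task1": (fn, "data", 1),  # refers to the "data" key
    "task2": (fn, "data", 2),
    "result": ["task1", "task2"],
}
print(get(graph, "result"))  # [10, 20]
```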

But really I would just do the following:

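# Send the data to the cluster once and keep a future that refers to it
[data_future] = e.scatter([data])
# Pass the future to every task; workers resolve it to the shared data
tasks = [e.submit(fn, data_future, arg) for arg in args]
results = e.gather(tasks)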

0 reactions

olly-writes-code commented, Aug 15, 2019

@thompson42 - how did you solve this for your use case?
