[RFC] runtime_env for actors and tasks
There have been a lot of user questions and discussion around how to specify dependencies for a given job, actor, or task. There are a few different general use cases here:
- Users want to run tasks/actors that require different or conflicting Python dependencies as part of one Ray application.
- Users want to use Docker containers to manage dependencies (not just Python) for different tasks and actors as part of one Ray application.
- Users want to distribute local Python modules and files to their workers/actors for rapid iteration/development.
- Users want to easily install new Python packages as part of their development workflow (e.g., change library versions without restarting the cluster).
This proposal is to introduce a new `runtime_env` API that enables all of these use cases and can generalize to future worker environment-related demands.

The `runtime_env` will be a dictionary that can be passed as an option to actor/task creation:

```python
f.options(runtime_env=env).remote()
Actor.options(runtime_env=env).remote()
```
This dictionary will include the following arguments:
- `container_image (str)`: Require a given (Docker) container image. The image must have the same version of Ray installed.
- `conda_env (str)`: Activates a named conda environment that the worker will run in. The environment must already exist on the node.
- `files (Path)`: Project files and local modules to unpack in the working directory of the task/actor.
- (possible future extension) `python_requirements (Union[File, List[str]])`: A list of Python requirements or a `requirements.txt` file used to dynamically create a new conda environment.
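To make the interface concrete, here is a hypothetical `runtime_env` dictionary combining two of the proposed fields. The values (`"my-project-env"`, `"./src"`, and the `my_module` import) are made up for illustration; this is a sketch of the proposed API, not something that runs on Ray builds predating `runtime_env` support.

```python
import ray

ray.init()

# Hypothetical runtime_env combining the proposed fields above; the
# concrete values ("my-project-env", "./src") are illustrative only.
env = {
    "conda_env": "my-project-env",  # must already exist on each node
    "files": "./src",               # unpacked into the task's working directory
}

@ray.remote
def f():
    # Modules shipped via "files" would be importable from the working dir.
    import my_module
    return my_module.do_work()

# Per-task override, as proposed:
result = ray.get(f.options(runtime_env=env).remote())
```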
These options should cover all the known dependency-management use cases listed above.
Misc semantics:
- Any downstream tasks/actors will by default inherit the `runtime_env` of their parent.
- The `runtime_env` needs to be able to be specified on an individual actor and task basis, but for convenience it should also be able to be set in the `JobConfig` as a default for all tasks/actors spawned by the driver (see the sketch below).
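A sketch of how these two semantics could compose, assuming a `JobConfig(runtime_env=...)` constructor along the lines proposed here (not a guaranteed final API):

```python
import ray
from ray.job_config import JobConfig

# Job-wide default: every task/actor spawned by this driver inherits this
# runtime_env unless it overrides it.
ray.init(job_config=JobConfig(runtime_env={"conda_env": "base-env"}))

@ray.remote
def child():
    ...  # inherits its parent's runtime_env by default

@ray.remote
def parent():
    return ray.get(child.remote())  # child runs in parent's environment

# A per-task runtime_env takes precedence over the job-wide default:
parent.options(runtime_env={"conda_env": "special-env"}).remote()
```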
At this point, this RFC is primarily about the use cases and interface, not the implementation of each `runtime_env` option. Please comment if you believe there is a use case not covered, the UX could be improved, etc.
@valiantljk good points. We’ve discussed the implementation of the `container_image` option in more depth, as that’s actually what prompted this discussion initially. Here’s a link to a document that goes more in-depth on it: https://docs.google.com/document/d/1MbsjSye2KgYuLPPUWziU8iPTBCC_0h9EOrUIX4LFC2M/edit#
The TL;DR is that for now this will be tightly integrated with the autoscaler. The autoscaler will add new node types to satisfy scheduling constraints specified by the container_image requirement. In the future we may be able to support per-worker containers, which would allow us to remove this tight coupling of the container image and the node.
Is this RFC for `runtime_env` still under development? I saw that the docs already show its usage (https://docs.ray.io/en/master/advanced.html#conda-environments-for-tasks-and-actors, https://docs.ray.io/en/master/package-ref.html#ray-remote), but my installed ray 2.0.0.dev0 version still doesn’t recognize the newly added `runtime_env` and `override_environment_variables` options.

My call

```python
ray.get(read_from_hdfs.options(runtime_env={"SEC_TOKEN_STRING": token}).remote())
```

failed with the following error; `runtime_env` is not among the listed options:

```
AssertionError: The @ray.remote decorator must be applied either with no arguments and no parentheses, for example '@ray.remote', or it must be applied using some of the arguments 'num_returns', 'num_cpus', 'num_gpus', 'memory', 'object_store_memory', 'resources', 'max_calls', or 'max_restarts', like '@ray.remote(num_returns=2, resources={"CustomResource": 1})'.
```
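For readers hitting the same error: on Ray builds that do support `runtime_env`, environment variables are passed under a nested `env_vars` field rather than as top-level keys. A hedged sketch of that usage, with a hypothetical `read_from_hdfs` task standing in for the one above:

```python
import os
import ray

ray.init()

# Hypothetical task standing in for the original read_from_hdfs.
@ray.remote
def read_from_hdfs():
    return os.environ.get("SEC_TOKEN_STRING")

token = "..."  # placeholder for the real token

# Environment variables go under the nested "env_vars" key of runtime_env.
result = ray.get(
    read_from_hdfs.options(
        runtime_env={"env_vars": {"SEC_TOKEN_STRING": token}}
    ).remote()
)
```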