
[RFC] runtime_env for actors and tasks


There have been many user questions and a lot of discussion around how to specify dependencies for a given job, actor, or task. There are a few different general use cases here:

  1. Users want to run tasks/actors that require different or conflicting Python dependencies as part of one Ray application.
  2. Users want to use Docker containers to manage dependencies (not just Python) for different tasks and actors as part of one Ray application.
  3. Users want to distribute local Python modules and files to their workers/actors for rapid iteration/development.
  4. Users want to easily install new Python packages as part of their development workflow (e.g., change library versions without restarting the cluster).

This proposal is to introduce a new runtime_env API that enables all of these use cases and can generalize to future worker environment-related demands.

The runtime_env will be a dictionary that can be passed as an option to actor/task creation:

f.options(runtime_env=env).remote()
Actor.options(runtime_env=env).remote()

This dictionary will include the following arguments:

  • container_image (str): Require a given (Docker) container image. The image must have the same version of Ray installed.
  • conda_env (str): Activates a named conda environment that the worker will run in. The environment must already exist on the node.
  • files (Path): Project files and local modules to unpack in the working directory of the task/actor.
  • (possible future extension) python_requirements (Union[File, List[str]]): List of Python requirements or a requirements.txt file to use to dynamically create a new conda environment.

These options should cover all the known dependency management use cases listed above.
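For illustration, here is a sketch of how such a runtime_env dictionary might be constructed and passed per task under this proposal (the conda environment name and files path below are placeholder values, not part of the RFC):

import ray

ray.init()

# Hypothetical runtime_env using two of the proposed keys.
env = {
    "conda_env": "my-project-env",  # must already exist on the node
    "files": "./my_project",        # unpacked into the task's working directory
}

@ray.remote
def f():
    return "ran with the requested runtime_env"

# Per-task override, as proposed above.
print(ray.get(f.options(runtime_env=env).remote()))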

Misc semantics:

  • Any downstream tasks/actors will by default inherit the runtime_env of their parent.
  • The runtime_env needs to be able to be specified on an individual actor and task basis, but for convenience it should also be able to be set in the JobConfig as a default for all tasks/actors spawned by the driver.
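To make the job-level default concrete, here is a hedged sketch of what it might look like (the RFC does not specify the exact JobConfig integration; the field name and import path shown are assumptions based on Ray’s existing JobConfig):

import ray
from ray.job_config import JobConfig

# Assumed shape: a driver-wide default runtime_env set on the JobConfig.
# Tasks/actors spawned by this driver would inherit it unless they override it.
ray.init(job_config=JobConfig(runtime_env={"conda_env": "my-project-env"}))

@ray.remote
def g():
    return "uses the job-level default"

# Inherits the default runtime_env from the JobConfig.
ray.get(g.remote())

# A per-task override still takes precedence over the job default.
ray.get(g.options(runtime_env={"conda_env": "other-env"}).remote())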

At this point, this RFC is primarily about the use cases and interface, not the implementation of each runtime_env option. Please comment if you believe there is a use case not covered, the UX could be improved, etc.

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Reactions: 25
  • Comments: 19 (12 by maintainers)

Top GitHub Comments

2 reactions
edoakes commented, Feb 10, 2021

@valiantljk good points. We’ve discussed the implementation of the container_image env in more detail, as that’s actually what prompted this discussion initially. Here’s a link to a document that goes more in-depth on it: https://docs.google.com/document/d/1MbsjSye2KgYuLPPUWziU8iPTBCC_0h9EOrUIX4LFC2M/edit#

The TL;DR is that for now this will be tightly integrated with the autoscaler. The autoscaler will add new node types to satisfy scheduling constraints specified by the container_image requirement. In the future we may be able to support per-worker containers, which would allow us to remove this tight coupling of the container image and the node.

1 reaction
nostalgicimp commented, May 24, 2021

Is the runtime_env RFC still under development? I saw the docs already show its usage (https://docs.ray.io/en/master/advanced.html#conda-environments-for-tasks-and-actors, https://docs.ray.io/en/master/package-ref.html#ray-remote), but my installed ray 2.0.0.dev0 version still couldn’t recognize the newly added runtime_env and override_environment_variables.

My ray.get(read_from_hdfs.options(runtime_env={"SEC_TOKEN_STRING": token}).remote()) failed with the following error; runtime_env is not among the listed options.

AssertionError: The @ray.remote decorator must be applied either with no arguments and no parentheses, for example '@ray.remote', or it must be applied using some of the arguments 'num_returns', 'num_cpus', 'num_gpus', 'memory', 'object_store_memory', 'resources', 'max_calls', or 'max_restarts', like '@ray.remote(num_returns=2, resources={"CustomResource": 1})'.
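For reference, Ray versions that later shipped runtime_env expect environment variables under a nested env_vars key rather than at the top level of the dictionary; a hedged sketch reusing the commenter’s names (read_from_hdfs and SEC_TOKEN_STRING), which postdates this RFC:

# Environment variables go under "env_vars" in the released runtime_env API.
ray.get(
    read_from_hdfs.options(
        runtime_env={"env_vars": {"SEC_TOKEN_STRING": token}}
    ).remote()
)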
