Performance issues with defining remote functions and actor classes from within tasks.

Consider the following code.

import ray
ray.init(num_cpus=10)

@ray.remote
def f():
    @ray.remote
    def g():
        return 1
    return ray.get(g.remote())

ray.get([f.remote() for _ in range(10)])

If the 10 copies of f are scheduled on 10 different workers, each of them defines its own g. Each copy of g is pickled and exported through Redis, and every export is then imported by every worker process. With N tasks running on N workers, that is N exports each imported N times, so the import work grows as N^2.
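The scaling described above can be illustrated with a toy model in plain Python. This is only a sketch: `simulate_push` and `export_log` are hypothetical names, nothing here calls Ray or Redis, and it assumes one task per worker with every export pushed to every worker.

```python
def simulate_push(num_workers):
    """Toy model: each task exports its own copy of g, and every worker
    imports every export (no Ray or Redis involved)."""
    export_log = []  # hypothetical stand-in for the shared export list

    # One task per worker, and each task defines (and exports) its own g.
    for task_id in range(num_workers):
        export_log.append(("g", task_id))

    # Every worker imports every entry in the export list.
    imports_per_worker = [len(export_log) for _ in range(num_workers)]

    return len(export_log), sum(imports_per_worker)


for n in (1, 10, 100):
    exports, imports = simulate_push(n)
    print(f"workers={n} exports={exports} total_imports={imports}")
```

In this model the number of exports grows linearly with the number of tasks, while the total number of imports grows quadratically.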

Ideally, we would deduplicate the imports. However, there doesn’t appear to be an easy way to determine whether the exported g functions are actually the same. If you only look at the body of the function (e.g., with inspect.getsource), you will conclude that two functions are the same whenever they have the same body, even if they close over different variables in the environment. We could instead compare the serialized strings generated by cloudpickle, but cloudpickle is nondeterministic, so the same function pickled in different processes will often give rise to different strings, and too little deduplication would happen.
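The closure problem can be seen without Ray at all. The sketch below uses a hypothetical factory `make_adder` and compares the compiled function body (`__code__`) as a stand-in for comparing the text from inspect.getsource; both comparisons ignore the values a function closes over.

```python
def make_adder(offset):
    # g has the same body on every call, but closes over a different offset.
    def g(x):
        return x + offset
    return g


g1 = make_adder(1)
g2 = make_adder(100)

# The function bodies are indistinguishable...
same_body = g1.__code__.co_code == g2.__code__.co_code

# ...but the captured environments differ, and so does the behavior.
closure_1 = [cell.cell_contents for cell in g1.__closure__]
closure_2 = [cell.cell_contents for cell in g2.__closure__]

print("same body:", same_body)
print("closures:", closure_1, closure_2)
print("results:", g1(0), g2(0))
```

Deduplicating on the body alone would treat g1 and g2 as one function, even though they return different results.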

In https://github.com/ray-project/ray/pull/6175, we’re settling for not doing any deduplication but giving a warning whenever the source returned by inspect.getsource looks the same.

The longer term solution will likely be to remove the N^2 effect, e.g., perhaps by treating remote functions as objects stored in the object store (instead of Redis) or perhaps by having the workers pull the remote functions from Redis when needed (instead of pushing the remote functions proactively to the workers).
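As a rough illustration of the pull-based idea, here is a toy model in plain Python. It is not Ray's implementation: `FunctionTable`, `Worker`, and `simulate_pull` are hypothetical names, and the model assumes each worker only ever runs the one copy of g defined by its own task.

```python
class FunctionTable:
    """Hypothetical stand-in for the shared store holding exported functions."""

    def __init__(self):
        self._definitions = {}
        self.fetch_count = 0

    def export(self, key, fn):
        self._definitions[key] = fn

    def fetch(self, key):
        self.fetch_count += 1
        return self._definitions[key]


class Worker:
    """Fetches a definition only the first time it is needed, then caches it."""

    def __init__(self, table):
        self._table = table
        self._cache = {}

    def run(self, key):
        if key not in self._cache:
            self._cache[key] = self._table.fetch(key)
        return self._cache[key]()


def simulate_pull(num_workers):
    table = FunctionTable()
    workers = [Worker(table) for _ in range(num_workers)]

    # Each task exports its own copy of g, as in the example above.
    for task_id in range(num_workers):
        table.export(("g", task_id), lambda: 1)

    # Each worker runs only its own g, twice; the second run hits the cache.
    results = []
    for _ in range(2):
        results += [w.run(("g", i)) for i, w in enumerate(workers)]
    return table.fetch_count, results


fetches, results = simulate_pull(10)
print("fetches:", fetches, "results ok:", all(r == 1 for r in results))
```

Under these assumptions the number of fetches grows linearly with the number of workers, rather than every worker importing every export.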

Workaround

Modify the above code to define the remote function on the driver instead. E.g.,

import ray
ray.init(num_cpus=10)

@ray.remote
def g():
    return 1

@ray.remote
def f():
    return ray.get(g.remote())

ray.get([f.remote() for _ in range(10)])

To see the difference, compare the value of len(ray.worker.global_worker.redis_client.lrange('Exports', 0, -1)) after running each of the two workloads.

Issue Analytics

  • State: open
  • Created 4 years ago
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

2 reactions
markgoodhead commented, Jun 7, 2020

FYI I get this issue for Ray Tune’s internals which sent me here:

WARNING import_thread.py:126 – The actor ‘WrappedFunc’ has been exported 100 times. It’s possible that this warning is accidental, but this may indicate that the same remote function is being defined repeatedly from within many tasks and exported to all of the workers. This can be a performance issue and can be resolved by defining the remote function on the driver instead. See https://github.com/ray-project/ray/issues/6240 for more discussion.

I’m using remote calls within my trainable function (each Tune task has 3k sub tasks), but I’m defining the remote function outside of the Trainable function (like in the second example), so I’m not sure why this would still apply.

0 reactions
jmakov commented, Sep 23, 2021

Got the same warning with Tune. No ray calls in my tunable function.
