Performance issues with defining remote functions and actor classes from within tasks.
Consider the following code.
```python
import ray

ray.init(num_cpus=10)

@ray.remote
def f():
    @ray.remote
    def g():
        return 1

    return ray.get(g.remote())

ray.get([f.remote() for _ in range(10)])
```
If the 10 copies of `f` are scheduled on 10 different workers, each of them will define its own copy of `g`. Each copy of `g` will be pickled and exported through Redis and then imported by every worker process, so N workers produce N exports that are each imported N times. So there is an N^2 effect here.
Ideally, we would deduplicate the imports. However, there doesn't appear to be an easy way to determine whether the exported `g` functions are actually the same. If you just look at the body of the function (e.g., with `inspect.getsource`), you will conclude that two functions are the same whenever they have the same body, even if they close over different variables in the environment. We could instead compare the serialized strings generated by cloudpickle, but cloudpickle is nondeterministic, so the same function pickled in different processes will often give rise to different strings, and not enough deduplication would happen.
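To illustrate the first problem, here is a minimal sketch (the `make_adder` example is mine, not from the issue) showing that `inspect.getsource` returns identical text for two closures that behave differently, so comparing source alone would wrongly merge functions that must stay distinct:

```python
import inspect

def make_adder(offset):
    # Same function body every time, but each closure captures a different `offset`.
    def add(x):
        return x + offset
    return add

add_one = make_adder(1)
add_two = make_adder(2)

# The source text is identical even though the functions compute different results.
print(inspect.getsource(add_one) == inspect.getsource(add_two))  # True
print(add_one(0), add_two(0))  # 1 2
```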
In https://github.com/ray-project/ray/pull/6175, we're settling for not doing any deduplication, but giving a warning whenever the source returned by `inspect.getsource` looks the same.
The longer-term solution will likely be to remove the N^2 effect, e.g., by treating remote functions as objects stored in the object store (instead of Redis), or by having workers pull remote functions from Redis when needed (instead of pushing them proactively to every worker).
Workaround
Modify the above code to define the remote function on the driver instead. E.g.,
```python
import ray

ray.init(num_cpus=10)

@ray.remote
def g():
    return 1

@ray.remote
def f():
    return ray.get(g.remote())

ray.get([f.remote() for _ in range(10)])
```
You can compare the values of `len(ray.worker.global_worker.redis_client.lrange('Exports', 0, -1))` produced by the two workloads.
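As a rough sketch (assuming the Redis-backed export path used by the older Ray versions this issue refers to, where `ray.worker.global_worker.redis_client` is available; this is an internal attribute, not a stable public API), you could count exports after running each workload:

```python
import ray

def count_exports():
    # Number of definitions exported through Redis so far.
    # Relies on internals of older, Redis-backed Ray versions.
    return len(ray.worker.global_worker.redis_client.lrange('Exports', 0, -1))

# Run the first workload, then:
print("exports after nested definition:", count_exports())
# Restart Ray, run the workaround workload, then:
print("exports after driver-side definition:", count_exports())
```

The nested-definition version should show roughly one export of `g` per worker, while the workaround exports `g` only once from the driver.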
Top GitHub Comments
FYI I get this issue for Ray Tune’s internals which sent me here:
WARNING import_thread.py:126 – The actor ‘WrappedFunc’ has been exported 100 times. It’s possible that this warning is accidental, but this may indicate that the same remote function is being defined repeatedly from within many tasks and exported to all of the workers. This can be a performance issue and can be resolved by defining the remote function on the driver instead. See https://github.com/ray-project/ray/issues/6240 for more discussion.
I’m using remote calls within my trainable function (each Tune task has 3k sub-tasks), but I’m defining it outside of the Trainable function (like in the second example), so I’m not sure why this would still apply?
Got the same warning with Tune. No ray calls in my tunable function.