Import not working in a cluster
System information
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 16.04
- Ray installed from (source or binary): pip
- Ray version: 0.5.3
- Python version: 3.5.2
- Exact command to reproduce: python3 hello.py
Describe the problem
I have two AWS EC2 instances running, one as head and one as worker. Setup was done following the instructions for Manual cluster setup. I have two Python files. One of them imports a function from the other and calls that function inside a remote function. Running it gives Exception: This function was not imported properly. The setup runs fine if the second file is not imported, or if only the head node is in the cluster.
Source code / logs
Two files:
# hello.py
import ray
import sys

ray.init(redis_address="172.31.2.108:6379")

import time
from testimport import sleep

@ray.remote
def f():
    time.sleep(0.01)
    sleep(0.01)
    return "python version: %s, ip: %s" % (sys.version_info, ray.services.get_node_ip_address())

# Get a list of the IP addresses of the nodes that have joined the cluster.
print(set(ray.get([f.remote() for _ in range(100)])))
# testimport.py
import time

def sleep(n):
    time.sleep(n)
Command run:
python3 hello.py
Output:
Traceback (most recent call last):
File "/home/ubuntu/.local/lib/python3.5/site-packages/ray/import_thread.py", line 131, in fetch_and_register_remote_function
function = pickle.loads(serialized_function)
AttributeError: Can't get attribute 'sleep' on <module 'testimport' from '/home/ubuntu/workspace/raytest/testimport.py'>
Remote function __main__.f failed with:
Traceback (most recent call last):
File "/home/ubuntu/.local/lib/python3.5/site-packages/ray/import_thread.py", line 123, in f
raise Exception("This function was not imported properly.")
Exception: This function was not imported properly.
Suppressing duplicate error message.
Suppressing duplicate error message.
Suppressing duplicate error message.
.
.
.
Suppressing duplicate error message.
Suppressing duplicate error message.
Suppressing duplicate error message.
Traceback (most recent call last):
File "hello.py", line 16, in <module>
print(set(ray.get([f.remote() for _ in range(100)])))
File "/home/ubuntu/.local/lib/python3.5/site-packages/ray/worker.py", line 2514, in get
raise RayGetError(object_ids[i], value)
ray.worker.RayGetError: Could not get objectid ObjectID(46fe73ad1bbfdc7c6293a0d80e570329a911ecf9). It was created by remote function __main__.f which failed with:
Remote function __main__.f failed with:
Traceback (most recent call last):
File "/home/ubuntu/.local/lib/python3.5/site-packages/ray/import_thread.py", line 123, in f
raise Exception("This function was not imported properly.")
Exception: This function was not imported properly.
Commenting out sleep(0.01) from hello.py gives:
{"python version: sys.version_info(major=3, minor=5, micro=2, releaselevel='final', serial=0), ip: 172.31.2.108", "python version: sys.version_info(major=3, minor=5, micro=2, releaselevel='final', serial=0), ip: 172.31.0.97"}
So the setup is working for a single file with no imports.
Removing the worker node from the cluster gives:
{"python version: sys.version_info(major=3, minor=5, micro=2, releaselevel='final', serial=0), ip: 172.31.2.108"}
So the import works on a single node cluster.
Seems like the import of the second file does not work when running on a cluster.
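The AttributeError in the worker log hints at the mechanism: cloudpickle serializes a module-level function by reference (module name plus attribute name), so each worker must be able to resolve testimport.sleep from its own copy of testimport.py; if a worker node's copy is missing or out of date, unpickling fails exactly as shown. A minimal sketch of by-reference pickling, using the stdlib pickle module (which behaves the same way for named functions):

```python
import pickle

# Named, module-level functions are pickled by reference: the payload
# records only the defining module and the attribute name, not the code.
payload = pickle.dumps(len)
assert b"builtins" in payload and b"len" in payload

# Unpickling re-imports the module and looks the name up again. This
# lookup is what produces "Can't get attribute 'sleep' on <module
# 'testimport' ...>" when the worker's copy of the module lacks the name.
assert pickle.loads(payload) is len
```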
Issue Analytics
- Created 5 years ago
- Comments: 14 (3 by maintainers)
Top GitHub Comments
The new Runtime Environments feature, which didn't exist at the time of this post, should help with this issue: https://docs.ray.io/en/latest/handling-dependencies.html#runtime-environments. (See the working_dir and py_modules entries.)

Also, this patch to cloudpickle might help: https://github.com/cloudpipe/cloudpickle/pull/391
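For anyone hitting this today, a minimal sketch of the runtime_env approach, assuming a modern Ray version and that the driver script sits in the project directory next to testimport.py (the address value is a placeholder for your own cluster):

```python
# Sketch of the Runtime Environments fix: ship the project directory to
# every node so workers import the same copy of testimport.py.
runtime_env = {
    "working_dir": ".",  # upload this directory to all cluster nodes
    # Alternatively, ship specific modules instead of the whole directory:
    # "py_modules": ["./testimport.py"],
}

# With a running cluster, this would replace the ray.init call in hello.py:
# import ray
# ray.init(address="auto", runtime_env=runtime_env)
```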