Import not working in a cluster

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 16.04
  • Ray installed from (source or binary): pip
  • Ray version: 0.5.3
  • Python version: 3.5.2
  • Exact command to reproduce: python3 hello.py

Describe the problem

I have two AWS EC2 instances running, one as the head node and one as a worker. The setup was done using the instructions for Manual cluster setup. I have two Python files. One of them imports a function from the other and calls that function inside a remote function. Running it gives "Exception: This function was not imported properly." The setup runs fine if the second file is not imported, or if the head node is the only node in the cluster.

Source code / logs

Two files:

# hello.py

import ray
import sys

ray.init(redis_address="172.31.2.108:6379")

import time
from testimport import sleep

@ray.remote
def f():
    time.sleep(0.01)
    sleep(0.01)
    return "python version: %s, ip: %s" % (sys.version_info, ray.services.get_node_ip_address())

# Get a list of the IP addresses of the nodes that have joined the cluster.
print(set(ray.get([f.remote() for _ in range(100)])))

# testimport.py
import time

def sleep(n):
    time.sleep(n)

Command run: python3 hello.py

Output:

Traceback (most recent call last):
  File "/home/ubuntu/.local/lib/python3.5/site-packages/ray/import_thread.py", line 131, in fetch_and_register_remote_function
    function = pickle.loads(serialized_function)
AttributeError: Can't get attribute 'sleep' on <module 'testimport' from '/home/ubuntu/workspace/raytest/testimport.py'>

Remote function __main__.f failed with:

Traceback (most recent call last):
  File "/home/ubuntu/.local/lib/python3.5/site-packages/ray/import_thread.py", line 123, in f
    raise Exception("This function was not imported properly.")
Exception: This function was not imported properly.

Suppressing duplicate error message.
Suppressing duplicate error message.
Suppressing duplicate error message.
.
.
.
Suppressing duplicate error message.
Suppressing duplicate error message.
Suppressing duplicate error message.
Traceback (most recent call last):
  File "hello.py", line 16, in <module>
    print(set(ray.get([f.remote() for _ in range(100)])))
  File "/home/ubuntu/.local/lib/python3.5/site-packages/ray/worker.py", line 2514, in get
    raise RayGetError(object_ids[i], value)
ray.worker.RayGetError: Could not get objectid ObjectID(46fe73ad1bbfdc7c6293a0d80e570329a911ecf9). It was created by remote function __main__.f which failed with:

Remote function __main__.f failed with:

Traceback (most recent call last):
  File "/home/ubuntu/.local/lib/python3.5/site-packages/ray/import_thread.py", line 123, in f
    raise Exception("This function was not imported properly.")
Exception: This function was not imported properly.

Commenting out sleep(0.01) in hello.py gives:

{"python version: sys.version_info(major=3, minor=5, micro=2, releaselevel='final', serial=0), ip: 172.31.2.108", "python version: sys.version_info(major=3, minor=5, micro=2, releaselevel='final', serial=0), ip: 172.31.0.97"}

So the setup works for a single file with no imports.

Removing the worker node from the cluster gives:

{"python version: sys.version_info(major=3, minor=5, micro=2, releaselevel='final', serial=0), ip: 172.31.2.108"}

So the import works on a single-node cluster.

It seems that importing the second file does not work when running on a multi-node cluster.
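
One way to read the traceback above: the failing call is pickle.loads(serialized_function), and a function imported from a regular module is normally pickled by reference (module name plus function name), not by value. The worker then has to import its own copy of testimport and look up sleep there. The sketch below reproduces that failure mode with the standard library pickle module only. It does not use Ray, and the "worker" is simulated by swapping the entry in sys.modules for a module that lacks sleep. That swap is an assumption about what the worker node saw (for example a missing or out-of-date testimport.py), not something the issue confirms.

```python
import pickle
import sys
import types

# Driver side: a stand-in for testimport.py that defines sleep().
driver_module = types.ModuleType("testimport")
exec("import time\ndef sleep(n):\n    time.sleep(n)\n", driver_module.__dict__)
sys.modules["testimport"] = driver_module

# Pickling the function stores only a reference: module name and function name.
payload = pickle.dumps(driver_module.sleep)
print(b"testimport" in payload, b"sleep" in payload)

# Simulated worker side (assumption): a testimport module without sleep().
sys.modules["testimport"] = types.ModuleType("testimport")

error = None
try:
    pickle.loads(payload)
except AttributeError as exc:
    # Same kind of failure as in the traceback: the lookup of 'sleep' fails.
    error = exc
print(type(error).__name__, error)

# Clean up the fake module so later imports are not affected.
del sys.modules["testimport"]
```

If this reading is right, making sure every node has the same testimport.py at an importable location should remove the error.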

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 14 (3 by maintainers)

Top GitHub Comments

architkulkarni commented, Jan 15, 2022 (1 reaction)

The new Runtime Environments feature, which didn’t exist at the time of this post, should help with this issue: https://docs.ray.io/en/latest/handling-dependencies.html#runtime-environments. (See the working_dir and py_modules entries.)
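
A minimal sketch of the working_dir approach mentioned above, assuming a recent Ray release that supports the runtime_env argument to ray.init (the 0.5.3 version from the original report does not). build_runtime_env and run_on_cluster are hypothetical helper names introduced here for illustration, and address="auto" assumes the script is started on a node of an already running cluster.

```python
import os
import socket


def build_runtime_env(project_dir="."):
    """Return a runtime_env dict that uploads project_dir to every node.

    Hypothetical helper for this sketch, not part of Ray.
    """
    return {"working_dir": os.path.abspath(project_dir)}


def run_on_cluster():
    # Imported lazily so the helper above can be used without Ray installed.
    import ray

    # Assumption: a cluster is already running and reachable from this node.
    ray.init(address="auto", runtime_env=build_runtime_env("."))

    @ray.remote
    def f():
        # Resolved on the worker from its own copy of the uploaded directory.
        from testimport import sleep
        sleep(0.01)
        return socket.gethostname()

    # Collect the host names of the nodes that executed the tasks.
    print(set(ray.get([f.remote() for _ in range(100)])))
```

Calling run_on_cluster() from the directory that contains testimport.py uploads that directory to the workers, so the import no longer depends on files having been copied to each node by hand. The py_modules entry is the alternative for dependencies packaged as importable modules.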

jseppanen commented, Mar 25, 2021 (0 reactions)

Also, this patch to cloudpickle might help: https://github.com/cloudpipe/cloudpickle/pull/391
