Ray Cluster ModuleNotFoundError


System information

  • OS Platform and Distribution: Ubuntu 16.04.2 LTS
  • Ray installed from (source or binary): Binary
  • Ray version: 0.7.2
  • Python version: 3.6.8

I am trying to build a manual cluster of machines by IP address. However, when I tried to run the PPO algorithm on the cluster, one of the workers failed with ModuleNotFoundError: No module named 'v2i'. The v2i module is my custom gym environment. It looks like Ray was not able to sync the files between the nodes. Here is the complete traceback; wsl is my worker hostname.


Traceback (most recent call last):
  File "/home/mayank/miniconda3/envs/v2i/lib/python3.6/site-packages/ray/tune/trial_runner.py", line 436, in _process_trial
    result = self.trial_executor.fetch_result(trial)
  File "/home/mayank/miniconda3/envs/v2i/lib/python3.6/site-packages/ray/tune/ray_trial_executor.py", line 323, in fetch_result
    result = ray.get(trial_future[0])
  File "/home/mayank/miniconda3/envs/v2i/lib/python3.6/site-packages/ray/worker.py", line 2195, in get
    raise value
ray.exceptions.RayTaskError: ray_PPO:train() (pid=30729, host=rlmac)
  File "/home/mayank/miniconda3/envs/v2i/lib/python3.6/site-packages/ray/rllib/agents/trainer.py", line 364, in train
    raise e
  File "/home/mayank/miniconda3/envs/v2i/lib/python3.6/site-packages/ray/rllib/agents/trainer.py", line 353, in train
    result = Trainable.train(self)
  File "/home/mayank/miniconda3/envs/v2i/lib/python3.6/site-packages/ray/tune/trainable.py", line 150, in train
    result = self._train()
  File "/home/mayank/miniconda3/envs/v2i/lib/python3.6/site-packages/ray/rllib/agents/trainer_template.py", line 126, in _train
    fetches = self.optimizer.step()
  File "/home/mayank/miniconda3/envs/v2i/lib/python3.6/site-packages/ray/rllib/optimizers/multi_gpu_optimizer.py", line 130, in step
    self.num_envs_per_worker, self.train_batch_size)
  File "/home/mayank/miniconda3/envs/v2i/lib/python3.6/site-packages/ray/rllib/optimizers/rollout.py", line 29, in collect_samples
    next_sample = ray_get_and_free(fut_sample)
  File "/home/mayank/miniconda3/envs/v2i/lib/python3.6/site-packages/ray/rllib/utils/memory.py", line 33, in ray_get_and_free
    result = ray.get(object_ids)
ray.exceptions.RayTaskError: ray_RolloutWorker:sample() (pid=11974, host=wsl)
  File "pyarrow/serialization.pxi", line 461, in pyarrow.lib.deserialize
  File "pyarrow/serialization.pxi", line 424, in pyarrow.lib.deserialize_from
  File "pyarrow/serialization.pxi", line 275, in pyarrow.lib.SerializedPyObject.deserialize
  File "pyarrow/serialization.pxi", line 174, in pyarrow.lib.SerializationContext._deserialize_callback
  File "/media/win/MayankPal/miniconda3/envs/v2i/lib/python3.6/site-packages/ray/cloudpickle/cloudpickle.py", line 965, in subimport
    __import__(name)
ModuleNotFoundError: No module named 'v2i'
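
The last frames of the traceback suggest what is going on: the worker is deserializing an object that was pickled by reference (module name plus attribute name), so it has to import v2i locally before it can rebuild the object. The sketch below reproduces the same failure with the standard-library pickle module only, without Ray. The module name toy_env, the class ToyEnv, and both helper functions are hypothetical stand-ins for illustration; Ray itself uses cloudpickle, which behaves similarly for classes defined in importable modules.

```python
import importlib
import os
import pickle
import subprocess
import sys
import tempfile


def unpickle_in_fresh_interpreter(payload):
    """Unpickle `payload` in a new Python process that lacks the module path."""
    child_code = "import pickle, sys; pickle.loads(sys.stdin.buffer.read())"
    env = dict(os.environ)
    env.pop("PYTHONPATH", None)  # the child only sees its default sys.path
    proc = subprocess.run(
        [sys.executable, "-c", child_code],
        input=payload,
        env=env,
        capture_output=True,
    )
    return proc.returncode, proc.stderr.decode()


def demo():
    with tempfile.TemporaryDirectory() as pkg_dir:
        # Hypothetical stand-in for the custom environment module.
        with open(os.path.join(pkg_dir, "toy_env.py"), "w") as f:
            f.write("class ToyEnv:\n    pass\n")

        sys.path.insert(0, pkg_dir)
        try:
            toy_env = importlib.import_module("toy_env")
            # A class is pickled by reference: only "toy_env" and "ToyEnv"
            # are stored, not the class body.
            payload = pickle.dumps(toy_env.ToyEnv)
        finally:
            sys.path.remove(pkg_dir)

        # The child plays the role of a worker on a node without the module.
        return unpickle_in_fresh_interpreter(payload)


if __name__ == "__main__":
    returncode, stderr = demo()
    print(returncode)
    print(stderr.strip().splitlines()[-1])
```

The child process exits with a non-zero status and a ModuleNotFoundError naming toy_env, which mirrors the error raised on host wsl. Under that reading, the fix is to make the module importable on every node rather than to change the training code.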

Describe the problem

Source code / logs

  • First, start the Ray head: ray start --head --redis-port=6666 --num-cpus=22 --num-gpus=1
  • Start Ray on the worker machine with the above Redis address: ray start --redis-address=xxx.xxx.xxx.xxx:6666
  • Start PPO training: python train.py

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Comments: 9 (3 by maintainers)

Top GitHub Comments

7 reactions
pvmilk commented, Feb 17, 2022

In case anyone faces the same issue, let me elaborate on how to apply the above suggestion concretely. (I am very new to Ray Tune, so please take it with a grain of salt.)

  • OS Platform and Distribution: Ubuntu 20.04.3 LTS
  • Ray installed from (source or binary): Binary (pip install)
  • Ray version: 1.10.0
  • Python version: 3.8.10
  • Usage: Ray Tune for hyperparameter tuning

Normally, to use a custom Python module, you add the following line to your code:

    sys.path.append(module_path)

However, this seems to take effect only in the main program that calls tune.run; it is not carried over to the workers, as mentioned above. As a result, the code inside tuning_method cannot use the module added through sys.path.append. To solve this, we need a way to set PYTHONPATH in the workers. One way is to set the PYTHONPATH environment variable before calling tune.run, as follows (I am guessing that ray.init is called inside tune.run):

    os.environ['PYTHONPATH'] = module_path
    tune.run(tuning_method, ...)
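
The difference between sys.path.append and PYTHONPATH can be seen without Ray at all. The sketch below is only an analogy: it assumes that worker processes behave like ordinary child processes, which inherit environment variables but not the parent's in-memory sys.path. The module name my_custom_module and the helpers child_can_import and demo are hypothetical.

```python
import importlib
import os
import subprocess
import sys
import tempfile


def child_can_import(module_name, env):
    """Return True if a fresh interpreter started with `env` can import the module."""
    proc = subprocess.run(
        [sys.executable, "-c", "import " + module_name],
        env=env,
        capture_output=True,
    )
    return proc.returncode == 0


def demo():
    with tempfile.TemporaryDirectory() as module_path:
        # Hypothetical custom module living outside the default search path.
        with open(os.path.join(module_path, "my_custom_module.py"), "w") as f:
            f.write("VALUE = 1\n")

        base_env = dict(os.environ)
        base_env.pop("PYTHONPATH", None)

        # sys.path.append only changes the current process.
        sys.path.append(module_path)
        try:
            parent_ok = importlib.import_module("my_custom_module").VALUE == 1
            child_without = child_can_import("my_custom_module", base_env)
        finally:
            sys.path.remove(module_path)

        # PYTHONPATH is part of the environment, so the child picks it up.
        env_with_path = dict(base_env, PYTHONPATH=module_path)
        child_with = child_can_import("my_custom_module", env_with_path)

    return parent_ok, child_without, child_with


if __name__ == "__main__":
    print(demo())
```

Here demo() returns (True, False, True): the parent can import the module after sys.path.append, a child started without PYTHONPATH cannot, and a child started with PYTHONPATH can.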

Alternatively, I believe you can also do it like this, but it does not suit my use case, as I am running everything on one machine.

Note: The solution here can also be the answer for the following issue (https://github.com/ray-project/ray/issues/10067).

1 reaction
richardliaw commented, Jan 21, 2020

Does it work if you set the PYTHONPATH in os.environ before calling ray.init (assuming you’re on a single machine)?
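
One detail worth noting when trying this: assigning os.environ['PYTHONPATH'] directly replaces whatever value was already set. A minimal sketch of a helper that prepends instead is shown below; the name prepend_to_pythonpath and its environ parameter are hypothetical and not part of Ray. On a multi-node cluster this alone is probably not enough, since, as far as I understand, workers started by ray start on other machines take their environment from the shell that ran that command, and the module must also exist at that path on each machine.

```python
import os


def prepend_to_pythonpath(path, environ=os.environ):
    """Put `path` at the front of PYTHONPATH without dropping existing entries."""
    path = os.path.abspath(path)
    existing = environ.get("PYTHONPATH", "")
    entries = [p for p in existing.split(os.pathsep) if p]
    if path not in entries:
        entries.insert(0, path)
    environ["PYTHONPATH"] = os.pathsep.join(entries)
    return environ["PYTHONPATH"]


if __name__ == "__main__":
    # Hypothetical location of the custom module; adjust to the real path.
    print(prepend_to_pythonpath("path/to/custom/module"))
    # ray.init() or tune.run(...) would follow here, after the variable is set.
```

Passing a plain dict as environ makes the helper easy to try out without touching the real process environment.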


