Ray Cluster ModuleNotFoundError
System information
- OS Platform and Distribution: Ubuntu 16.04.2 LTS
- Ray installed from (source or binary): Binary
- Ray version: 0.7.2
- Python version: 3.6.8
I am trying to build a manual cluster from machines identified by their IP addresses. However, when I tried to run the PPO algorithm on the cluster, I got an error from one of the workers: ModuleNotFoundError: No module named 'v2i'. Here, v2i is the main module of my custom gym environment. It looks like Ray was not able to sync the files between the nodes. Here is the complete traceback; wsl is my worker's hostname.
Traceback (most recent call last):
File "/home/mayank/miniconda3/envs/v2i/lib/python3.6/site-packages/ray/tune/trial_runner.py", line 436, in _process_trial
result = self.trial_executor.fetch_result(trial)
File "/home/mayank/miniconda3/envs/v2i/lib/python3.6/site-packages/ray/tune/ray_trial_executor.py", line 323, in fetch_result
result = ray.get(trial_future[0])
File "/home/mayank/miniconda3/envs/v2i/lib/python3.6/site-packages/ray/worker.py", line 2195, in get
raise value
ray.exceptions.RayTaskError: ray_PPO:train() (pid=30729, host=rlmac)
File "/home/mayank/miniconda3/envs/v2i/lib/python3.6/site-packages/ray/rllib/agents/trainer.py", line 364, in train
raise e
File "/home/mayank/miniconda3/envs/v2i/lib/python3.6/site-packages/ray/rllib/agents/trainer.py", line 353, in train
result = Trainable.train(self)
File "/home/mayank/miniconda3/envs/v2i/lib/python3.6/site-packages/ray/tune/trainable.py", line 150, in train
result = self._train()
File "/home/mayank/miniconda3/envs/v2i/lib/python3.6/site-packages/ray/rllib/agents/trainer_template.py", line 126, in _train
fetches = self.optimizer.step()
File "/home/mayank/miniconda3/envs/v2i/lib/python3.6/site-packages/ray/rllib/optimizers/multi_gpu_optimizer.py", line 130, in step
self.num_envs_per_worker, self.train_batch_size)
File "/home/mayank/miniconda3/envs/v2i/lib/python3.6/site-packages/ray/rllib/optimizers/rollout.py", line 29, in collect_samples
next_sample = ray_get_and_free(fut_sample)
File "/home/mayank/miniconda3/envs/v2i/lib/python3.6/site-packages/ray/rllib/utils/memory.py", line 33, in ray_get_and_free
result = ray.get(object_ids)
ray.exceptions.RayTaskError: ray_RolloutWorker:sample() (pid=11974, host=wsl)
File "pyarrow/serialization.pxi", line 461, in pyarrow.lib.deserialize
File "pyarrow/serialization.pxi", line 424, in pyarrow.lib.deserialize_from
File "pyarrow/serialization.pxi", line 275, in pyarrow.lib.SerializedPyObject.deserialize
File "pyarrow/serialization.pxi", line 174, in pyarrow.lib.SerializationContext._deserialize_callback
File "/media/win/MayankPal/miniconda3/envs/v2i/lib/python3.6/site-packages/ray/cloudpickle/cloudpickle.py", line 965, in subimport
__import__(name)
ModuleNotFoundError: No module named 'v2i'
Describe the problem
Source code / logs
- First start the ray head
ray start --head --redis-port=6666 --num-cpus=22 --num-gpus=1
- Start ray on worker machine with above redis address
ray start --redis-address=xxx.xxx.xxx.xxx:6666
- Start PPO training
python train.py
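One way to narrow this down is to check, in the same Python environment on each node, whether the module can be found at all. The following is a minimal sketch using only the standard library; the helper name `describe_module` is hypothetical, and the check says nothing about how Ray itself resolves imports, only whether the interpreter on that machine can locate the module.

```python
import importlib.util
import socket


def describe_module(name):
    """Report whether a top-level module can be located by this interpreter."""
    # find_spec returns None when a top-level module cannot be found.
    spec = importlib.util.find_spec(name)
    return {
        "host": socket.gethostname(),
        "module": name,
        "found": spec is not None,
        # origin is the file the module would be loaded from, if known.
        "location": spec.origin if spec is not None else None,
    }


if __name__ == "__main__":
    # Run this on the head node and on every worker node, inside the
    # environment that `ray start` was launched from.
    print(describe_module("v2i"))
```

If the report shows `found: False` on the worker (here, wsl) but `True` on the head node, the package is simply not visible to Python on that machine.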
Issue Analytics
- State:
- Created 4 years ago
- Comments:9 (3 by maintainers)
Top GitHub Comments
Just in case anyone faces the same issue, let me elaborate on how to apply the above suggestion concretely. (I am very new to Ray Tune, so please take this with a grain of salt.)
- OS Platform and Distribution: Ubuntu 20.04.3 LTS
- Ray installed from (source or binary): Binary (pip install)
- Ray version: 1.10.0
- Python version: 3.8.10
- Usage: Ray Tune for hyperparameter tuning
Normally, when you want to use a custom Python module, you add its directory to the import path with sys.path.append in your code.
However, this seems to take effect only in the main program that calls tune.run; it is not carried over to the workers, as mentioned above. As a result, the code inside tune_method cannot use the module added through sys.path.append. To solve this, we need a way to set PYTHONPATH in the workers. One way is to set the PYTHONPATH environment variable before calling tune.run (I am guessing that ray.init is called inside tune.run).
Alternatively, I believe this can also be done another way, but that does not suit my use case, as I am running everything on one machine.
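The difference between the two approaches can be illustrated without Ray. The sketch below is an assumption-laden stand-in: a freshly started Python interpreter plays the role of a worker process, and `demo_custom_module` is a hypothetical module created in a temporary directory. It shows that sys.path.append changes only the current process, while PYTHONPATH in the environment is picked up by a new interpreter.

```python
import os
import subprocess
import sys
import tempfile


def child_can_import(module_name, env=None):
    """Start a fresh interpreter and report whether it can import the module."""
    result = subprocess.run(
        [sys.executable, "-c", f"import {module_name}"],
        env=env,  # None means the child inherits os.environ unchanged
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def compare_approaches():
    """Return (works_with_sys_path, works_with_pythonpath) for a child process."""
    with tempfile.TemporaryDirectory() as module_dir:
        # Hypothetical custom module used only for this demonstration.
        with open(os.path.join(module_dir, "demo_custom_module.py"), "w") as f:
            f.write("VALUE = 1\n")

        # Approach 1: sys.path.append affects this interpreter only.
        sys.path.append(module_dir)
        try:
            with_sys_path = child_can_import("demo_custom_module")
        finally:
            sys.path.remove(module_dir)

        # Approach 2: PYTHONPATH travels with the environment of the child.
        env = os.environ.copy()
        existing = env.get("PYTHONPATH")
        env["PYTHONPATH"] = (
            module_dir + os.pathsep + existing if existing else module_dir
        )
        with_pythonpath = child_can_import("demo_custom_module", env)

    return with_sys_path, with_pythonpath


if __name__ == "__main__":
    print(compare_approaches())
```

Whether Ray workers behave the same way depends on how they are launched; this only demonstrates the general process-level behavior the comment is describing.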
Note: The solution here can also be the answer for the following issue (https://github.com/ray-project/ray/issues/10067).
Does it work if you set the PYTHONPATH in os.environ before calling ray.init (assuming you're on a single machine)?
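A rough sketch of what this suggestion might look like follows. The helper names `prepend_to_pythonpath` and `start_ray_with_module_dir` are hypothetical, and the directory passed in is whatever folder contains the custom package. It assumes a single machine, as the question does; on a multi-node cluster the variable would presumably also have to be set on each worker machine before running `ray start` there.

```python
import os


def prepend_to_pythonpath(directory, environ=os.environ):
    """Put `directory` first in PYTHONPATH, keeping any existing entries."""
    directory = os.path.abspath(directory)
    existing = environ.get("PYTHONPATH", "")
    # Drop empty entries and avoid listing the same directory twice.
    parts = [p for p in existing.split(os.pathsep) if p and p != directory]
    environ["PYTHONPATH"] = os.pathsep.join([directory] + parts)
    return environ["PYTHONPATH"]


def start_ray_with_module_dir(directory):
    """Set PYTHONPATH first, then start Ray from this process."""
    import ray  # imported lazily so the helper above works without Ray

    # The environment variable must be set before ray.init() so that
    # processes started afterwards can inherit it (an assumption about
    # single-machine use, not a guarantee for remote nodes).
    prepend_to_pythonpath(directory)
    ray.init()
```

Calling `start_ray_with_module_dir` with the directory that contains the `v2i` package, before any training code runs, would be the single-machine variant of the idea.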