EOFError: Ran out of input on Kubernetes Cluster
What is the problem?
I deployed Ray on Kubernetes by following the documentation at https://docs.ray.io/en/master/cluster/kubernetes.html#interacting-with-a-ray-cluster and then submitted a job with
ray submit my-cluster.yaml myscript.py
The job fails with EOFError: Ran out of input
- Ray Version: Latest as defined in nightly builds at https://hub.docker.com/r/rayproject/ray
Stacktrace
2021-03-13 13:06:46,093 INFO command_runner.py:171 -- NodeUpdater: example-cluster-ray-head-mtw85: Running kubectl -n ray exec -it example-cluster-ray-head-mtw85 -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (python ~/cartpole2.py)'
Traceback (most recent call last):
  File "/home/ray/cartpole2.py", line 20, in <module>
    agent = ppo.PPOTrainer(config, env=SELECT_ENV)
  File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/rllib/agents/trainer_template.py", line 121, in __init__
    Trainer.__init__(self, config, env, logger_creator)
  File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/rllib/agents/trainer.py", line 513, in __init__
    super().__init__(config, logger_creator)
  File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/tune/trainable.py", line 98, in __init__
    self.setup(copy.deepcopy(self.config))
  File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/rllib/agents/trainer.py", line 607, in setup
    self.env_creator = _global_registry.get(ENV_CREATOR, env)
  File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/tune/registry.py", line 140, in get
    return pickle.loads(value)
EOFError: Ran out of input
command terminated with exit code 1
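For readers unfamiliar with this message: "Ran out of input" is what pickle.loads raises when it is handed an empty bytes object. The last frame of the traceback is pickle.loads(value), so one plausible reading (an assumption, not confirmed in this issue) is that the registry lookup returned an empty value. The sketch below only demonstrates the pickle behaviour with the standard library; describe_unpickle is a hypothetical helper and does not touch Ray.

```python
import pickle


def describe_unpickle(value: bytes) -> str:
    """Hypothetical helper: try to unpickle `value` and report what happens."""
    try:
        return f"ok: {pickle.loads(value)!r}"
    except EOFError as exc:
        # pickle raises EOFError when the byte stream ends before an object is read.
        return f"EOFError: {exc}"


# A valid payload round-trips normally.
print(describe_unpickle(pickle.dumps("CartPole-v0")))  # ok: 'CartPole-v0'

# An empty payload reproduces the message seen in the traceback.
print(describe_unpickle(b""))  # EOFError: Ran out of input
```

If this reading is right, the thing to investigate is why the stored value is empty when the script reads it back, not the pickle call itself.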
Reproduction (REQUIRED)
- Set up a Kubernetes cluster as documented at https://docs.ray.io/en/master/cluster/kubernetes.html#k8s-cluster-launcher
- Save the script below and run it with
ray submit <yaml-step-1> <saved-file.py>
import ray
import ray.rllib.agents.ppo as ppo
import os
import shutil
ray.util.connect("127.0.0.1:10001")
CHECKPOINT_ROOT = "tmp/ppo/cart"
shutil.rmtree(CHECKPOINT_ROOT, ignore_errors=True, onerror=None)
ray_results = os.getenv("HOME") + "/ray_results/"
shutil.rmtree(ray_results, ignore_errors=True, onerror=None)
SELECT_ENV = "CartPole-v0"
config = ppo.DEFAULT_CONFIG.copy()
config["log_level"] = "WARN"
agent = ppo.PPOTrainer(config, env=SELECT_ENV)
N_ITER = 40
s = "{:3d} reward {:6.2f}/{:6.2f}/{:6.2f} len {:6.2f} saved {}"
for n in range(N_ITER):
    result = agent.train()
    file_name = agent.save(CHECKPOINT_ROOT)
    print(s.format(
        n + 1,
        result["episode_reward_min"],
        result["episode_reward_mean"],
        result["episode_reward_max"],
        result["episode_len_mean"],
        file_name
    ))
- I have verified my script runs in a clean environment and reproduces the issue.
- I have verified the issue also occurs with the latest wheels.
Issue Analytics
- State:
- Created 3 years ago
- Comments: 15 (11 by maintainers)
@DmitriGekhtman can you please follow up on this when you are back in the office?
@richardliaw / @sven1977 can you please answer Xavier’s question?