EOFError: Ran out of input on Kubernetes Cluster

What is the problem?

I deployed Ray on Kubernetes following the documentation at https://docs.ray.io/en/master/cluster/kubernetes.html#interacting-with-a-ray-cluster. When I then submit a job with ray submit my-cluster.yaml myscript.py, it fails with EOFError: Ran out of input.

Ray version: latest nightly build from https://hub.docker.com/r/rayproject/ray

Stacktrace

2021-03-13 13:06:46,093 INFO command_runner.py:171 -- NodeUpdater: example-cluster-ray-head-mtw85: Running kubectl -n ray exec -it example-cluster-ray-head-mtw85 -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (python ~/cartpole2.py)'
Traceback (most recent call last):
  File "/home/ray/cartpole2.py", line 20, in <module>
    agent = ppo.PPOTrainer(config, env=SELECT_ENV)
  File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/rllib/agents/trainer_template.py", line 121, in __init__
    Trainer.__init__(self, config, env, logger_creator)
  File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/rllib/agents/trainer.py", line 513, in __init__
    super().__init__(config, logger_creator)
  File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/tune/trainable.py", line 98, in __init__
    self.setup(copy.deepcopy(self.config))
  File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/rllib/agents/trainer.py", line 607, in setup
    self.env_creator = _global_registry.get(ENV_CREATOR, env)
  File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/tune/registry.py", line 140, in get
    return pickle.loads(value)
EOFError: Ran out of input
command terminated with exit code 1
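
A note on the last frame of the trace: the error is raised by pickle.loads(value) in ray/tune/registry.py. The sketch below is plain Python, not Ray code, and try_unpickle is a hypothetical helper name used only for illustration. It shows that pickle raises this exact message when it is handed an empty byte string. That would be consistent with the registry lookup returning no data for the environment name, but this is an inference from the trace and not something the issue confirms.

```python
import pickle


def try_unpickle(payload: bytes) -> str:
    """Describe what pickle.loads does with the given payload.

    Hypothetical helper for illustration only; it is not part of Ray.
    """
    try:
        value = pickle.loads(payload)
    except EOFError as exc:
        # pickle ran out of bytes before it found a complete object.
        return f"EOFError: {exc}"
    return f"ok: {value!r}"


if __name__ == "__main__":
    # A valid pickle round-trips normally.
    print(try_unpickle(pickle.dumps("CartPole-v0")))
    # An empty payload reproduces the message seen in the trace.
    print(try_unpickle(b""))
```

If the same message appears in another setup, checking whether the value handed to pickle.loads is empty is a cheap first step before looking at the cluster configuration.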

Reproduction (REQUIRED)

1. Set up a Kubernetes cluster as documented at https://docs.ray.io/en/master/cluster/kubernetes.html#k8s-cluster-launcher
2. Save the script below to a file and run it with ray submit <yaml-step-1> <saved-file.py>

import ray
import ray.rllib.agents.ppo as ppo
import os
import shutil

# Connect through the Ray client to the head node (port 10001).
ray.util.connect("127.0.0.1:10001")

# Start from a clean checkpoint directory.
CHECKPOINT_ROOT = "tmp/ppo/cart"
shutil.rmtree(CHECKPOINT_ROOT, ignore_errors=True, onerror=None)

# Remove results left over from earlier runs.
ray_results = os.getenv("HOME") + "/ray_results/"
shutil.rmtree(ray_results, ignore_errors=True, onerror=None)

SELECT_ENV = "CartPole-v0"

config = ppo.DEFAULT_CONFIG.copy()
config["log_level"] = "WARN"

# The traceback above is raised on this line.
agent = ppo.PPOTrainer(config, env=SELECT_ENV)

N_ITER = 40
s = "{:3d} reward {:6.2f}/{:6.2f}/{:6.2f} len {:6.2f} saved {}"

# Train, checkpoint after every iteration, and print a summary line.
for n in range(N_ITER):
    result = agent.train()
    file_name = agent.save(CHECKPOINT_ROOT)

    print(s.format(
        n + 1,
        result["episode_reward_min"],
        result["episode_reward_mean"],
        result["episode_reward_max"],
        result["episode_len_mean"],
        file_name,
    ))