[autoscaler] KeyError when starting private cluster

System information

OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 18.04
Ray installed from (source or binary): pip
Ray version: 0.6.5
Python version: 3.6.7
Exact command to reproduce:

ray create-or-update cluster.yaml

Describe the problem

Source code / logs

I followed the documentation and modified example-full.yaml to fill in username, node IP addresses, and custom setup commands.

Traceback:

ray create-or-update cluster.yaml
/tmp/env/lib/python3.6/site-packages/ray/autoscaler/commands.py:38: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  config = yaml.load(open(config_file).read())
/tmp/env/lib/python3.6/site-packages/ray/autoscaler/node_provider.py:115: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  defaults = yaml.load(f)
2019-04-04 03:39:25,901	INFO node_provider.py:34 -- ClusterState: Loaded cluster state: {'c79.millennium.berkeley.edu': {'tags': {'ray-node-type': 'worker'}, 'state': 'terminated'}, 'c80.millennium.berkeley.edu': {'tags': {'ray-node-type': 'head', 'ray-launch-config': '6c51b8169c9469f0fa2568e5d238af2585d302a7', 'ray-node-name': 'ray-default-head'}, 'state': 'running'}}
2019-04-04 03:39:25,902	INFO node_provider.py:59 -- ClusterState: Writing cluster state: {'c79.millennium.berkeley.edu': {'tags': {'ray-node-type': 'worker'}, 'state': 'terminated'}, 'c80.millennium.berkeley.edu': {'tags': {'ray-node-type': 'head', 'ray-launch-config': '6c51b8169c9469f0fa2568e5d238af2585d302a7', 'ray-node-name': 'ray-default-head'}, 'state': 'running'}}
This will restart cluster services [y/N]: y
2019-04-04 03:39:29,888	INFO commands.py:202 -- get_or_create_head_node: Updating files on head node...
Traceback (most recent call last):
  File "/tmp/env/bin/ray", line 11, in <module>
    sys.exit(main())
  File "/tmp/env/lib/python3.6/site-packages/ray/scripts/scripts.py", line 766, in main
    return cli()
  File "/tmp/env/lib/python3.6/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/tmp/env/lib/python3.6/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/tmp/env/lib/python3.6/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/tmp/env/lib/python3.6/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/tmp/env/lib/python3.6/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/tmp/env/lib/python3.6/site-packages/ray/scripts/scripts.py", line 460, in create_or_update
    no_restart, restart_only, yes, cluster_name)
  File "/tmp/env/lib/python3.6/site-packages/ray/autoscaler/commands.py", line 47, in create_or_update_cluster
    override_cluster_name)
  File "/tmp/env/lib/python3.6/site-packages/ray/autoscaler/commands.py", line 243, in get_or_create_head_node
    initialization_commands=config["initialization_commands"],
KeyError: 'initialization_commands'

Top GitHub Comments

EntropicMonkey commented, Jul 14, 2019

55, in init TAG_RAY_NODE_TYPE] == “head” AssertionError

This particular assertion appears to be a separate problem, as I discovered the hard way. It seems to occur when the head_ip address is also listed in worker_ips. Removing the head_ip from the worker list eliminated the error for me. I also had to delete the tmp/cluster-<name>.state file left behind by the broken runs; otherwise errors appear a few lines later, when the code handles the head_ip that is now missing from worker_ips.
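The check described in this comment can be done before launching. The following is only a sketch: it assumes the provider section of the cluster config has already been loaded into a plain dict with head_ip and worker_ips keys, the function name find_provider_conflicts is hypothetical, and the IP addresses are made up. It is not part of Ray.

```python
def find_provider_conflicts(provider):
    """Return human-readable problems found in a local provider section.

    `provider` is assumed to be a dict with "head_ip" and "worker_ips" keys.
    """
    head_ip = provider.get("head_ip")
    worker_ips = provider.get("worker_ips", [])
    problems = []
    if head_ip is None:
        problems.append("head_ip is not set")
    elif head_ip in worker_ips:
        # This is the situation the comment above links to the AssertionError.
        problems.append("head_ip %s is also listed in worker_ips" % head_ip)
    return problems


# Hypothetical provider section, for illustration only.
provider = {
    "type": "local",
    "head_ip": "10.0.0.1",
    "worker_ips": ["10.0.0.1", "10.0.0.2"],
}

for problem in find_provider_conflicts(provider):
    print(problem)
```

An empty list means the head address does not appear among the workers. The stale state file mentioned above would still need to be removed by hand.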

pschafhalter commented, Apr 8, 2019

I’m getting the same error on the latest master. This line is causing the error.

Looks like the local example is out of date.
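The traceback shows config["initialization_commands"] being read with plain indexing, so a config that lacks that key fails with KeyError. One possible workaround, untested against this Ray version, is to make sure the key exists before the config is used, for example by adding an empty initialization_commands list to cluster.yaml. The sketch below shows the same idea on an already-loaded config dict. The helper name fill_missing_keys and the sample config are hypothetical, and the empty-list default is an assumption rather than Ray's documented default.

```python
import copy

# Assumed default: an empty command list. Not taken from Ray's own defaults.
ASSUMED_DEFAULTS = {"initialization_commands": []}


def fill_missing_keys(config, defaults=ASSUMED_DEFAULTS):
    """Return a copy of `config` with absent top-level keys filled in."""
    patched = dict(config)
    for key, value in defaults.items():
        if key not in patched:
            # Deep-copy so callers never share the mutable default list.
            patched[key] = copy.deepcopy(value)
    return patched


# Hypothetical config resembling an outdated example file.
config = {"cluster_name": "default", "setup_commands": ["echo setup"]}
patched = fill_missing_keys(config)

# Indexing no longer raises KeyError on the patched copy.
print(patched["initialization_commands"])
```

Returning a copy keeps the original dict untouched, so it is easy to compare what was in the file with what was filled in.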
