
[autoscaler] KeyError when starting private cluster

See original GitHub issue

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 18.04
  • Ray installed from (source or binary): pip
  • Ray version: 0.6.5
  • Python version: 3.6.7
  • Exact command to reproduce:

ray create-or-update cluster.yaml

Describe the problem

Source code / logs

I followed the documentation and modified example-full.yaml to fill in username, node IP addresses, and custom setup commands.
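For reference, here is a minimal sketch of the kind of local-provider config this refers to. The hostnames are taken from the cluster state logged below; the username and setup commands are placeholders, not the reporter's actual values:

cluster_name: default

provider:
    type: local
    head_ip: c80.millennium.berkeley.edu
    worker_ips:
        - c79.millennium.berkeley.edu

auth:
    ssh_user: YOUR_USERNAME        # placeholder

setup_commands:
    - echo "custom setup"          # placeholder

# Note: no top-level initialization_commands key -- as the traceback
# below shows, this is the key the autoscaler fails to find.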

Traceback:

ray create-or-update cluster.yaml
/tmp/env/lib/python3.6/site-packages/ray/autoscaler/commands.py:38: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  config = yaml.load(open(config_file).read())
/tmp/env/lib/python3.6/site-packages/ray/autoscaler/node_provider.py:115: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  defaults = yaml.load(f)
2019-04-04 03:39:25,901	INFO node_provider.py:34 -- ClusterState: Loaded cluster state: {'c79.millennium.berkeley.edu': {'tags': {'ray-node-type': 'worker'}, 'state': 'terminated'}, 'c80.millennium.berkeley.edu': {'tags': {'ray-node-type': 'head', 'ray-launch-config': '6c51b8169c9469f0fa2568e5d238af2585d302a7', 'ray-node-name': 'ray-default-head'}, 'state': 'running'}}
2019-04-04 03:39:25,902	INFO node_provider.py:59 -- ClusterState: Writing cluster state: {'c79.millennium.berkeley.edu': {'tags': {'ray-node-type': 'worker'}, 'state': 'terminated'}, 'c80.millennium.berkeley.edu': {'tags': {'ray-node-type': 'head', 'ray-launch-config': '6c51b8169c9469f0fa2568e5d238af2585d302a7', 'ray-node-name': 'ray-default-head'}, 'state': 'running'}}
This will restart cluster services [y/N]: y
2019-04-04 03:39:29,888	INFO commands.py:202 -- get_or_create_head_node: Updating files on head node...
Traceback (most recent call last):
  File "/tmp/env/bin/ray", line 11, in <module>
    sys.exit(main())
  File "/tmp/env/lib/python3.6/site-packages/ray/scripts/scripts.py", line 766, in main
    return cli()
  File "/tmp/env/lib/python3.6/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/tmp/env/lib/python3.6/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/tmp/env/lib/python3.6/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/tmp/env/lib/python3.6/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/tmp/env/lib/python3.6/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/tmp/env/lib/python3.6/site-packages/ray/scripts/scripts.py", line 460, in create_or_update
    no_restart, restart_only, yes, cluster_name)
  File "/tmp/env/lib/python3.6/site-packages/ray/autoscaler/commands.py", line 47, in create_or_update_cluster
    override_cluster_name)
  File "/tmp/env/lib/python3.6/site-packages/ray/autoscaler/commands.py", line 243, in get_or_create_head_node
    initialization_commands=config["initialization_commands"],
KeyError: 'initialization_commands'
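The KeyError itself just means the dictionary parsed from cluster.yaml has no initialization_commands entry when get_or_create_head_node looks it up. Assuming no initialization commands are actually needed, a stop-gap is to declare the key explicitly as an empty list at the top level of the YAML:

# cluster.yaml (top level) -- stop-gap so the config lookup succeeds
initialization_commands: []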

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Reactions: 1
  • Comments: 7 (2 by maintainers)

Top GitHub Comments

2 reactions
EntropicMonkey commented, Jul 14, 2019

…55, in __init__ … TAG_RAY_NODE_TYPE] == "head" AssertionError

This particular assertion appears to be a separate problem, as I discovered the hard way. It seems to occur when the address of head_ip is also included in worker_ips. Removing the head_ip from the worker list eliminated the error for me. I also found it necessary to delete the /tmp/cluster-<name>.state file left over from broken runs, to prevent errors a few lines later when it tries to add the missing head_ip to the worker_ips.
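In config terms, the workaround described above amounts to making sure the head address appears only under head_ip and never under worker_ips (hostnames are illustrative):

provider:
    type: local
    head_ip: c80.millennium.berkeley.edu
    worker_ips:
        - c79.millennium.berkeley.edu
        # c80.millennium.berkeley.edu must NOT also be listed here,
        # or the TAG_RAY_NODE_TYPE == "head" assertion fires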

1 reaction
pschafhalter commented, Apr 8, 2019

I’m getting the same error on the latest master. This line is causing the error.

Looks like the local example is out of date.

Read more comments on GitHub

Top Results From Across the Web

Autoscaler failing on minikube - Kubernetes - Ray
Hello, I get the below exception with autoscaler: 2021-04-22 15:06:23,806 ... /ray/autoscaler/_private/autoscaler.py", line 140, in update
Ray cluster launch with yaml aws AttributeError - Stack Overflow
I am trying to launch the simplest version of an aws docker cluster launch possible for a proof of principle.
Autoscaling clusters | Dataproc Documentation - Google Cloud
An Autoscaling Policy is a reusable configuration that describes how cluster workers using the autoscaling policy should scale. It defines scaling boundaries, ...
Cannot get a Rancher cluster setup
Hello, I'm new to Docker/Rancher/Kubernetes in general. I'm setting up a POC for an internal team and they want to try and use...
Autoscaling in Nomad Cluster – DEVOPS DONE RIGHT - Blog
Since Kubernetes has its own method of autoscaling using the metrics-server, ... But just like we discussed in our previous blog on Running...
