
No way to configure own VPC or subnet IDs

See original GitHub issue

Hi. In the configuration file below, I try to set custom subnets, but they never take effect: the cluster ends up using completely different subnets.

# ~/Desktop/ray_cluster.yaml
cluster_name: default

min_workers: 0
max_workers: 2

docker:
    image: "rayproject/ray:latest-gpu"
    container_name: "ray_container"
    pull_before_run: True
    run_options: []

provider:
    type: aws
    region: eu-west-1
    availability_zone: eu-west-1a,eu-west-1b
    cache_stopped_nodes: False # If not present, the default is True.

auth:
    ssh_user: ubuntu

head_node:
    # VpcId:
    SubnetId: subnet-***********qwe
    # CidrBlock: 10.19.0.0/16
    InstanceType: m5a.large
    ImageId: ami-006ff58f5247c50eb # Deep Learning AMI (Ubuntu) Version 30

    BlockDeviceMappings:
        - DeviceName: /dev/sda1
          Ebs:
              VolumeSize: 100


worker_nodes:
    SubnetId: subnet-***********qwe
    InstanceType: m5a.large
    ImageId: ami-006ff58f5247c50eb # Deep Learning AMI (Ubuntu) Version 30

I launch the cluster with:

ray up -y ~/Desktop/ray_cluster.yaml

It then starts the cluster using completely different subnets:

Cluster: default

Checking AWS environment settings
AWS config
  IAM Profile: ray-autoscaler-v1 [default]
  EC2 Key pair (head & workers): ray-autoscaler_eu-west-1 [default]
  VPC Subnets (head & workers): subnet-******xyz, subnet-*******abc, subnet-*******ghj, subnet-*********jkl [default]
  EC2 Security groups (head & workers): sg-08311cf18ac8ee7fe [default]
  EC2 AMI (head & workers): ami-006ff58f5247c50eb

No head node found. Launching a new cluster. Confirm [y/N]: y [automatic, due to --yes]

Acquiring an up-to-date head node
  Launched 1 nodes [subnet_id=subnet-03522bc8f27f03e72]
    Launched instance i-05b500ebd17f1bfe9 [state=pending, info=pending]
  Launched a new head node
  Fetching the new head node

<1/1> Setting up head node
  Prepared bootstrap config
  New status: waiting-for-ssh
  [1/6] Waiting for SSH to become available
    Running `uptime` as a test.
    Waiting for IP
      Not yet available, retrying in 10 seconds

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 9 (4 by maintainers)

Top GitHub Comments

1 reaction
richardliaw commented, Dec 3, 2020

Nice - can you try giving SubnetIds a list instead of a string?

SubnetIds: [subnet-…]
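
For reference, a minimal sketch of what that change would look like in the cluster YAML. The subnet ID is the masked placeholder from the original report; everything else is unchanged. The suggestion implies the autoscaler looks for the plural SubnetIds key holding a list, so both the singular SubnetId key in the first config and the bare-string SubnetIds in the config quoted below would fall back to the auto-configured defaults:

head_node:
    # List form, per the suggestion above
    SubnetIds: [subnet-***********qwe]
    InstanceType: m5a.large
    ImageId: ami-006ff58f5247c50eb # Deep Learning AMI (Ubuntu) Version 30

worker_nodes:
    SubnetIds: [subnet-***********qwe]
    InstanceType: m5a.large
    ImageId: ami-006ff58f5247c50eb # Deep Learning AMI (Ubuntu) Version 30

If several subnets are listed, they presumably need to sit in the availability zones named under provider.availability_zone, so that the round-robin placement has a valid subnet in each zone.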

On Thu, Dec 3, 2020 at 8:34 AM Laksh1997 notifications@github.com wrote:

# A unique identifier for the head node and workers of this cluster.
cluster_name: default

# The minimum number of worker nodes to launch in addition to the head
# node. This number should be >= 0.
min_workers: 0

# The maximum number of worker nodes to launch in addition to the head
# node. This takes precedence over min_workers.
max_workers: 10000

upscaling_speed: 0.5

# This executes all commands on all nodes in the docker container,
# and opens all the necessary ports to support the Ray cluster.
# Empty string means disabled.
docker:
    image: "rayproject/ray:latest-gpu" # You can change this to latest-cpu if you don't need GPU support and want a faster startup
    container_name: "ray_container"
    # If true, pulls latest version of image. Otherwise, `docker run` will only pull the image
    # if no cached version is present.
    pull_before_run: True
    run_options: [] # Extra options to pass into "docker run"

    # Example of running a GPU head with CPU workers
    # head_image: "rayproject/ray:latest-gpu"
    # Allow Ray to automatically detect GPUs
    # worker_image: "rayproject/ray:latest-cpu"
    # worker_run_options: []

# If a node is idle for this many minutes, it will be removed.
idle_timeout_minutes: 5

# Cloud-provider specific configuration.
provider:
    type: aws
    region: eu-west-1
    # Availability zone(s), comma-separated, that nodes may be launched in.
    # Nodes are currently spread between zones by a round-robin approach,
    # however this implementation detail should not be relied upon.
    availability_zone: eu-west-1a,eu-west-1b
    # Whether to allow node reuse. If set to False, nodes will be terminated
    # instead of stopped.
    cache_stopped_nodes: False # If not present, the default is True.

# How Ray will authenticate with newly launched nodes.
auth:
    ssh_user: ubuntu
# By default Ray creates a new private keypair, but you can also use your own.
# If you do so, make sure to also set "KeyName" in the head and worker node
# configurations below.
#    ssh_private_key: /path/to/your/key.pem

# Provider-specific config for the head node, e.g. instance type. By default
# Ray will auto-configure unspecified fields such as SubnetId and KeyName.
# For more documentation on available fields, see:
# http://boto3.readthedocs.io/en/latest/reference/services/ec2.html#EC2.ServiceResource.create_instances
head_node:
    # VpcId:
    # asdfasdf: hello
    SubnetIds: subnet-**********
    # CidrBlock: 10.19.0.0/16
    InstanceType: m5a.large
    ImageId: ami-006ff58f5247c50eb # Deep Learning AMI (Ubuntu) Version 30

    # You can provision additional disk space with a conf as follows
    BlockDeviceMappings:
        - DeviceName: /dev/sda1
          Ebs:
              VolumeSize: 100

    # Additional options in the boto docs.

# Provider-specific config for worker nodes, e.g. instance type. By default
# Ray will auto-configure unspecified fields such as SubnetId and KeyName.
# For more documentation on available fields, see:
# http://boto3.readthedocs.io/en/latest/reference/services/ec2.html#EC2.ServiceResource.create_instances
worker_nodes:
    SubnetIds: subnet-**********
    InstanceType: m5a.large
    ImageId: ami-006ff58f5247c50eb # Deep Learning AMI (Ubuntu) Version 30

    # Run workers on spot by default. Comment this out to use on-demand.
    InstanceMarketOptions:
        MarketType: spot
        # Additional options can be found in the boto docs, e.g.
        #   SpotOptions:
        #       MaxPrice: MAX_HOURLY_PRICE

    # Additional options in the boto docs.

# Files or directories to copy to the head and worker nodes. The format is a
# dictionary from REMOTE_PATH: LOCAL_PATH, e.g.
file_mounts: {
#    "/path1/on/remote/machine": "/path1/on/local/machine",
#    "/path2/on/remote/machine": "/path2/on/local/machine",
}

# Files or directories to copy from the head node to the worker nodes. The format is a
# list of paths. The same path on the head node will be copied to the worker node.
# This behavior is a subset of the file_mounts behavior. In the vast majority of cases
# you should just use file_mounts. Only use this if you know what you're doing!
cluster_synced_files: []

# Whether changes to directories in file_mounts or cluster_synced_files in the head node
# should sync to the worker node continuously
file_mounts_sync_continuously: False

# List of commands that will be run before setup_commands. If docker is
# enabled, these commands will run outside the container and before docker
# is setup.
initialization_commands: []

# List of shell commands to run to set up nodes.
setup_commands:
    - apt-get update -y
    - apt-get install -y libcairo2-dev
    # - rm -f /tmp/sshkey
    # - rm -f /tmp/sshkey.pub
    # - ssh-keygen -b 2048 -t rsa -f /tmp/sshkey -q -N ""
    # - eval ssh-agent
    # - pip install jupyterlab
    # - ssh-add /tmp/sshkey
    # - ssh -T git@bitbucket.org
    - pip install torch jupyterlab awscli scikit-learn s3fs xgboost
    - pip uninstall -y openeye-toolkits

    # TQDM
    # - conda install -y -c conda-forge ipywidgets nodejs
    # - jupyter labextension install @jupyter-widgets/jupyterlab-manager
    - pip uninstall -y ipython prompt_toolkit dataclasses
    - pip install -U ipython prompt_toolkit

    # Install openeye
    - aws s3 cp s3://exs-dev-acc-provisioning/oe_license.txt /home/oe_license.txt
    - pip install --force-reinstall --no-cache-dir -U --extra-index-url https://pypi.anaconda.org/OpenEye/simple openeye-toolkits

    # Note: if you're developing Ray, you probably want to create a Docker image that
    # has your Ray repo pre-cloned. Then, you can replace the pip installs
    # below with a git checkout <your_sha> (and possibly a recompile).
    # Uncomment the following line if you want to run the nightly version of ray (as opposed to the latest)
    # - pip install -U https://s3-us-west-2.amazonaws.com/ray-wheels/latest/ray-1.1.0.dev0-cp37-cp37m-manylinux2014_x86_64.whl

# Custom commands that will be run on the head node after common setup.
head_setup_commands: []

# Custom commands that will be run on worker nodes after common setup.
worker_setup_commands: []

# Command to start ray on the head node. You don't need to change this.
head_start_ray_commands:
    - ray stop
    - ulimit -n 65536; ray start --head --port=6379 --object-manager-port=8076 --autoscaling-config=~/ray_bootstrap_config.yaml

# Command to start ray on worker nodes. You don't need to change this.
worker_start_ray_commands:
    - ray stop
    - ulimit -n 65536; ray start --address=$RAY_HEAD_IP:6379 --object-manager-port=8076
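
Note that in the config quoted above, SubnetIds is still given a bare string (SubnetIds: subnet-**********). A hedged sketch of the same fragment after applying the list-form suggestion, under the assumption (not stated in the thread) that one subnet is supplied per availability zone and that both subnets belong to the same VPC; the IDs below are placeholders:

provider:
    type: aws
    region: eu-west-1
    availability_zone: eu-west-1a,eu-west-1b

head_node:
    # Placeholder IDs: one subnet in eu-west-1a, one in eu-west-1b
    SubnetIds: [subnet-aaaaaaaa, subnet-bbbbbbbb]

worker_nodes:
    SubnetIds: [subnet-aaaaaaaa, subnet-bbbbbbbb]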


0 reactions
moshewe commented, Jan 8, 2021

Would you like me to make a PR to change the configuration YAML template?

Read more comments on GitHub >

Top Results From Across the Web

Subnets for your VPC - Amazon Virtual Private Cloud
A subnet is a range of IP addresses in your VPC. You can launch AWS resources, such as EC2 instances, into a specific...

VPC Configuration - eksctl
You can use an existing VPC by supplying private and/or public subnets using the --vpc-private-subnets and --vpc-public-subnets flags. It is up to you... (a minimal flag example follows this list)

Creating our VPC Module - DevOps with Terraform - CloudCasts
In our case, we'll build a VPC with 2 subnets per availability zone. One subnet will be a "private" subnet, and the second...

Create public and private subnets in AWS VPC to ... - YouTube
Deploying containers into a VPC becomes more secure by creating them in a private subnet. This means they can't automatically be accessed...

Terraform | Create a VPC, subnets and more... | by Ali Atakan
2. Create "vars.tf". All variables will be in this file. · 3. Create "provider.tf". All infrastructure will be on the AWS. · 6....
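
As an aside to the eksctl result above: a minimal invocation using the two flags named in that snippet might look like the following. The cluster name and subnet IDs are placeholders, and this applies to EKS clusters created by eksctl, not to the Ray autoscaler config discussed in this issue:

# Placeholder name and subnet IDs; both subnet flags take comma-separated
# IDs of subnets that already exist in the target VPC.
eksctl create cluster --name my-cluster \
    --vpc-private-subnets subnet-aaaaaaaa,subnet-bbbbbbbb \
    --vpc-public-subnets subnet-cccccccc,subnet-dddddddd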
