[documentation] Document deployment on existing AWS EKS cluster
Related to #935.
To test and document how to deploy to an existing (“local”) EKS cluster, I ran through the following steps:
Use (or create) a base EKS cluster
To get a functioning EKS cluster up and running quickly, I created a cluster and web app based on this tutorial. The cluster runs in its own VPC with 3 subnets (each in its own AZ), and there are no node groups. A scenario like this seemed like a good place to start from the perspective of an incoming user.
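For reference, a bare cluster like this can be stood up with eksctl. The commands below are only a minimal sketch rather than the tutorial's exact steps; the availability zones are placeholders, while the cluster name and region match the kube_context used later in this issue:

```sh
# Minimal sketch: create an EKS cluster in its own VPC with no node groups.
# The AZs below are placeholders; adjust to your account/region.
eksctl create cluster \
  --name eaeeks \
  --region us-east-2 \
  --zones us-east-2a,us-east-2b,us-east-2c \
  --without-nodegroup
```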
Once this EKS cluster is up, there are still a handful of steps that seem to be required before QHub can be deployed to it:
- Ensure that the subnets are allowed to “automatically assign public IP addresses to instances launched into it”, otherwise node groups can’t be launched
- Create `general`, `user` and `worker` node groups
- Attach a Node IAM Role with specific permissions (copied from the existing role of a previous QHub deployment)
- Configure each node group, being mindful of instance size, attached block storage size and auto-scaling features (a rough sketch of these steps follows below)
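As a rough sketch (not the exact commands used here), these preparation steps can be scripted with the AWS CLI and eksctl; the subnet IDs, instance type, volume size, and role name below are placeholders:

```sh
# 1. Let the subnets auto-assign public IPs to launched instances
#    (placeholder subnet IDs).
for subnet in subnet-aaa subnet-bbb subnet-ccc; do
  aws ec2 modify-subnet-attribute --subnet-id "$subnet" --map-public-ip-on-launch
done

# 2. Create the general, user and worker node groups QHub expects
#    (instance type, sizes and scaling limits are examples only).
for ng in general user worker; do
  eksctl create nodegroup \
    --cluster eaeeks \
    --region us-east-2 \
    --name "$ng" \
    --node-type t3.xlarge \
    --nodes 1 --nodes-min 1 --nodes-max 5 \
    --node-volume-size 100 \
    --managed
done

# 3. Attach any extra managed policies the node role needs (role name is a
#    placeholder; the required policies depend on the previous QHub deployment).
aws iam attach-role-policy \
  --role-name <node-instance-role> \
  --policy-arn arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly
```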
I’m sure there are scenarios where node groups already exist and can be repurposed, but more broadly it would be nice to make this process a lot more streamlined. Did I overcomplicate this, or are there other ways of handling the QHub deployment without having to add these node groups explicitly?
Deploy QHub to Existing EKS Cluster
Ensure that you are using the existing cluster’s `kubectl` context (a sketch of how to set it follows below).
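If the kubeconfig entry for the cluster doesn’t exist yet, it can be added and verified roughly as follows (cluster name and region taken from the kube_context in the config below):

```sh
# Add/refresh the kubeconfig entry for the existing cluster and confirm it is
# the active context before running qhub.
aws eks update-kubeconfig --name eaeeks --region us-east-2
kubectl config current-context   # expect the arn:aws:eks:...:cluster/eaeeks context
```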
Initialize in the usual manner:
python -m qhub init aws --project eaeexisting --domain eaeexisting.qhub.dev --ci-provider github-actions --auth-provider github --auth-auto-provision --repository github.com/iameskild/eaeaws
Then update the `qhub-config.yaml` file. The important keys to update are:
- Replace `provider: aws` with `provider: local`
- Replace the `amazon_web_services` section with `local`
- Update `node_selectors` and `kube_context` appropriately (see the `local` section in the full `qhub-config.yaml` below)
Once updated, deploy in the usual manner:
python -m qhub deploy --config qhub-config.yaml --disable-prompt --dns-provider cloudflare --dns-auto-provision
The deployment completes successfully and all the pods appear to be running (alongside the existing pods from the web app). The issue is that I can’t access the cluster from the browser:
404 page not found
Examining the deployment output more closely, you can see that the ingress doesn’t have an IP address:
[terraform]: ingress_jupyter = {
[terraform]: "hostname" = "aea1abf087211438cbf9e44ef5fb64c3-197330438.us-east-2.elb.amazonaws.com"
[terraform]: "ip" = ""
[terraform]: }
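Since the ingress only reports an ELB hostname and no IP, one way to dig further is to inspect the services and ingress objects QHub created; a rough check, assuming the dev namespace from the config below:

```sh
# Check the pods and the load-balancer service/ingress in the QHub namespace.
kubectl get pods -n dev
kubectl get svc -n dev
kubectl get ingress -n dev
# On AWS, a LoadBalancer service usually exposes an ELB hostname rather than an
# IP, so DNS records generally need to point at that hostname (e.g. via CNAME).
```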
The full qhub-config.yaml used for this deployment:
project_name: eaeexisting
provider: local
domain: eaeexisting.qhub.dev
certificate:
  type: self-signed
security:
  authentication:
    type: GitHub
    config:
      client_id:
      client_secret:
      oauth_callback_url: https://eaeexisting.qhub.dev/hub/oauth_callback
  users:
    iameskild:
      uid: 1000
      primary_group: admin
      secondary_groups:
      - users
  groups:
    users:
      gid: 100
    admin:
      gid: 101
default_images:
  jupyterhub: quansight/qhub-jupyterhub:v0.3.13
  jupyterlab: quansight/qhub-jupyterlab:v0.3.13
  dask_worker: quansight/qhub-dask-worker:v0.3.13
  dask_gateway: quansight/qhub-dask-gateway:v0.3.13
  conda_store: quansight/qhub-conda-store:v0.3.13
storage:
  conda_store: 60Gi
  shared_filesystem: 100Gi
theme:
  jupyterhub:
    hub_title: QHub - eaeexisting
    hub_subtitle: Autoscaling Compute Environment on Amazon Web Services
    welcome: Welcome to eaeexisting.qhub.dev. It is maintained by <a href="http://quansight.com">Quansight
      staff</a>. The hub's configuration is stored in a github repository based on
      <a href="https://github.com/Quansight/qhub/">https://github.com/Quansight/qhub/</a>.
      To provide feedback and report any technical problems, please use the <a href="https://github.com/Quansight/qhub/issues">github
      issue tracker</a>.
    logo: /hub/custom/images/jupyter_qhub_logo.svg
    primary_color: '#4f4173'
    secondary_color: '#957da6'
    accent_color: '#32C574'
    text_color: '#111111'
    h1_color: '#652e8e'
    h2_color: '#652e8e'
monitoring:
  enabled: true
cdsdashboards:
  enabled: true
  cds_hide_user_named_servers: true
  cds_hide_user_dashboard_servers: false
ci_cd:
  type: github-actions
  branch: main
terraform_state:
  type: remote
namespace: dev
local:
  kube_context: arn:aws:eks:us-east-2:892486800165:cluster/eaeeks
  node_selectors:
    general:
      key: eks.amazonaws.com/nodegroup
      value: general
    user:
      key: eks.amazonaws.com/nodegroup
      value: user
    worker:
      key: eks.amazonaws.com/nodegroup
      value: worker
profiles:
  jupyterlab:
  - display_name: Small Instance
    description: Stable environment with 1 cpu / 4 GB ram
    default: true
    kubespawner_override:
      cpu_limit: 1
      cpu_guarantee: 0.75
      mem_limit: 4G
      mem_guarantee: 2.5G
      image: quansight/qhub-jupyterlab:v0.3.13
  - display_name: Medium Instance
    description: Stable environment with 2 cpu / 8 GB ram
    kubespawner_override:
      cpu_limit: 2
      cpu_guarantee: 1.5
      mem_limit: 8G
      mem_guarantee: 5G
      image: quansight/qhub-jupyterlab:v0.3.13
  dask_worker:
    Small Worker:
      worker_cores_limit: 1
      worker_cores: 0.75
      worker_memory_limit: 4G
      worker_memory: 2.5G
      worker_threads: 1
      image: quansight/qhub-dask-worker:v0.3.13
    Medium Worker:
      worker_cores_limit: 2
      worker_cores: 1.5
      worker_memory_limit: 8G
      worker_memory: 5G
      worker_threads: 2
      image: quansight/qhub-dask-worker:v0.3.13
environments:
  environment-dask.yaml:
    name: dask
    channels:
    - conda-forge
    dependencies:
    - python
    - ipykernel
    - ipywidgets
    - qhub-dask ==0.3.13
    - python-graphviz
    - numpy
    - numba
    - pandas
  environment-dashboard.yaml:
    name: dashboard
    channels:
    - conda-forge
    dependencies:
    - python==3.9.7
    - ipykernel==6.4.1
    - ipywidgets==7.6.5
    - qhub-dask==0.3.13
    - param==1.11.1
    - python-graphviz==0.17
    - matplotlib==3.4.3
    - panel==0.12.4
    - voila==0.2.16
    - streamlit==1.0.0
    - dash==2.0.0
    - cdsdashboards-singleuser==0.5.7
You can work around that by providing the DNS records manually and the certificate’s secrets in the namespace, right? (I am not sure.)
Now that I think of it, this is most likely caused by the fact that this existing web app already has an `EXTERNAL-IP` set. I will attempt this again with an existing cluster that doesn’t already have a public-facing IP/ingress.
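One rough way to check whether the pre-existing web app already holds a public load balancer or ingress on the cluster:

```sh
# List all LoadBalancer services and ingresses across namespaces; an existing
# web app with its own ELB/ingress will show up with a hostname here.
kubectl get svc --all-namespaces | grep LoadBalancer
kubectl get ingress --all-namespaces
```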