[documentation] Document deployment on existing AWS EKS cluster
Related to #935.
To test and document how to deploy to an existing (“local”) EKS cluster, I ran through the following steps:
Use (or create) a base EKS cluster
To get a functioning EKS cluster up and running quickly, I created a cluster and web app based on this tutorial. The cluster runs in its own VPC with 3 subnets (each in its own AZ), and there are no node groups. A scenario like this seemed like a good place to start from the perspective of an incoming user.
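For reference, a bare cluster like this can be stood up with eksctl. The commands below are only a minimal sketch rather than the tutorial's exact steps; the availability zones are placeholders, while the cluster name and region match the kube_context used later in this issue:

```sh
# Minimal sketch: create an EKS cluster in its own VPC with no node groups.
# The AZs below are placeholders; adjust to your account/region.
eksctl create cluster \
  --name eaeeks \
  --region us-east-2 \
  --zones us-east-2a,us-east-2b,us-east-2c \
  --without-nodegroup
```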
Once this EKS cluster is up, there are still a handful of steps that seem to be required before QHub can be deployed to it:
- Ensure that the subnets are allowed to “automatically assign public IP addresses to instances launched into it”, otherwise node groups can’t be launched
- Create `general`, `user` and `worker` node groups
- Attach a Node IAM Role with specific permissions (copied from the existing role of a previous QHub deployment)
- Configure each node group, being mindful of instance size, attached block storage size and auto-scaling features (a rough sketch of these steps follows below)
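As a rough sketch (not the exact commands used here), these preparation steps can be scripted with the AWS CLI and eksctl; the subnet IDs, instance type, volume size, and role name below are placeholders:

```sh
# 1. Let the subnets auto-assign public IPs to launched instances
#    (placeholder subnet IDs).
for subnet in subnet-aaa subnet-bbb subnet-ccc; do
  aws ec2 modify-subnet-attribute --subnet-id "$subnet" --map-public-ip-on-launch
done

# 2. Create the general, user and worker node groups QHub expects
#    (instance type, sizes and scaling limits are examples only).
for ng in general user worker; do
  eksctl create nodegroup \
    --cluster eaeeks \
    --region us-east-2 \
    --name "$ng" \
    --node-type t3.xlarge \
    --nodes 1 --nodes-min 1 --nodes-max 5 \
    --node-volume-size 100 \
    --managed
done

# 3. Attach any extra managed policies the node role needs (role name is a
#    placeholder; the required policies depend on the previous QHub deployment).
aws iam attach-role-policy \
  --role-name <node-instance-role> \
  --policy-arn arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly
```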
I’m sure there are scenarios where node groups already exist and can be repurposed, but more broadly it would be nice to make this process a lot more streamlined. Did I overcomplicate this, or are there other ways of handling the QHub deployment without having to add these node groups explicitly?
Deploy QHub to Existing EKS Cluster
Ensure that you are using the existing cluster’s `kubectl` context (a sketch of how to set it follows below).
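If the kubeconfig entry for the cluster doesn’t exist yet, it can be added and verified roughly as follows (cluster name and region taken from the kube_context in the config below):

```sh
# Add/refresh the kubeconfig entry for the existing cluster and confirm it is
# the active context before running qhub.
aws eks update-kubeconfig --name eaeeks --region us-east-2
kubectl config current-context   # expect the arn:aws:eks:...:cluster/eaeeks context
```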
Initialize in the usual manner:
python -m qhub init aws --project eaeexisting --domain eaeexisting.qhub.dev --ci-provider github-actions --auth-provider github --auth-auto-provision --repository github.com/iameskild/eaeaws
Then update the `qhub-config.yaml` file. The important keys to update are:
- Replace `provider: aws` with `provider: local`
- Replace the `amazon_web_services` section with `local`
- Update `node_selectors` and `kube_context` appropriately (see the `local` section in the full `qhub-config.yaml` below)
Once updated, deploy in the usual manner:
python -m qhub deploy --config qhub-config.yaml --disable-prompt --dns-provider cloudflare --dns-auto-provision
The deployment completes successfully and all the pods appear to be running (alongside the existing pods from the web app). The issue is that I can’t access the cluster from the browser:
404 page not found
Examining the deployment output more closely, you can see that the ingress doesn’t have an IP address:
[terraform]: ingress_jupyter = {
[terraform]: "hostname" = "aea1abf087211438cbf9e44ef5fb64c3-197330438.us-east-2.elb.amazonaws.com"
[terraform]: "ip" = ""
[terraform]: }
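Since the ingress only reports an ELB hostname and no IP, one way to dig further is to inspect the services and ingress objects QHub created; a rough check, assuming the dev namespace from the config below:

```sh
# Check the pods and the load-balancer service/ingress in the QHub namespace.
kubectl get pods -n dev
kubectl get svc -n dev
kubectl get ingress -n dev
# On AWS, a LoadBalancer service usually exposes an ELB hostname rather than an
# IP, so DNS records generally need to point at that hostname (e.g. via CNAME).
```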
The full qhub-config.yaml used for this deployment:
project_name: eaeexisting
provider: local
domain: eaeexisting.qhub.dev
certificate:
  type: self-signed
security:
  authentication:
    type: GitHub
    config:
      client_id:
      client_secret:
      oauth_callback_url: https://eaeexisting.qhub.dev/hub/oauth_callback
  users:
    iameskild:
      uid: 1000
      primary_group: admin
      secondary_groups:
      - users
  groups:
    users:
      gid: 100
    admin:
      gid: 101
default_images:
  jupyterhub: quansight/qhub-jupyterhub:v0.3.13
  jupyterlab: quansight/qhub-jupyterlab:v0.3.13
  dask_worker: quansight/qhub-dask-worker:v0.3.13
  dask_gateway: quansight/qhub-dask-gateway:v0.3.13
  conda_store: quansight/qhub-conda-store:v0.3.13
storage:
  conda_store: 60Gi
  shared_filesystem: 100Gi
theme:
  jupyterhub:
    hub_title: QHub - eaeexisting
    hub_subtitle: Autoscaling Compute Environment on Amazon Web Services
    welcome: Welcome to eaeexisting.qhub.dev. It is maintained by <a href="http://quansight.com">Quansight
      staff</a>. The hub's configuration is stored in a github repository based on
      <a href="https://github.com/Quansight/qhub/">https://github.com/Quansight/qhub/</a>.
      To provide feedback and report any technical problems, please use the <a href="https://github.com/Quansight/qhub/issues">github
      issue tracker</a>.
    logo: /hub/custom/images/jupyter_qhub_logo.svg
    primary_color: '#4f4173'
    secondary_color: '#957da6'
    accent_color: '#32C574'
    text_color: '#111111'
    h1_color: '#652e8e'
    h2_color: '#652e8e'
monitoring:
  enabled: true
cdsdashboards:
  enabled: true
  cds_hide_user_named_servers: true
  cds_hide_user_dashboard_servers: false
ci_cd:
  type: github-actions
  branch: main
terraform_state:
  type: remote
namespace: dev
local:
  kube_context: arn:aws:eks:us-east-2:892486800165:cluster/eaeeks
  node_selectors:
    general:
      key: eks.amazonaws.com/nodegroup
      value: general
    user:
      key: eks.amazonaws.com/nodegroup
      value: user
    worker:
      key: eks.amazonaws.com/nodegroup
      value: worker
profiles:
  jupyterlab:
  - display_name: Small Instance
    description: Stable environment with 1 cpu / 4 GB ram
    default: true
    kubespawner_override:
      cpu_limit: 1
      cpu_guarantee: 0.75
      mem_limit: 4G
      mem_guarantee: 2.5G
      image: quansight/qhub-jupyterlab:v0.3.13
  - display_name: Medium Instance
    description: Stable environment with 2 cpu / 8 GB ram
    kubespawner_override:
      cpu_limit: 2
      cpu_guarantee: 1.5
      mem_limit: 8G
      mem_guarantee: 5G
      image: quansight/qhub-jupyterlab:v0.3.13
  dask_worker:
    Small Worker:
      worker_cores_limit: 1
      worker_cores: 0.75
      worker_memory_limit: 4G
      worker_memory: 2.5G
      worker_threads: 1
      image: quansight/qhub-dask-worker:v0.3.13
    Medium Worker:
      worker_cores_limit: 2
      worker_cores: 1.5
      worker_memory_limit: 8G
      worker_memory: 5G
      worker_threads: 2
      image: quansight/qhub-dask-worker:v0.3.13
environments:
  environment-dask.yaml:
    name: dask
    channels:
    - conda-forge
    dependencies:
    - python
    - ipykernel
    - ipywidgets
    - qhub-dask ==0.3.13
    - python-graphviz
    - numpy
    - numba
    - pandas
  environment-dashboard.yaml:
    name: dashboard
    channels:
    - conda-forge
    dependencies:
    - python==3.9.7
    - ipykernel==6.4.1
    - ipywidgets==7.6.5
    - qhub-dask==0.3.13
    - param==1.11.1
    - python-graphviz==0.17
    - matplotlib==3.4.3
    - panel==0.12.4
    - voila==0.2.16
    - streamlit==1.0.0
    - dash==2.0.0
    - cdsdashboards-singleuser==0.5.7
You can work around that by providing the DNS records manually and the certificate’s secrets in the namespace, right? (I am not sure.)
Now that I think of it, this is most likely caused by the fact that this existing web app already has an `EXTERNAL-IP` set. I will attempt this again with an existing cluster that doesn’t already have a public-facing IP/ingress.
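One rough way to check whether the pre-existing web app already holds a public load balancer or ingress on the cluster:

```sh
# List all LoadBalancer services and ingresses across namespaces; an existing
# web app with its own ELB/ingress will show up with a hostname here.
kubectl get svc --all-namespaces | grep LoadBalancer
kubectl get ingress --all-namespaces
```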