
Some high-level questions around usage of kubespawner

See original GitHub issue

@yuvipanda Thanks for all your work on kubespawner. I’ve started experimenting with running jupyterhub on kubernetes, largely thanks to this spawner, but I wanted to get some guidance around my use-cases / workflow from someone a bit more seasoned in this technology. I’m structuring these as a series of high-level questions, where your input would be much appreciated. For ease of explanation, I may refer to the rough sketch below.

[image: rough sketch of the proposed set-up]

My efforts so far, for context: I was working through the data-8/jupyterhub-k8s implementation, which I think bases itself off your work; its structure as a chart (for helm) makes it the easiest to work with, compared to some of the other implementations I’ve found out there.

I modified that set-up slightly to handle gitlab authentication (rather than google), which worked OK, but I wasn’t able to get the spawning of their large user image (>5GB), based on this Dockerfile and their hub image, to work. It was constantly stuck in a Waiting: ContainerCreating state and would then try to re-spawn itself. I haven’t figured out what the problem is, but there appears to be plenty of space on the cluster. I’m using v1.5.1 of kubernetes on GCE.

Anyway, I ended up getting things working by instead using the hub image (Dockerfile below), a variation of the data-8 one, in conjunction with your yuvipanda/simple-singleuser:v1 user image.

FROM jupyterhub/jupyterhub-onbuild:0.7.1
# Install kubespawner and its dependencies
RUN /opt/conda/bin/pip install \
    oauthenticator==0.5.* \
    git+https://github.com/derrickmar/kubespawner \
    git+https://github.com/yuvipanda/jupyterhub-nginx-chp.git
ADD jupyterhub_config.py /srv/jupyterhub_config.py
ADD userlist /srv/userlist
WORKDIR /srv/jupyterhub
EXPOSE 8081
CMD jupyterhub --config /srv/jupyterhub_config.py --no-ssl

This was able to spawn new user persistent volumes, bind them to PVCs and obviously spawn user jupyter notebook servers, which could be stopped/started and re-use the same PV. My initial tests as to whether new files/notebooks were getting persisted on the PV were failing, since I wasn’t saving them under /home, which is where the binding to the volume is happening.
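To make that persistence behaviour concrete, here is a hedged sketch of what the volume portion of the hub’s jupyterhub_config.py could look like. The trait names (`user_storage_pvc_ensure`, `pvc_name_template`, etc.) reflect kubespawner’s configuration at the time and the mount path is an example; treat the whole snippet as indicative rather than the actual data-8 config.

```python
# Hypothetical excerpt from jupyterhub_config.py: create one PVC per user
# and mount it at the home directory -- only files saved under this
# mountPath land on the persistent volume and survive server restarts.
c.KubeSpawner.user_storage_pvc_ensure = True             # create PVC if missing
c.KubeSpawner.pvc_name_template = 'claim-{username}-{userid}'
c.KubeSpawner.user_storage_capacity = '10Gi'             # example size
c.KubeSpawner.volumes = [{
    'name': 'volume-{username}-{userid}',
    'persistentVolumeClaim': {'claimName': 'claim-{username}-{userid}'}
}]
c.KubeSpawner.volume_mounts = [{
    'name': 'volume-{username}-{userid}',
    'mountPath': '/home/jovyan'   # save notebooks here, or they are lost
}]
```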

i. user management / userid - After various aborted attempts to get the larger data-8 user image working (where the user PVs weren’t deleted), I noticed that the userid appended to the username for naming the PV incremented upwards, but it wasn’t clear where this numbering logic was coming from, as it wasn’t an env variable in any of the manifests. Is this some fail-safe of some sort?

Currently, I’m using a whitelist userlist for users (see code from jupyterhub_config.py below), and these correspond to my users’ gitlab logins that I’m authenticating against. However, it’s probably not a clean solution. I see you are working on another approach with fsgroup and just wanted to get a better understanding of the context of that solution.

# Whitelist users and admins
import os

c.Authenticator.whitelist = whitelist = set()
c.Authenticator.admin_users = admin = set()
c.JupyterHub.admin_access = True
pwd = os.path.dirname(__file__)
with open(os.path.join(pwd, 'userlist')) as f:
    for line in f:
        parts = line.split()
        if not parts:  # skip blank lines
            continue
        name = parts[0]
        whitelist.add(name)
        if len(parts) > 1 and parts[1] == 'admin':
            admin.add(name)
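For reference, the userlist file this loop consumes is just one username per line, with an optional `admin` flag after it. The parsing logic can be exercised on its own; the usernames below are made up.

```python
# Hypothetical 'userlist' contents: "<username> [admin]" per line.
SAMPLE_USERLIST = """\
alice admin
bob

carol
"""

def parse_userlist(text):
    """Return (whitelist, admins) sets from userlist-style text,
    mirroring the loop in jupyterhub_config.py above."""
    whitelist, admins = set(), set()
    for line in text.splitlines():
        parts = line.split()
        if not parts:          # skip blank lines
            continue
        name = parts[0]
        whitelist.add(name)
        if len(parts) > 1 and parts[1] == 'admin':
            admins.add(name)
    return whitelist, admins

users, admins = parse_userlist(SAMPLE_USERLIST)
```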

ii. possibility for interchangeable images - I find the current default set-up with Jupyterhub allowing for spawning a single image very limiting. I can see from #14 that you are considering extending functionality in the kubespawner to allow for an image to be selected. @minrk was able to confirm over here that it could be possible to pass this image selection programmatically via the jupyterhub API, although I’m not sure, as per this issue, as to whether the hub API will work in a kubernetes context.

You pointed to an implementation by Google here. It’s not clear to me where they are deriving their list of available images. How do you think something like this should work?

As per the sketch up top, I’m looking to handle a set-up where users have various private/shared repos (marked 1 above in sketch), from which docker images are generated and stored in a registry (2 above). Then my users (3 above) would be able to spawn a compute environment for their chosen repo and have it spawned in kubernetes (4 above), with the possibility, from 5 above, to have the repo cloned (maybe leveraging gitRepo) and for any incremental work performed on it, while on the notebook server, persisted (6).

iii. multiple simultaneous servers per user based on different images - As far as I understand, it’s not presently possible with jupyterhub to allow a user to have multiple instances of a notebook server, each running a different image? Do the tools exist within kubernetes to potentially facilitate this? Thinking out loud, could this be facilitated by having multiple smaller persistent volumes for a user, based on the repo from which the server image is derived? Or maybe this could be achieved within a single PV, by using the subPath functionality?

c.KubeSpawner.volumes = [
    {
        'name': 'volume-{username}-{repo-namespace}-{repo-name}',
        'persistentVolumeClaim': {
            'claimName': 'claim-{username}-{repo-namespace}-{repo-name}'
        }
    }
]
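The subPath variant of that idea might look roughly like the sketch below: one PV per user, with each repo-specific server mounting its own subdirectory. `{repo}` is a hypothetical template variable (kubespawner does not provide it out of the box; a custom spawner subclass would have to fill it in), so this is a design sketch, not working config.

```python
# Sketch (untested): share one per-user PV across repo-specific servers
# by carving it up with subPath instead of one PVC per repo.
c.KubeSpawner.volumes = [{
    'name': 'volume-{username}',
    'persistentVolumeClaim': {'claimName': 'claim-{username}'}
}]
c.KubeSpawner.volume_mounts = [{
    'name': 'volume-{username}',
    'mountPath': '/home/jovyan',
    'subPath': 'repos/{repo}'   # hypothetical: one directory per repo on the PV
}]
```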

iv. ideas around version-control - Given the various advantages derived from using kubernetes to host jupyter, I would be curious if you had some thoughts around whether kubernetes also potentially makes it easier to manage version control for notebooks and other files created while a user works in a notebook server environment. Perhaps something like preStop hooks could be used to commit and push changes prior to a container shutting down.

Even facilitating a user to be able to run git commands from a notebook server terminal … and have SSH keys back to the version-control system handled via the kubernetes secrets/config maps might be a start. Have you seen any implementations solving this?
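The preStop idea could be sketched roughly as below. Whether kubespawner exposes pod lifecycle hooks directly (the `lifecycle_hooks` trait here is an assumption), and whether credentials are in place for the push, would both need verifying; this is purely illustrative.

```python
# Hypothetical sketch: a preStop hook that tries to commit and push any
# work before the container is terminated. 'lifecycle_hooks' as a
# KubeSpawner trait is an assumption; the same dict could otherwise be
# patched onto the pod spec directly.
c.KubeSpawner.lifecycle_hooks = {
    'preStop': {
        'exec': {
            'command': ['/bin/sh', '-c',
                        'cd /home/jovyan && git add -A && '
                        'git commit -m "autosave on shutdown" && '
                        'git push || true']   # never block pod termination
        }
    }
}
```

Note that preStop hooks run on a deadline (the pod’s termination grace period), so a slow push to a remote could still be cut off.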

Thanks for your patience in reading through this!

Issue Analytics

  • State: closed
  • Created: 7 years ago
  • Reactions: 9
  • Comments: 6 (3 by maintainers)

Top GitHub Comments

2 reactions
yuvipanda commented, Jan 14, 2017

On Thu, Jan 5, 2017 at 4:26 PM, Analect notifications@github.com wrote:

> [quoted: the original question’s introduction and the sketch image]

This is an awesome sketch! May I ask how you created it?


> [quoted: “My efforts so far …” through the hub Dockerfile and the note on persisting files under /home]

Awesome! In the last week or so, I’ve spent a lot of time generalizing the helm configuration a lot more, and it should be more widely usable (with multiple authenticators support) soon. We’re deploying it for UC Berkeley’s class starting Monday, so will have more time to actually write documentation after that. I intend to get it included in github.com/kubernetes/charts eventually, to make it an officially supported way of installing JupyterHub.

> [quoted: question i — user management / userid, and the userlist whitelist snippet]

There are multiple types of users / userids, which is confusing!

  1. The JupyterHub user id - this is simply the id of the entry for the user in the sqlite table. This is pretty useless for everything other than as unique identifiers. This is used in the pod name to make sure no two users’ pods have the same name - since we ‘normalize’ the username to a subset of ascii, there are plenty of cases where two pods can have the same names if only username is used. Hence we append ID to it. There is pretty much no other external use of the id anywhere.
  2. The unix user as which the notebook process runs. This is completely separate from and unrelated to (1). This is specified in the Dockerfile (as USER) and overrideable as c.KubeSpawner.singleuser_uid. These users are what is used for permission checks (writing things to persistent storage for example - this is what was causing permission errors when writing to the mounted persistent volume). fsgroup is related to this as well - it should be set to a group that this unix user is part of so that singleuser servers can mount and write to persistent volumes properly. In Kubernetes, this should ideally just always be one unix user that’s the same for all users - they’re all contained in containers, so this is ok.
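The uid/fsgroup pairing described in (2) can be sketched as a couple of config lines. The trait names match kubespawner’s config of that era and the numeric values are examples only.

```python
# Hedged sketch: run every user's notebook as the same unix user, and set
# the pod's fsGroup to a group that user belongs to, so mounted PVs are
# writable without per-user permission juggling.
c.KubeSpawner.singleuser_uid = 1000      # unix uid inside the container
c.KubeSpawner.singleuser_fs_gid = 1000   # fsGroup applied to mounted volumes
```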

As for deleting PVs - if you delete PVs you lose the data in them (since dynamically provisioned PVs always have reclaimPolicy: Delete). Hence it is a manual operation that is not automated at all - you have to delete the linked PVC manually, which will delete the PV (and lose your data).

> [quoted: question ii — possibility for interchangeable images]

This can be done currently with https://jupyterhub.readthedocs.io/en/latest/spawners.html#spawner-options-form. Are you thinking of the list of images as being static (i.e. specified by the administrator) or dynamic? If dynamic it might be a little more difficult, but not impossible. I see you’ve already dug into this on Gitter - would love to see your solution so we can make it easier in KubeSpawner 😃
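A static image list via the options form could be sketched like this. The form-rendering and form-parsing halves below are plain functions so they can be tested in isolation; wiring them onto a KubeSpawner subclass (e.g. setting the chosen image on the spawner before `start()`) is left as an assumption about the spawner API, and the image names are placeholders.

```python
# Sketch of a static, administrator-specified image chooser for
# JupyterHub's Spawner.options_form mechanism.
IMAGE_CHOICES = ['example/repo-a:latest', 'example/repo-b:latest']

def build_options_form(images):
    """Render a <select> of allowed images for Spawner.options_form."""
    options = '\n'.join(
        '<option value="{0}">{0}</option>'.format(img) for img in images)
    return ('<label for="image">Image:</label>\n'
            '<select name="image">\n{}\n</select>'.format(options))

def options_from_form(formdata, allowed):
    """Validate the submitted image against the allowed list.
    Form values arrive as lists of strings, as in JupyterHub."""
    image = formdata.get('image', [allowed[0]])[0]
    if image not in allowed:
        raise ValueError('disallowed image: %s' % image)
    return {'image': image}

form_html = build_options_form(IMAGE_CHOICES)
opts = options_from_form({'image': ['example/repo-b:latest']}, IMAGE_CHOICES)
```

Validating against the allowed list matters here: the form value comes from the browser, so an unchecked value would let any user spawn an arbitrary image.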

> [quoted: question iii — multiple simultaneous servers per user based on different images, and the proposed per-repo volumes config]

This is a little more difficult from JupyterHub but active work is being done on this right now - follow https://github.com/jupyterhub/jupyterhub/issues/766 for more details!

> [quoted: question iv — ideas around version-control]

If you are using GitHub for authentication, then we could possibly do something like generate a personal access token when the user logs in and then put it in an appropriate place on the notebook container, thus allowing users to pull / push natively. I think that’s far better than wrapping git with some magic, which in my experience ends badly always. In https://github.com/yuvipanda/paws/blob/master/hub/jupyterhub_config.py#L41 I pass extra generated parameters into the single-user notebook from the hub, and we could do something similar here.
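The token-passing idea could be sketched as an overridden `get_env()` on the spawner. `get_env` is a real Spawner hook, but the stub base class, the token attribute, and the environment variable name below are all stand-ins for illustration; a real deployment would subclass KubeSpawner and obtain the token from the authenticator at login.

```python
# Hedged sketch: inject a per-user access token into the single-user
# notebook environment, loosely modeled on the PAWS config linked above.

class StubSpawner:
    """Minimal stand-in for JupyterHub's Spawner base class."""
    def get_env(self):
        return {'JUPYTERHUB_USER': 'analect'}

class TokenSpawner(StubSpawner):
    # In practice this would be saved by the authenticator at login
    # (e.g. via GitLab OAuth); hard-coded here purely for illustration.
    user_token = 'glpat-example-token'

    def get_env(self):
        env = super().get_env()
        # Expose the token so `git push` over HTTPS works natively,
        # rather than wrapping git with magic.
        env['GITLAB_ACCESS_TOKEN'] = self.user_token
        return env

env = TokenSpawner().get_env()
```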

Action items from here are:

  1. Play with getting GitHub personal access token into environment variables / proper locations on disk so people can push / pull from repos
  2. Expand documentation on what ‘users’ are and how the various kinds of ‘users’ are used
  3. See if you need any follow up help on the docker image selection with options form thing
  4. Continue making the helm config configurable enough for general use.

Feel free to ask follow up questions here or on gitter! Looking forward to seeing what cool things you are doing!

– Yuvi Panda T http://yuvi.in/blog

1 reaction
yuvipanda commented, Jan 6, 2017

\o/ Thank you for your well thought out questions! I want to acknowledge I’ve seen them, but am travelling presently - will respond in bits and pieces!

On Thu, Jan 5, 2017 at 2:56 AM, Analect notifications@github.com wrote:

> [quoted: the original question in full]

– Yuvi Panda T http://yuvi.in/blog
