a binder deployment with authentication and persistent storage

We are currently working on a binder deployment with authentication and persistent storage enabled, and with a user interface on the JupyterHub home page where users can manage their repositories/projects.

For this purpose we now have a deployment running on https://notebooks-test.gesis.org/jupyter/. When you first log in, you will see the JupyterHub home page (https://notebooks-test.gesis.org/jupyter/hub/home) with 2 parts: the “Your Projects” table and the classical binder form with some parts hidden:

[Screenshot of the JupyterHub home page]

Binder is running under https://notebooks-test.gesis.org/jupyter/services/binder/ and you can also use it there, but the idea in this deployment is that you don’t need to use it directly.

How it works

First, some preliminary information:

Each user can start 1 server at a time (named servers are not activated).
Each user gets 1 persistent volume, which is mounted on /home/jovyan.

Binder form

It is the classical form with the ‘share url’ and ‘badge url’ parts hidden. It has 1 limitation: the branch/tag/commit field is read-only and always “master”. When a user launches a repo via the form:

The latest version of the repo is always built (last commit in the master branch) and the server is started with this image.
nbgitpuller is used to pull the code into a sub directory /home/jovyan/{repo_dir}. repo_dir is generated from the provider name, the user/org name and the repo name. The server is started in that sub directory (you can open a new terminal and list the directories of all projects there). nbgitpuller is not executed for the default repo (gesiscss/data_science_image).
Each newly launched repo is added to the “Your Projects” table. This list is saved in the state field of the Spawners table, and only the last 10 launched repos are kept.

In short, the binder form is used to create a new project and to update it from the remote.
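The naming and bookkeeping described above could be sketched roughly as follows. This is only an illustration: the function names `make_repo_dir` and `remember_project`, the underscore-joined naming scheme and the dictionary layout of a project entry are assumptions, not the code used in the deployment. The issue only states that repo_dir is built from provider, user/org and repo name and that the last 10 launched repos are kept.

```python
import re

MAX_PROJECTS = 10  # the issue states that only the last 10 launched repos are kept


def make_repo_dir(provider: str, owner: str, repo: str) -> str:
    """Build a sub directory name from provider, user/org and repo name.

    The exact scheme is an assumption; the issue only names the three inputs.
    """
    parts = (provider, owner, repo)
    # Replace anything outside [A-Za-z0-9._-] so the result is a safe directory name.
    safe = [re.sub(r"[^A-Za-z0-9._-]+", "-", p).strip("-").lower() for p in parts]
    return "_".join(safe)


def remember_project(projects: list, entry: dict) -> list:
    """Return a new list with `entry` as the most recent project, capped at MAX_PROJECTS."""
    # Drop an older entry for the same directory so a re-launch moves it to the end.
    kept = [p for p in projects if p["repo_dir"] != entry["repo_dir"]]
    kept.append(entry)
    return kept[-MAX_PROJECTS:]
```

In a real deployment the returned list would be what gets written to the spawner state; how that state is persisted is not shown here.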

Your Projects

On first login, the user has only the default repo (gesiscss/data_science_image) there. Each repo that is built and launched via the binder form is added to this table, and the user can restart that repository with the start button in its row. When the user clicks a start button in the table:

A server is started using the image (commit) that the user last worked with.
This does not work yet, but we want to skip running nbgitpuller on server start when the server is started from the projects table, so that the user can continue working where they left off. We can do this by passing an option to the spawner (I think this is closely related to https://github.com/jupyterhub/binderhub/issues/712).
We are also thinking about a delete button in the table actions which removes the repository from the table and deletes the repo’s folder in the user’s persistent volume. Right now the button is in the actions column but it doesn’t do anything.

In short, the “Your Projects” table is used to continue working on a repo (when you don’t want to update the image or the code base from the remote).
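The intended difference between the two launch paths could be expressed as a single flag passed along with the spawn request. The sketch below is hypothetical: `build_start_commands` and the `from_projects_table` flag are made-up names, and it assumes the `gitpuller` command line tool shipped with nbgitpuller, called with the git URL, branch and target directory. How the deployment actually invokes nbgitpuller is not shown in the issue.

```python
import shlex

HOME = "/home/jovyan"  # mount point of the persistent volume, per the issue


def build_start_commands(repo_url: str, repo_dir: str,
                         from_projects_table: bool, branch: str = "master") -> list:
    """Return the shell commands to run before the notebook server starts."""
    commands = []
    if not from_projects_table:
        # Launch via the binder form: pull the latest code into the sub directory.
        commands.append("gitpuller {} {} {}".format(
            shlex.quote(repo_url), shlex.quote(branch), shlex.quote(repo_dir)))
    # In both cases the server should start inside the project's sub directory.
    commands.append("cd {}".format(shlex.quote(HOME + "/" + repo_dir)))
    return commands
```

With `from_projects_table=True` the pull step is dropped, so the files in the persistent volume stay exactly as the user left them.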

Limitations and missing parts summary

nbgitpuller must be installed in user images. Right now we use the appendix to ensure it is installed (maybe it could be added to the repo2docker defaults).
Users can start a new project only from master (via the binder form); they can’t start working on a repo from a previous version/commit.
Starting a server from the table also executes nbgitpuller.
The delete button doesn’t do anything.
The name generation for the sub directory of each repo/project could be done better.
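For the missing delete action, one possible shape is sketched below. It is an assumption, not existing code: `delete_project` is a hypothetical helper, a project entry is assumed to be a dictionary with a `repo_dir` key, and the code is assumed to run somewhere the user’s persistent volume is mounted. The guard against paths outside the home directory is there because repo_dir ultimately derives from user-supplied repository names.

```python
import shutil
from pathlib import Path


def delete_project(projects: list, repo_dir: str, home: str = "/home/jovyan") -> list:
    """Remove the project's folder under `home` and return the list without it."""
    home_path = Path(home).resolve()
    target = (home_path / repo_dir).resolve()
    # Refuse anything that is not strictly inside the home directory.
    if home_path not in target.parents:
        raise ValueError("refusing to delete outside of {}: {}".format(home_path, target))
    if target.is_dir():
        shutil.rmtree(target)
    return [p for p in projects if p["repo_dir"] != repo_dir]
```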

Where to find helm config and custom templates

Helm config file for this deployment: https://github.com/gesiscss/orc/blob/binderinjhubgh/jupyterhub/config_test.yaml
The customised KubeSpawner is defined here: https://github.com/gesiscss/orc/blob/binderinjhubgh/jupyterhub/config_test.yaml#L170-L227
Templates for JupyterHub (home.html is the JupyterHub home page): https://github.com/gesiscss/orc/tree/binderinjhubgh/jupyterhub/docker/k8s_hub/templates

https://notebooks-test.gesis.org/jupyter/ uses the GitHub authenticator and everybody is welcome to log in and try it out (it is just a test instance and will be deleted again). We would really like to get your feedback on what we have done so far. Probably the most important question is whether we are on the right track to accomplish what we want. Finally, we are aware that there is a lot to improve in the user interface.

Top GitHub Comments

bitnik commented, Dec 11, 2019

I am closing this issue. We can continue discussing this on https://discourse.jupyter.org/t/a-persistent-binderhub-deployment/2865.

ltetrel commented, Nov 18, 2019

Thanks @arnim, but in our case we want persistent storage. We got it working by using the ideas here: https://discourse.jupyter.org/t/mounting-server-data-on-each-users-pod/641/4 We have an NFS storage mounted on each node to centralize data administration and avoid duplication: https://github.com/neurolibre/neurolibre-binderhub/issues/18 We are also thinking of using an initContainer instead of putting repo2data into the config file. This has the advantage of making the data download (if needed) more independent, since it runs in a separate container.