a binder deployment with authentication and persistent storage
We are currently working on a Binder deployment with authentication and persistent storage enabled, plus a user interface on the JupyterHub home page where users can manage their repositories/projects.
For this purpose we now have a deployment running at https://notebooks-test.gesis.org/jupyter/. When you first log in, you will see the JupyterHub home page (https://notebooks-test.gesis.org/jupyter/hub/home) with 2 parts: the “Your Projects” table and the classical Binder form with some parts hidden.
Binder is running under https://notebooks-test.gesis.org/jupyter/services/binder/ and you can also use it, but in this deployment the idea is that you don’t need to use it directly.
How it works
First, some preliminary information:
- Each user can start 1 server at a time (named servers are not activated)
- Each user gets 1 persistent volume and it is mounted on /home/jovyan
Binder form
It is the classical form with the ‘share url’ and ‘badge url’ parts hidden. It has 1 limitation: the branch/tag/commit field is read-only and always “master”. When a user launches a repo via the form:
- The latest version of the repo is always built (last commit in the master branch) and the server is started with this image.
- nbgitpuller is used to pull the code into a subdirectory /home/jovyan/{repo_dir}. repo_dir is generated from the provider name, user/org name and repo name. The server is started in that subdirectory (you can open a new terminal and list all project directories there). nbgitpuller is not executed for the default repo (gesiscss/data_science_image).
- Each newly launched repo is added to the “Your Projects” table. This list is saved in the state field of the Spawners table, and only the last 10 launched repos are saved.
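The project list described above can be sketched in a few lines of Python. This is a minimal illustration, not the deployment's actual code: the function name add_project, the entry fields and the choice to de-duplicate by repo URL are assumptions. Only the limit of 10 entries comes from the description.

```python
MAX_PROJECTS = 10  # limit stated in the post


def add_project(projects, repo_url, image):
    """Return a new project list with the launched repo as the newest entry.

    `projects` is a list of dicts, oldest first. The field names are
    hypothetical; the real layout of the state field may differ.
    """
    entry = {"repo_url": repo_url, "image": image}
    # Assumption: relaunching a repo replaces its old entry instead of duplicating it.
    remaining = [p for p in projects if p["repo_url"] != repo_url]
    remaining.append(entry)
    # Keep only the most recent MAX_PROJECTS entries.
    return remaining[-MAX_PROJECTS:]
```

In a JupyterHub spawner, a list like this could be persisted by returning it from get_state() and restoring it in load_state(), which is how spawner state normally ends up in the state field.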
In short, the Binder form is used to create a new project and to update it from the remote.
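To make the directory naming and the pull step concrete, here is a small sketch. It assumes the gitpuller command line that nbgitpuller installs (git URL, branch, target directory). The naming scheme in make_repo_dir, the provider prefix "gh" and both function names are hypothetical, since the post does not spell out the exact scheme.

```python
import re

DEFAULT_REPO = "gesiscss/data_science_image"  # default repo named in the post


def make_repo_dir(provider, org, repo):
    """Build a directory name from provider, user/org and repo name.

    Hypothetical scheme: sanitise each part and join with underscores.
    """
    parts = [provider, org, repo]
    # Replace runs of characters that are unsafe in a directory name.
    safe = [re.sub(r"[^A-Za-z0-9._-]+", "-", p).strip("-").lower() for p in parts]
    return "_".join(safe)


def gitpuller_command(repo_url, org, repo, provider="gh",
                      branch="master", home="/home/jovyan"):
    """Return the pull command as an argument list, or None for the default repo."""
    if f"{org}/{repo}" == DEFAULT_REPO:
        # The post says nbgitpuller is not executed for the default repo.
        return None
    target = f"{home}/{make_repo_dir(provider, org, repo)}"
    return ["gitpuller", repo_url, branch, target]
```

For example, gitpuller_command("https://github.com/gesiscss/orc", "gesiscss", "orc") would target /home/jovyan/gh_gesiscss_orc under this scheme.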
Your Projects
On first login, the user has only the default repo (gesiscss/data_science_image) there. Each repo that is built and launched via the Binder form is added to this table, and the user can restart that repository with the start button on its row. When the user clicks a start button in the table:
- A server is started using the image (commit) that the user last worked with.
- Right now it is not working, but we want to skip the nbgitpuller command on server start when the server is started from the projects table, so that users can continue working where they left off. We can do this by passing an option to the spawner (I think this is closely related to https://github.com/jupyterhub/binderhub/issues/712).
- We are also thinking about having a delete button in the actions column of the table, which removes the repository from the table and deletes the repo's folder in the user’s persistent volume. Right now the button is in the actions column but it doesn’t do anything.
In short, the “Your Projects” table is used to continue working on a repo (when you don’t want to update the image or code base from the remote).
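One way the planned "skip nbgitpuller" option could look is sketched below. It assumes the pull runs as a Kubernetes postStart hook, which KubeSpawner exposes through its lifecycle_hooks setting; whether this deployment runs the pull that way is not stated in the post. The option keys from_projects_table, repo_url and repo_dir are hypothetical names.

```python
def lifecycle_hooks_for(user_options):
    """Return lifecycle hooks for a server start.

    Hypothetical keys in `user_options`:
      from_projects_table: True when the start button in the table was used
      repo_url, repo_dir:  what to pull and where to put it
    """
    if user_options.get("from_projects_table", False):
        # Started from the table: leave the working copy untouched.
        return {}
    command = [
        "gitpuller",
        user_options["repo_url"],
        "master",  # the form only allows the master branch
        user_options["repo_dir"],
    ]
    return {"postStart": {"exec": {"command": command}}}
```

A customised spawner could call such a helper while preparing the pod and assign the result to its lifecycle_hooks, so the decision stays in one place.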
Limitations and missing parts summary
- nbgitpuller must be installed in user images; right now we use an appendix to ensure its installation (maybe it can be added to the repo2docker defaults)
- Users can start a new project only from master (by using the Binder form); they can’t start working on a repo from a previous version/commit
- Starting a server from the table also executes nbgitpuller
- The delete button doesn’t do anything
- Name generation for the subdirectory of each repo/project could be done better
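The missing delete action could be sketched roughly as follows. This is an illustration under assumptions: it must run somewhere the user's persistent volume is mounted (for example inside the user's server), and the function name and the repo_dir field are hypothetical.

```python
import shutil
from pathlib import Path


def delete_project(projects, repo_dir, home="/home/jovyan"):
    """Delete a project's folder and return the project list without it."""
    home_path = Path(home).resolve()
    target = (home_path / repo_dir).resolve()
    # Refuse to delete the home directory itself or anything outside it.
    if target == home_path or home_path not in target.parents:
        raise ValueError(f"refusing to delete {target}")
    if target.is_dir():
        shutil.rmtree(target)
    # Drop the matching entry from the table data.
    return [p for p in projects if p.get("repo_dir") != repo_dir]
```

The path check matters because repo_dir would come from a request made through the web page, so a value such as ../something should not be trusted.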
Where to find helm config and custom templates
- Helm config file for this deployment: https://github.com/gesiscss/orc/blob/binderinjhubgh/jupyterhub/config_test.yaml
- And you can find the customised KubeSpawner here: https://github.com/gesiscss/orc/blob/binderinjhubgh/jupyterhub/config_test.yaml#L170-L227
- Templates for JupyterHub (home.html is the JupyterHub home page): https://github.com/gesiscss/orc/tree/binderinjhubgh/jupyterhub/docker/k8s_hub/templates
https://notebooks-test.gesis.org/jupyter/ uses the GitHub authenticator and everybody is welcome to log in and try it out (it is just a test instance and will be deleted again). We would really like to get your feedback on what we have done so far. Probably the most important question is whether we are on the right track to accomplish what we want. Finally, we are aware that there is a lot to improve in the user interface.
Top GitHub Comments
I am closing this issue. We can continue discussing this on https://discourse.jupyter.org/t/a-persistent-binderhub-deployment/2865.
Thanks @arnim. But in our case we want persistent storage. We got it working by using the ideas from this thread (https://discourse.jupyter.org/t/mounting-server-data-on-each-users-pod/641/4). We have an NFS storage mounted on each node to centralize the data administration and avoid duplication (https://github.com/neurolibre/neurolibre-binderhub/issues/18). We were also thinking of using an initContainer instead of putting repo2data into the config file. This has the advantage of making the process of downloading the data (if needed) more independent, since it runs in a separate container.
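As a rough illustration of the initContainer idea, the sketch below builds a Kubernetes container spec as a plain dict, which is the form KubeSpawner accepts in its init_containers setting. The helper name, the container name, the image and the placeholder command are all hypothetical; the actual data download command is not given in this thread.

```python
def data_init_container(image, command, volume_name="home",
                        mount_path="/home/jovyan"):
    """Build an initContainer spec that runs before the user's server starts."""
    return {
        "name": "fetch-data",  # hypothetical container name
        "image": image,
        "command": list(command),
        # Mount the same volume as the notebook container so that
        # downloaded data is visible to the user afterwards.
        "volumeMounts": [{"name": volume_name, "mountPath": mount_path}],
    }


# Placeholder image and command; replace with the real data download step.
spec = data_init_container(
    "example/data-fetcher:latest",
    ["sh", "-c", "echo 'download data here'"],
)
```

In a hub config this would be used as c.KubeSpawner.init_containers = [spec], with the volume name matching one of the volumes defined for the user pod.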