Add the ability to specify a different repository for the *environment* of a repo
Over the years, we’ve felt a tension between flexibility and speed in Binder launches. This is most obvious in repositories whose content is updated often but whose environment is not. We’ve recommended various workarounds for this (e.g., using nbgitpuller to separate content from environment), but many folks spend a lot of extra time waiting for a Binder session to launch just because they’ve fixed a typo in a notebook somewhere.
I think one way that we could get around this could be to allow for users to specify an environment repository in their code. This could behave like this:
in `runtime.txt`:
`environment-<URL to git repository>`
which would trigger the following behavior:
- All other configuration files in the current repository are ignored
- repo2docker is called on the repo specified in the `runtime.txt` file
- When the session begins, all of the files in the environment repo are removed, and replaced by the ones in the current repo
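The behavior listed above could be sketched roughly as follows. This is only an illustration of the proposal, not existing repo2docker code: the `environment-` prefix is the syntax suggested in this issue, and the names `parse_environment_repo` and `replace_contents` are hypothetical helpers made up for the example.

```python
import shutil
from pathlib import Path
from typing import Optional

# Prefix proposed in this issue; NOT an existing repo2docker feature.
ENV_PREFIX = "environment-"


def parse_environment_repo(runtime_txt: str) -> Optional[str]:
    """Return the environment repo URL if runtime.txt uses the proposed
    `environment-<URL>` syntax, otherwise None (e.g. for `python-3.7`)."""
    line = runtime_txt.strip()
    if line.startswith(ENV_PREFIX):
        url = line[len(ENV_PREFIX):].strip()
        return url or None
    return None


def replace_contents(image_home: Path, content_repo: Path) -> None:
    """Remove the environment repo's files from image_home and copy in
    the files from the content repo (hypothetical session-start step)."""
    for entry in image_home.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)
        else:
            entry.unlink()
    # dirs_exist_ok requires Python 3.8+
    shutil.copytree(content_repo, image_home, dirs_exist_ok=True)
```

With this sketch, a `runtime.txt` containing `python-3.7` yields `None` and would fall through to the normal build path, while one containing `environment-https://github.com/jupyterhub/community-environment` yields the URL that repo2docker would be called on.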
In this way, people could explicitly tag a different repository as an environment repository and thus save a lot of time on rebuilds. They could pin the URL of the target repository to a specific hash, branch, etc., just like a normal Binder repo, so reproducibility best practices would still apply.
This could:
- Save our cloud costs, because fewer unique images would end up being built
- Save launch times, because fewer unique images == fewer Docker pulls and repo2docker builds == less launch time
- Be a way to support a “default community image” that many people can use, which would result in much faster launch times (e.g., just tell people “put `environment-https://github.com/jupyterhub/community-environment` in your `runtime.txt` file”)
What do people think about this?
Top GitHub Comments
One thing we have learnt with repo2docker: every time we invent a special config file we regret it. This means we need to find something that is already out there (or send someone out to the world to establish our new idea as a “standard”) before building it into repo2docker.
In what kinds of situations do you end up wanting to use a default environment with your content? I think this is an interesting question to explore and get answers to in order to help shape this feature.
There is a thread with a lot of context, ideas and discussion about providing a default environment (decided by the admin of the BinderHub) into which content can be pulled via nbgitpuller: https://github.com/jupyterhub/mybinder.org-deploy/issues/1474. I think that issue is the best place to read and add to.
If you want to use your own “base environment” today, take a look at https://github.com/betatim/kaggle-binder/tree/master#how-can-i-use-this for an example that shows how to do that (no new code needed).
A quick win regarding discoverability would be to work on https://mybinder.readthedocs.io/en/latest/howto/external_binder_setup.html which is a tutorial that suggests the nbgitpuller + env idea. Also interesting when it comes to documentation changes to improve discoverability is an issue about merging the various Binder docs.
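For readers who have not seen the nbgitpuller + env idea, here is a minimal sketch of how such a launch link might be assembled. It assumes the link shape produced by the nbgitpuller link generator (a Binder URL for the environment repo whose `urlpath` parameter carries an encoded `git-pull?repo=...&branch=...` query) and that nbgitpuller is installed in the environment repo. The function name `binder_gitpull_link` and the example repository names are made up for illustration.

```python
from urllib.parse import quote, urlencode


def binder_gitpull_link(
    env_repo: str,        # "org/name" of the GitHub repo holding the environment
    env_ref: str,         # branch, tag, or commit hash of the environment repo
    content_url: str,     # full git URL of the content repo
    content_branch: str,  # branch of the content repo to pull
    binder_host: str = "https://mybinder.org",
) -> str:
    """Build a Binder link that launches env_repo and pulls content_url
    into the session via nbgitpuller (assumed link format)."""
    # Inner query handled by nbgitpuller inside the running session.
    inner = "git-pull?" + urlencode({"repo": content_url, "branch": content_branch})
    # The inner path is encoded again so it survives as one urlpath value.
    return f"{binder_host}/v2/gh/{env_repo}/{env_ref}?urlpath={quote(inner, safe='')}"


link = binder_gitpull_link(
    "example-org/example-env", "main",
    "https://github.com/example-org/example-content", "main",
)
print(link)
```

Because only the environment repo is named in the `/v2/gh/...` part, Binder can reuse the already-built image even when the content repo changes.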