Ignore missing submodules?
Proposed change
Don’t fail the build if a submodule is missing.
Example of failure (used to work, broke once a student removed their personal repo): https://mybinder.org/v2/gh/mdeff/ntds_2018/outputs?urlpath=lab.
Alternative options
Update the repository to remove the now missing submodules. But that has a maintenance cost and breaks the intent of preserving the original state of a repository for reproducibility.
Who would use this feature?
People who freeze (archive) repositories for the sake of reproducibility. An old repo might depend on submodules that are not available anymore. This shouldn’t completely prevent people from building a container and running the code.
Downside: this effectively allows a build with missing dependencies. The problem is, however, more severe for submodules, as GitHub repositories are deleted more often than PyPI or conda packages. I would actually even proceed with missing PyPI or conda packages after emitting a warning (which should ideally be made more visible than in the build log).
(Off-topic, but a way to be notified of binder build failures would be great. Checking manually that it still works is sub-optimal.)
How much effort will adding it take?
Easy. Check the return value of git submodule update --init --recursive, emit a warning if non-zero, and move on with the build.
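A minimal sketch of that check, assuming the build step shells out to git from Python. The function name `init_submodules` and the injectable `run` parameter are hypothetical and only there for illustration; this is not how repo2docker is actually structured.

```python
import subprocess
import warnings


def init_submodules(repo_dir, run=subprocess.run):
    """Try to initialise submodules; warn instead of failing the build.

    Returns True if git exited with 0, False otherwise.
    `run` is injectable (hypothetical design choice) so the check can be
    exercised without a real repository.
    """
    cmd = ["git", "submodule", "update", "--init", "--recursive"]
    result = run(cmd, cwd=repo_dir, capture_output=True, text=True)
    if result.returncode != 0:
        # Non-zero exit: report it, but let the caller carry on with the build.
        warnings.warn(
            f"git submodule update failed (exit code {result.returncode}); "
            f"continuing without the missing submodules:\n{result.stderr}"
        )
        return False
    return True
```

The caller would ignore the boolean for the purpose of continuing the build, and could use it to surface the warning somewhere more visible than the build log.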
Who can do this work?
Anybody with a shallow understanding of the codebase.
Issue Analytics
- State:
- Created 3 years ago
- Comments: 5 (3 by maintainers)
Top GitHub Comments
One more data point in favor of being less strict about “missing” submodules. DataLad (http://handbook.datalad.org/) uses the submodule mechanism to specify subdataset components/dependencies. Scientific datasets, e.g. https://github.com/psychoinformatics-de/studyforrest-data, use this to link all components in a single toplevel repository (which is the most useful entrypoint for demos). However, not all dataset components can have the same level of access (think personal data in a neuroimaging study), so some dataset components will be inaccessible to a public binder instance. They are not missing or invalid, though.
Perhaps another, clearer way would be to allow repo2docker to not initialise certain submodules, i.e. a configuration file (say in YAML) that determines which submodules should be initialised and, optionally, whether that initialisation is allowed to fail.
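A rough sketch of how such a per-submodule setting might behave, written in Python with a plain dict standing in for the configuration file. The keys `init` and `allow_failure`, the example paths, and the function name are all hypothetical; none of them is an existing repo2docker option.

```python
import subprocess
import warnings

# Hypothetical per-submodule settings; paths and keys are illustrative only.
SUBMODULE_CONFIG = {
    "data/public": {"init": True, "allow_failure": False},
    "data/restricted": {"init": True, "allow_failure": True},
    "docs/theme": {"init": False},
}


def init_selected_submodules(repo_dir, config, run=subprocess.run):
    """Initialise only the configured submodules.

    Returns the paths whose initialisation failed but was tolerated.
    Raises RuntimeError if a submodule without allow_failure fails.
    """
    tolerated = []
    for path, opts in config.items():
        if not opts.get("init", True):
            continue  # explicitly excluded: never fetched
        cmd = ["git", "submodule", "update", "--init", "--", path]
        result = run(cmd, cwd=repo_dir, capture_output=True, text=True)
        if result.returncode == 0:
            continue
        if not opts.get("allow_failure", False):
            raise RuntimeError(f"required submodule {path!r} could not be initialised")
        warnings.warn(f"optional submodule {path!r} is unavailable; continuing")
        tolerated.append(path)
    return tolerated
```

With these settings, a private dataset component such as the hypothetical `data/restricted` would produce a warning on a public instance rather than a failed build, while `docs/theme` would never be downloaded at all.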
I have a repo where I don’t want to download the submodule.