
The need for speed

See original GitHub issue

This issue is about collecting ideas that could make the images produced by repo2docker smaller, faster to push/pull or faster to build.

I envision this as a meta thread: a collection of ideas that either turn into PRs or get marked as “already done” or “not possible”. That way we can use it as an entry point for finding related issues and PRs.

Why make images smaller and faster to build? From our own experience and the first results of the Binder user survey it is clear that faster builds and faster launches are something people really care about.


Smaller images

A few ideas via https://jcrist.github.io/conda-docker-tips.html:

  • don’t use MKL: we already use conda-forge as the default, so OpenBLAS is our default -> no further gains possible
  • run conda clean -afy: already implemented
  • --freeze-installed: not currently used, unsure if it would help, worth trying
  • remove additional unnecessary files: we should do this (see the sketch after this list)
  • use a smaller base image: not applicable, as the Ubuntu base we use should be present on all nodes in a BinderHub cluster -> this is “free” because it needs no pulling or pushing. Double check this is actually true
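A minimal sketch of how those cleanup steps could be combined in a Dockerfile (the base image, package names, and the /opt/conda prefix are illustrative assumptions, not repo2docker's actual output):

    # Sketch only: assumes a base image with conda on PATH at /opt/conda.
    FROM continuumio/miniconda3

    # Install and clean up in a single RUN so the deleted files never
    # land in an earlier, immutable layer.
    RUN conda install -y --freeze-installed numpy pandas \
     && conda clean -afy \
     && find /opt/conda -follow -type f -name '*.a' -delete \
     && find /opt/conda -follow -type f -name '*.pyc' -delete

Chaining install and cleanup into one RUN matters because files removed in a later Dockerfile step still take up space in the layer that created them.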

Reordering build steps for faster builds

Right now we have to rebuild the whole image (from the point onwards where we copy in the contents of the repo) even if the user only fixes a typo in the README. The reasoning behind this is that a requirements.txt could contain something like -e ., which leads to the setup.py in the repo being executed. This in turn means the setup process could execute anything and depend on everything in the repo. There is no way of knowing that a one-character change in the README won’t change the build result.
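For example, a requirements.txt like this (illustrative, not from any particular repo) is enough to make the build depend on every file in the repository:

    # "-e ." installs the repository itself, executing its setup.py,
    # so any file in the repo can influence the build result.
    numpy
    -e .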

However, I think this is a fringe case; the common case is that people only install packages from PyPI and don’t depend on the rest of the repository. How can we make this common case faster while still getting the rare case right?

The obvious way to speed up builds and rebuilds is to copy only the requirements.txt into the image, run the install step, and then copy over the rest of the repository. This way a change in the README won’t break the Docker layer cache, which means rebuilds are fast.

One thing we could try is to copy the requirements.txt early and run pip install -r requirements.txt wrapped in an “if this fails, just continue” block, then copy the full repo and rerun pip install -r requirements.txt, which will either be a no-op (if the early run succeeded) or will clean up the breakage from the first run.
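In a generated Dockerfile the two-pass approach could look something like this (a sketch; ${REPO_DIR} and the || true trick are illustrative, not repo2docker's current behaviour):

    # Sketch: assumes REPO_DIR has been set earlier via ARG or ENV.
    COPY requirements.txt ${REPO_DIR}/requirements.txt
    # First pass: may fail if requirements.txt references the repo
    # (e.g. "-e ."); "|| true" lets the build continue either way.
    RUN pip install -r ${REPO_DIR}/requirements.txt || true

    COPY . ${REPO_DIR}
    # Second pass: a no-op if the first pass succeeded, otherwise it
    # finishes the install now that the full repository is present.
    RUN pip install -r ${REPO_DIR}/requirements.txt

A README-only change now invalidates the cache only from the COPY . step onwards; the second pip install still reruns, but against an already-populated environment it finishes quickly.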

We invented install.R, so we can declare it a mistake for it to rely on anything else in the repository. This means we can copy it over early. That would save a large amount of user pain because R builds are among the slowest builds we have. (see #716)
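Under that assumption the R steps could move ahead of the full copy (sketch only; paths are illustrative):

    # install.R is self-contained by decree, so it can run before
    # the rest of the repository is copied in.
    COPY install.R ${REPO_DIR}/install.R
    RUN Rscript ${REPO_DIR}/install.R
    COPY . ${REPO_DIR}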

For environment.yml I am not sure whether you can install things from a local directory or not. In either case we could treat it like the requirements.txt case (try, ignore errors, retry).
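The conda variant of the same sketch (the env name notebook is an assumption, not necessarily what repo2docker uses):

    COPY environment.yml ${REPO_DIR}/environment.yml
    # First pass may fail, e.g. if a pip: section references the repo.
    RUN conda env update -n notebook -f ${REPO_DIR}/environment.yml || true

    COPY . ${REPO_DIR}
    RUN conda env update -n notebook -f ${REPO_DIR}/environment.yml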


Overhead from docker layers

One thing I was wondering is whether an image of the same size and content that has 100 layers (say, one per file added) has more pull overhead than one that consists of a single layer. From watching docker pull it seems there are various steps that happen after a layer has been downloaded (checksum verification, unpacking) that could be saved by reducing the number of layers.
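A rough way to measure this (a sketch; the registry address and image tags are hypothetical):

    # Remove local copies so the pulls below do real work.
    docker image rm localhost:5000/test:many-layers localhost:5000/test:one-layer

    # Time pulling two images of identical content but different layer counts.
    time docker pull localhost:5000/test:many-layers
    time docker pull localhost:5000/test:one-layer

    # Count the layers in each image for comparison.
    docker history --quiet localhost:5000/test:many-layers | wc -l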

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Comments: 22 (12 by maintainers)

Top GitHub Comments

3 reactions
jchesterpivotal commented, Jun 25, 2019

@betatim from what I can tell, Cloud Native Buildpacks were touched on here: https://github.com/jupyter/repo2docker/issues/487#issuecomment-479858333

(Edit: more accurately, the pack CLI was touched on, but note that it represents only part of the buildpacks.io effort)

From an outsider’s squinting perspective, I feel like CNBs are trying to solve the same class of problems as r2d: efficient, automated, composable builds.

2 reactions
manics commented, Jun 21, 2019

If your image is on Docker Hub, Microbadger gives a nice visualisation of the size of each layer, e.g. https://microbadger.com/images/jupyter/tensorflow-notebook

