Investigate methods of making R builds faster
I recently spoke with @karthik, who mentioned that our R builds (with install_packages) seem to be going really slowly. There could be a couple of problems, which I’ll list here:
It’s possible to install some R packages in Ubuntu much faster by installing binaries. We could recommend this in the documentation for specifying R packages and such…
relevant blog post: http://dirk.eddelbuettel.com/blog/2017/12/13/
old points:
1. mybinder.org may not have enough RAM which is causing the build to be really slow for certain packages (like the tidyverse). Apparently many R packages have intermediate steps during install that use multiple gigs of RAM.
2. We aren’t using some binary packages even though they are available. repo2docker seems to be building everything from source, even though for some packages there are binaries out there. We could investigate to see if this is an option!
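To make the binary route in point 2 concrete, here is a minimal sketch, not a description of what repo2docker does. It relies on the Debian/Ubuntu convention that binary builds of CRAN packages are named r-cran- followed by the lowercased package name. Both function names are hypothetical helpers. The install step assumes root privileges and an apt source that actually provides the requested packages, such as the c2d4u PPA mentioned in the comments. Not every CRAN package has a binary, so the apt call can still fail for some names.

```shell
#!/bin/sh
# Sketch only: cran_to_apt and install_r_binaries are hypothetical helpers,
# not part of repo2docker or any existing tool.

# Map CRAN package names to Debian/Ubuntu binary package names,
# e.g. "Rcpp" -> "r-cran-rcpp". Prints one name per line.
cran_to_apt() {
  for pkg in "$@"; do
    printf 'r-cran-%s\n' "$(printf '%s' "$pkg" | tr '[:upper:]' '[:lower:]')"
  done
}

# Install the binary packages with apt instead of compiling from source.
# Assumes root and an already configured apt source that ships r-cran-* packages.
install_r_binaries() {
  # Word splitting of the command substitution is intentional here.
  apt-get install -y --no-install-recommends $(cran_to_apt "$@")
}
```

For example, `cran_to_apt sf Rcpp` prints `r-cran-sf` and `r-cran-rcpp`, and `install_r_binaries sf Rcpp` could be called from a Dockerfile RUN step once the extra apt source has been added.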
Issue Analytics
- State:
- Created 5 years ago
- Reactions:1
- Comments:40 (18 by maintainers)
Top GitHub Comments
I just tried the @cboettig example (https://mybinder.org/v2/gh/cboettig/r/master?urlpath=rstudio), but it looks like the sf package is still not installed, maybe because sf depends on quite some spatial libraries (e.g. GDAL, GEOS, …)?
Nevertheless, having never used Docker, I found it very easy to copy this small Dockerfile (linking to rocker/binder) to my repository: https://github.com/rocker-org/binder/blob/master/binder/Dockerfile. Now sf and all other packages work like a charm. I’m impressed by Binder, thank you!
@choldgraf “typical” will vary widely of course, but it’s realistic or even small for large spatial analysis.
One strategy would be to improve the binary support: does apt.txt (or whatever it is) support users adding PPAs? Otherwise, just adding the https://launchpad.net/~marutter/+archive/ubuntu/c2d4u PPA to the base image seems like a good start, so that most packages can be installed from binary. https://launchpad.net/~ubuntugis/+archive/ubuntu/ppa is another popular PPA for folks doing any spatial data. (This is the route we have taken with the r-apt stack in rocker, https://github.com/rocker-org/rocker/tree/master/r-apt.)
The other option is to pre-install more common things on the base image, though that might need more documentation to avoid having pre-installed packages just get re-installed. If users write a DESCRIPTION file for dependencies and use devtools::install() in install.r, this isn’t an issue, but if they write direct calls to install.packages(), the default behavior will re-install packages explicitly requested. Of course, pre-building means identifying such a ‘common stack’ and then doing more maintenance on the binder end. (As you know, this is the route we’ve taken with the ‘versioned’ stack in rocker.)
Maybe I’m just out of step with the general thinking here, but really, so long as builds are cached, a one-time 1 hr wait doesn’t seem so bad to me. I tickle the build the first time I put binder up, and check back later and it’s built. If the repo is getting much traffic at all, there’s almost always a cached image there. Really, I think your current system works remarkably well, and if it ain’t broke… but yeah, maybe I’m in the minority on that.