[FR] reduce docker image size

Willingness to contribute

Yes. I would be willing to contribute this feature with guidance from the MLflow community.

Proposal Summary

The command mlflow models build-docker -m "runs:/$(RUN_ID)/sklearn-model/" -n "my-image-name" --env-manager virtualenv generates a Docker image.

The image is about 3 GB, which is a pain point when such images are used in the cloud at large scale. With --env-manager conda, the size is 3.4 GB, so the difference between the two environment managers is small. A much smaller image would be expected with virtualenv.

Motivation

What is the use case for this feature?

Why is this use case valuable to support for MLflow users in general?

Why is this use case valuable to support for your project(s) or organization?

It is easier to make the case for MLflow as a model-serving solution if the Docker image is small.

Why is it currently difficult to achieve this use case?

Even without conda, the image is still large (about 3 GB).

Details

Optimize the temporary Dockerfile that build-docker generates.

Some options:

Reduce the number of layers by aggregating the RUN commands.
Remove temporary files with rm -rf /var/lib/apt/lists/* after the last install and update.
Use a more compact base image?
Is Java needed to serve a model?
Run apt-get clean?
Use a starting image with the appropriate Python version already installed?
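
As a rough illustration of the first two options, the separate apt steps from the build log could be collapsed into a single RUN that also cleans up after itself. This is only a sketch, not the Dockerfile that MLflow generates: the package selection is an assumption (Java, Maven, and the build toolchain are left out because whether they are needed is still an open question here), and the grouping is hypothetical.

```dockerfile
# Sketch only: not the Dockerfile MLflow generates.
# Package names come from the build log in this issue; leaving out
# openjdk-8-jdk, maven, and build-essential is an assumption.
FROM ubuntu:18.04

# Update, install, and clean up inside one RUN so that the apt package
# lists are never stored in an image layer.
RUN apt-get update \
    && DEBIAN_FRONTEND=noninteractive TZ=Etc/UTC apt-get install -y --no-install-recommends \
        tzdata \
        wget \
        curl \
        nginx \
        ca-certificates \
        python3.7 \
    && rm -rf /var/lib/apt/lists/*
```

Files deleted in a later RUN still occupy space in the earlier layer, which is why the cleanup has to happen in the same instruction as the install. The official Ubuntu base images already run the equivalent of apt-get clean automatically after each install, so an explicit apt-get clean is unlikely to save much by itself.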

Current docker steps:

Step 1/28 : FROM ubuntu:18.04
 ---> c6ad7e71ba7d
Step 2/28 : RUN apt-get -y update
 ---> Using cache
 ---> 465ef70c5320
Step 3/28 : RUN apt-get install -y --no-install-recommends          wget          curl          nginx          ca-certificates          bzip2          build-essential          cmake          openjdk-8-jdk          git-core          maven     && rm -rf /var/lib/apt/lists/*
 ---> Using cache
 ---> bde17bfdce18
Step 4/28 : RUN apt -y update
 ---> Using cache
 ---> c9e7b69d3116
Step 5/28 : RUN DEBIAN_FRONTEND=noninteractive TZ=Etc/UTC apt-get -y install tzdata
 ---> Using cache
 ---> 137c0be7b82a
Step 6/28 : RUN apt-get install -y     libssl-dev zlib1g-dev libbz2-dev libreadline-dev libsqlite3-dev wget curl llvm     libncursesw5-dev xz-utils tk-dev libxml2-dev libxmlsec1-dev libffi-dev liblzma-dev
 ---> Using cache
 ---> ac75a70a2d1a
Step 7/28 : RUN git clone     --depth 1     --branch $(git ls-remote --tags https://github.com/pyenv/pyenv.git | grep -o -E 'v[1-9]+(\.[1-9]+)+$' | tail -1)     https://github.com/pyenv/pyenv.git /root/.pyenv
 ---> Using cache
 ---> 0d03bcce2efa
Step 8/28 : ENV PYENV_ROOT="/root/.pyenv"
 ---> Using cache
 ---> 92fb9793c77b
Step 9/28 : ENV PATH="$PYENV_ROOT/bin:$PATH"
 ---> Using cache
 ---> 8642a6c16205
Step 10/28 : RUN apt install -y python3.7
 ---> Using cache
 ---> d588276163ce
Step 11/28 : RUN ln -s -f $(which python3.7) /usr/bin/python
 ---> Using cache
 ---> 53ca1893b359
Step 12/28 : RUN wget https://bootstrap.pypa.io/get-pip.py -O /tmp/get-pip.py
 ---> Using cache
 ---> 1fe30bf7f007
Step 13/28 : RUN python /tmp/get-pip.py
 ---> Using cache
 ---> a05fff5fc7a5
Step 14/28 : RUN pip install virtualenv
 ---> Using cache
 ---> 8bbf4438031a
Step 15/28 : ENV JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
 ---> Using cache
 ---> 6df68b7bba6b
Step 16/28 : ENV GUNICORN_CMD_ARGS="--timeout 60 -k gevent"
 ---> Using cache
 ---> 3d3742873a5b
Step 17/28 : WORKDIR /opt/mlflow
 ---> Using cache
 ---> e88d7c3168ae
Step 18/28 : RUN pip install mlflow==1.26.0
 ---> Using cache
 ---> dc429cc74965
Step 19/28 : RUN mvn  --batch-mode dependency:copy -Dartifact=org.mlflow:mlflow-scoring:1.26.0:pom -DoutputDirectory=/opt/java
 ---> Using cache
 ---> 44d0a3a04830
Step 20/28 : RUN mvn  --batch-mode dependency:copy -Dartifact=org.mlflow:mlflow-scoring:1.26.0:jar -DoutputDirectory=/opt/java/jars
 ---> Using cache
 ---> 2435b92cee67
Step 21/28 : RUN cp /opt/java/mlflow-scoring-1.26.0.pom /opt/java/pom.xml
 ---> Using cache
 ---> 0928aa6feb91
Step 22/28 : RUN cd /opt/java && mvn --batch-mode dependency:copy-dependencies -DoutputDirectory=/opt/java/jars
 ---> Using cache
 ---> ad8a2bede9fd
Step 23/28 : COPY model_dir/ /opt/ml/model
 ---> Using cache
 ---> 5e2d64709562
Step 24/28 : RUN python -c                 'from mlflow.models.container import _install_pyfunc_deps;                _install_pyfunc_deps(                    "/opt/ml/model",                     install_mlflow=False,                     enable_mlserver=False,                     env_manager="virtualenv")'
 ---> Using cache
 ---> 8f9d56ff5803
Step 25/28 : ENV MLFLOW_DISABLE_ENV_CREATION="true"
 ---> Using cache
 ---> 2d0523944c25
Step 26/28 : ENV ENABLE_MLSERVER=False
 ---> Using cache
 ---> d061c9805c17
Step 27/28 : RUN chmod o+rwX /opt/mlflow/
 ---> Using cache
 ---> 4ef8e53d018a
Step 28/28 : ENTRYPOINT ["python", "-c", "from mlflow.models import container as C;C._serve('virtualenv')"]
 ---> Using cache
 ---> 47d5dd9a9f94
Successfully built 47d5dd9a9f94
Successfully tagged my-image-name-venv:latest

What component(s) does this bug affect?

area/artifacts: Artifact stores and artifact logging
area/build: Build and test infrastructure for MLflow
area/docs: MLflow documentation pages
area/examples: Example code
area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
area/models: MLmodel format, model serialization/deserialization, flavors
area/projects: MLproject format, project running backends
area/scoring: MLflow Model server, model deployment tools, Spark UDFs
area/server-infra: MLflow Tracking server backend
area/tracking: Tracking Service, tracking client APIs, autologging

What interface(s) does this bug affect?

area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
area/docker: Docker use across MLflow’s components, such as MLflow Projects and MLflow Models
area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
area/windows: Windows support

What language(s) does this bug affect?

language/r: R APIs and clients
language/java: Java APIs and clients
language/new: Proposals for new client languages

What integration(s) does this bug affect?

integrations/azure: Azure and Azure ML integrations
integrations/sagemaker: SageMaker integrations
integrations/databricks: Databricks integrations

Issue Analytics

State:
Created a year ago
Comments: 5 (1 by maintainers)

Top GitHub Comments

3 reactions

rafaelvp-db commented, Jun 10, 2022

@BenWilson2 @sebastien-genete I’d be willing to help here as well. Multi-stage builds could also be an option - I will run a couple of tests and report back.

0 reactions

sebastien-genete commented, Jul 19, 2022

I have no more time to spend on this topic. Maybe someone else can try to solve this issue.