
Serving ML models with multiple workers linearly increases RAM usage.

See original GitHub issue

Recently, we deployed an ML model with FastAPI and encountered an issue.

The code looks like this:

import json

from fastapi import FastAPI, File, UploadFile

from ocr_pipeline.model.ocr_wrapper import OcrWrapper

# `config` and the FastAPI instance `api` are defined in parts of the app omitted here
ocr_wrapper = OcrWrapper(**config.model_load_params)  # loads a 1.5 GB PyTorch model

...

@api.post('/')
async def predict(file: UploadFile = File(...)):
    preds = ocr_wrapper.predict(file.file, **config.model_predict_params)
    return json.dumps({"data": preds})

Serving the app with the command below consumes a minimum of 3 GB of RAM:

gunicorn --workers 2 --worker-class=uvicorn.workers.UvicornWorker app.main:api

Is there any way to scale the number of workers without consuming too much RAM?

Environment: Ubuntu 18.04, Python 3.6.9

fastapi==0.61.2, uvicorn==0.12.2, gunicorn==20.0.4, uvloop==0.14.0

@tiangolo
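
Not part of the original report, but a quick way to confirm that each worker holds its own full copy of the model is to sum the resident set size (RSS) of the gunicorn processes. A minimal sketch, assuming psutil is installed (pip install psutil):

import psutil

# Sum resident memory of the gunicorn master and worker processes.
# Note: RSS double-counts pages shared between processes, so this is an
# upper bound; memory_full_info() (USS/PSS) is more precise but slower.
total = 0
for proc in psutil.process_iter(["cmdline", "memory_info"]):
    cmdline = " ".join(proc.info["cmdline"] or [])
    if "gunicorn" in cmdline:
        rss = proc.info["memory_info"].rss
        total += rss
        print(f"pid={proc.pid} rss={rss / 2**30:.2f} GiB")
print(f"total: {total / 2**30:.2f} GiB")  # ~3 GiB with two workers, per the report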

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Reactions: 4
  • Comments: 12 (5 by maintainers)

Top GitHub Comments

3 reactions
raphaelauv commented, Nov 27, 2020

This is not a FastAPI-specific question (it's more of a gunicorn one); it's about sharing memory between processes.

The solution is to load the model into RAM before gunicorn forks the workers,

so you need to use --preload:

gunicorn --workers 2 --preload --worker-class=uvicorn.workers.UvicornWorker app.main:api

Your main.py file inside the app folder:

def create_app():
    MY_MODEL.load("model_path")  # runs at import time: in the master process when --preload is used
    app = FastAPI()
    app.include_router(my_router)
    return app

api = create_app()

If you have more questions about gunicorn, Python, fork, copy-on-write, Python reference counting, or memory leaks -> Stack Overflow.

YOU can very probably CLOSE this issue, thank you 😃
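
A side note on the copy-on-write point above, not from the thread itself: even with --preload, CPython's garbage collector writes to object headers whenever it scans them, which can gradually un-share the forked pages. One mitigation is gc.freeze(); a minimal sketch, assuming Python 3.7+ (the function does not exist on the 3.6.9 reported in this issue):

# At the bottom of app/main.py, after api = create_app().
# With --preload this runs once in the gunicorn master, before the fork.
import gc

gc.freeze()  # move everything allocated so far into a permanent generation
             # that the collector never scans, so GC passes in the workers
             # don't dirty the copy-on-write pages holding the model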

2 reactions
cosimo commented, Dec 3, 2020

Just found out that if I change my app methods from:

@app.post("/clusters", response_model=ClusteringResponse)
async def cluster(request: ClusteringRequest, model=Depends(get_model)):
    """Cluster a list of text sentences"""
    ...

to:

@app.post("/clusters", response_model=ClusteringResponse)
def cluster(request: ClusteringRequest, model=Depends(get_model)):
    """Cluster a list of text sentences"""
    ...

removing the async keyword, the model does indeed work as expected.

@sevakharutyunyan are you able to verify if this works for you?
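
Some context on why dropping async can matter, not stated in the thread: FastAPI runs plain def endpoints in a threadpool, while an async def endpoint runs directly on the event loop, so a blocking model call inside async def stalls every other request. If you want to keep async def, you can hand the blocking call to the threadpool yourself with run_in_threadpool; a minimal sketch, where model.predict stands in for whatever blocking call the model actually exposes:

from fastapi.concurrency import run_in_threadpool

@app.post("/clusters", response_model=ClusteringResponse)
async def cluster(request: ClusteringRequest, model=Depends(get_model)):
    """Cluster a list of text sentences"""
    # Await the blocking call off the event loop, in the same threadpool
    # FastAPI uses for plain `def` endpoints.
    return await run_in_threadpool(model.predict, request)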

Read more comments on GitHub >

Top Results From Across the Web

Serving ML Models in Production with FastAPI and Celery
This post walks through a working example for serving a ML model using Celery and FastAPI. All code can be found in the...

Serving ML Models in Production: Common Patterns - Anyscale
Scalability: Horizontally scale across hundreds of processes or machines, while keeping the overhead in single-digit milliseconds. Multi-model...

Serving ML Models — Ray 1.11.1
This section should help you: batch requests to optimize performance; serve multiple models by composing deployments.

Best Tools to Do ML Model Serving - neptune.ai
Best Tools to Do ML Model Serving: 1. BentoML · 2. Cortex · 3. TensorFlow Serving · 4. TorchServe · 5. KFServing...

Serving Machine Learning models with Google Vertex AI
You simply need to upload your model, deploy it to an endpoint, and you're ready to go. We need to define one of...
