Share an updated Python class object across multiple workers (gunicorn-FastAPI)
First Check
- I added a very descriptive title to this issue.
- I used the GitHub search to find a similar issue and didn’t find it.
- I searched the FastAPI documentation, with the integrated search.
- I already searched in Google “How to X in FastAPI” and didn’t find any information.
- I already read and followed all the tutorial in the docs and didn’t find an answer.
- I already checked if it is not related to FastAPI but to Pydantic.
- I already checked if it is not related to FastAPI but to Swagger UI.
- I already checked if it is not related to FastAPI but to ReDoc.
Commit to Help
- I commit to help with one of those options 👆
Example Code
from typing import Optional

from fastapi import BackgroundTasks, Depends, FastAPI, HTTPException, status
import pandas as pd
import os

## SsCorpusBase, SsCorpusOut, SemanticSearchIn, SemanticSearchOut, Settings, settings,
## get_settings, get_session, Session and update_corpus_embeddings are defined elsewhere
## in the project.

class SsCorpusDataframe:
    dataframe: Optional[pd.DataFrame]

    def load_dataframe(self):
        """Loads the dataframe from the pickle file"""
        dataframe_file = os.path.join("./app/artifacts", "ListsOfTitles_all_final_embeddings_formatted.pkl")
        dataframe = pd.read_pickle(dataframe_file)
        self.dataframe = dataframe
        if self.dataframe.empty:
            raise RuntimeError("Corpus dataframe is not loaded")

ss_corpus_dataframe = SsCorpusDataframe()

app = FastAPI()

@app.on_event("startup")
async def startup():
    ## Load corpus dataframe if the pickle file already exists
    if os.path.isfile(os.path.join("./app/artifacts", "ListsOfTitles_all_final_embeddings_formatted.pkl")):
        ss_corpus_dataframe.load_dataframe()

@app.post("/update_corpus/", status_code=status.HTTP_201_CREATED, response_model=SsCorpusOut)
def ss_update_corpus_embeddings(ss_corpus: SsCorpusBase):
    ## Set full path of the corpus embeddings pickle file
    full_path_pickle_file = os.path.join("./app/artifacts", "ListsOfTitles_all_final_embeddings_formatted.pkl")
    ## Rebuild the embeddings pickle file (synchronous way)
    update_corpus_embeddings(api_sent_embed_url=settings.api_sent_embed_address, full_path_pickle_file=full_path_pickle_file)
    ## Load corpus dataframe into the module-level object
    ss_corpus_dataframe.load_dataframe()
    update_corpus_out = SsCorpusOut(full_path_pickle_file=full_path_pickle_file,
                                    tag_pickle_file=ss_corpus.tag_pickle_file,
                                    size_pickle_file=os.path.getsize(full_path_pickle_file),
                                    ctime_pickle_file=os.path.getctime(full_path_pickle_file))
    return update_corpus_out

@app.post("/create_semantic_search/", status_code=status.HTTP_201_CREATED, response_model=SemanticSearchOut)
async def create_semantic_search(*, session: Session = Depends(get_session), user_query: SemanticSearchIn, background_tasks: BackgroundTasks, settings: Settings = Depends(get_settings)):
    if hasattr(ss_corpus_dataframe, 'dataframe'):
        ...
    else:
        raise HTTPException(status.HTTP_404_NOT_FOUND, detail="ss_corpus_dataframe object has no attribute 'dataframe'. Please update the semantic search corpus.")
Description
When running the gunicorn FastAPI Docker image (multiple worker processes) with no pickle file included in ./app/artifacts at startup, I managed to create the dataframe attribute of the ss_corpus_dataframe object by calling the /update_corpus/ route.
However, the dataframe seems to be loaded in only one process: when calling the /create_semantic_search/ route I sometimes get the expected result and sometimes get the HTTPException(status.HTTP_404_NOT_FOUND, detail="ss_corpus_dataframe object has no attribute 'dataframe'. Please update the semantic search corpus.") error.
When using the gunicorn FastAPI Docker image, how can I share a global class object across all worker processes after the pickle file is updated through the /update_corpus/ route while the API is running?
I saw this Stack Overflow topic (https://stackoverflow.com/questions/65686318/sharing-python-objects-across-multiple-workers) discussing caching (aiocache), but I don't really understand how to apply it to my case.
Operating System
Linux
Operating System Details
Ubuntu 18.04 LTS
FastAPI Version
0.70.0
Python Version
3.8.8
Additional Context
No response
Top GitHub Comments
The first answer in the Stack Overflow topic you linked pretty much sums it up: you can't easily share a Python object between multiple gunicorn worker processes. What you want to do is store the data in an external location (a server) that is accessible from any of the workers. In the Stack Overflow question, the author describes using aiocache with a Redis backend, which would work nicely, or you could simply include something like Redis and use it directly from Python.
To use that you would need to extend the Docker image to include Redis and friends. There is a section of the documentation which describes how to extend the base FastAPI Docker image you're using (https://fastapi.tiangolo.com/deployment/docker/#create-a-dockerfile), but you should read the whole page if that is not clear to you. After that it would just be a matter of adding the connection info for the Redis server in the app startup function. Then any of your worker processes can query the local Redis server for the data.
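For reference, a minimal sketch of what that could look like with the redis-py client, assuming a Redis instance reachable at localhost:6379; the key name "ss_corpus_dataframe" and the two helper functions are illustrative, not part of the original code:

import pickle

import pandas as pd
import redis

## One connection per worker process; each worker talks to the same Redis server
r = redis.Redis(host="localhost", port=6379)

def store_corpus_dataframe(dataframe: pd.DataFrame) -> None:
    ## Serialize the dataframe and store it in Redis so every worker can read it
    r.set("ss_corpus_dataframe", pickle.dumps(dataframe))

def load_corpus_dataframe() -> pd.DataFrame:
    ## Read the serialized dataframe back; fail loudly if the corpus was never stored
    raw = r.get("ss_corpus_dataframe")
    if raw is None:
        raise RuntimeError("Corpus dataframe is not loaded")
    return pickle.loads(raw)

The /update_corpus/ route would call store_corpus_dataframe(...) after rebuilding the embeddings, and /create_semantic_search/ would call load_corpus_dataframe() instead of reading the module-level ss_corpus_dataframe object, so every worker always sees the latest corpus regardless of which process handled the update.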
Understood.