Share an updated Python class object across multiple workers (gunicorn-FastAPI)
First Check
- I added a very descriptive title to this issue.
- I used the GitHub search to find a similar issue and didn’t find it.
- I searched the FastAPI documentation, with the integrated search.
- I already searched in Google “How to X in FastAPI” and didn’t find any information.
- I already read and followed all the tutorial in the docs and didn’t find an answer.
- I already checked if it is not related to FastAPI but to Pydantic.
- I already checked if it is not related to FastAPI but to Swagger UI.
- I already checked if it is not related to FastAPI but to ReDoc.
Commit to Help
- I commit to help with one of those options 👆
Example Code
from typing import Optional

from fastapi import BackgroundTasks, Depends, FastAPI, HTTPException, status
import pandas as pd
import os

## SsCorpusBase, SsCorpusOut, SemanticSearchIn, SemanticSearchOut, Settings, settings,
## get_settings, get_session, Session and update_corpus_embeddings are defined elsewhere
## in the project.

class SsCorpusDataframe:
    dataframe: Optional[pd.DataFrame]

    def load_dataframe(self):
        """Loads the dataframe from the pickle file"""
        dataframe_file = os.path.join("./app/artifacts", "ListsOfTitles_all_final_embeddings_formatted.pkl")
        dataframe = pd.read_pickle(dataframe_file)
        self.dataframe = dataframe
        if self.dataframe.empty:
            raise RuntimeError("Corpus dataframe is not loaded")

ss_corpus_dataframe = SsCorpusDataframe()

app = FastAPI()

@app.on_event("startup")
async def startup():
    ## Load corpus dataframe if the pickle file already exists
    if os.path.isfile(os.path.join("./app/artifacts", "ListsOfTitles_all_final_embeddings_formatted.pkl")):
        ss_corpus_dataframe.load_dataframe()

@app.post("/update_corpus/", status_code=status.HTTP_201_CREATED, response_model=SsCorpusOut)
def ss_update_corpus_embeddings(ss_corpus: SsCorpusBase):
    ## Set full path of the corpus embeddings pickle file
    full_path_pickle_file = os.path.join("./app/artifacts", "ListsOfTitles_all_final_embeddings_formatted.pkl")
    ## Rebuild the embeddings pickle file (synchronous way)
    update_corpus_embeddings(api_sent_embed_url=settings.api_sent_embed_address, full_path_pickle_file=full_path_pickle_file)
    ## Load corpus dataframe into the module-level object
    ss_corpus_dataframe.load_dataframe()
    update_corpus_out = SsCorpusOut(full_path_pickle_file=full_path_pickle_file,
                                    tag_pickle_file=ss_corpus.tag_pickle_file,
                                    size_pickle_file=os.path.getsize(full_path_pickle_file),
                                    ctime_pickle_file=os.path.getctime(full_path_pickle_file))
    return update_corpus_out

@app.post("/create_semantic_search/", status_code=status.HTTP_201_CREATED, response_model=SemanticSearchOut)
async def create_semantic_search(*, session: Session = Depends(get_session), user_query: SemanticSearchIn, background_tasks: BackgroundTasks, settings: Settings = Depends(get_settings)):
    if hasattr(ss_corpus_dataframe, 'dataframe'):
        ...
    else:
        raise HTTPException(status.HTTP_404_NOT_FOUND, detail="ss_corpus_dataframe object has no attribute 'dataframe'. Please update the semantic search corpus.")
Description
When running the gunicorn FastAPI Docker image (multiple worker processes) with no pickle file included in ./app/artifacts at startup, I managed to create the dataframe attribute of the ss_corpus_dataframe object by calling the /update_corpus/ route.
However, the dataframe seems to be loaded in only one process: when calling the /create_semantic_search/ route I sometimes get the expected result and sometimes get the HTTPException(status.HTTP_404_NOT_FOUND, detail="ss_corpus_dataframe object has no attribute 'dataframe'. Please update the semantic search corpus.") error.
When using the gunicorn FastAPI Docker image, how can I share a global class object across all worker processes after the pickle file is updated through the /update_corpus/ route while the API is running?
I saw this Stack Overflow topic (https://stackoverflow.com/questions/65686318/sharing-python-objects-across-multiple-workers) discussing caching (aiocache), but I don't really understand how to apply it to my case.
Operating System
Linux
Operating System Details
Ubuntu 18.04 LTS
FastAPI Version
0.70.0
Python Version
3.8.8
Additional Context
No response
Top GitHub Comments
The first answer in the Stack Overflow topic you linked pretty much sums it up: you can't easily share a Python object between multiple gunicorn worker processes. What you want to do is store the data in an external location (a server) that is accessible from any of the workers. In the Stack Overflow question, the author describes using aiocache with a Redis backend, which would work nicely, or you could simply include something like Redis and use it directly from Python.
To use that you would need to extend the Docker image to include Redis and friends. There is a section of the documentation which describes how to extend the base FastAPI Docker image you're using (https://fastapi.tiangolo.com/deployment/docker/#create-a-dockerfile), but you should read the whole page if that is not clear to you. After that it would just be a matter of adding the connection info for the Redis server in the app startup function. Then any of your worker processes can query the local Redis server for the data.
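For reference, a minimal sketch of what that could look like with the redis-py client, assuming a Redis instance reachable at localhost:6379; the key name "ss_corpus_dataframe" and the two helper functions are illustrative, not part of the original code:

import pickle

import pandas as pd
import redis

## One connection per worker process; each worker talks to the same Redis server
r = redis.Redis(host="localhost", port=6379)

def store_corpus_dataframe(dataframe: pd.DataFrame) -> None:
    ## Serialize the dataframe and store it in Redis so every worker can read it
    r.set("ss_corpus_dataframe", pickle.dumps(dataframe))

def load_corpus_dataframe() -> pd.DataFrame:
    ## Read the serialized dataframe back; fail loudly if the corpus was never stored
    raw = r.get("ss_corpus_dataframe")
    if raw is None:
        raise RuntimeError("Corpus dataframe is not loaded")
    return pickle.loads(raw)

The /update_corpus/ route would call store_corpus_dataframe(...) after rebuilding the embeddings, and /create_semantic_search/ would call load_corpus_dataframe() instead of reading the module-level ss_corpus_dataframe object, so every worker always sees the latest corpus regardless of which process handled the update.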
Understood.