[QUESTION] Validation in the FastAPI response handler is a lot heavier than expected
First check
- I used the GitHub search to find a similar issue and didn’t find it.
- I searched the FastAPI documentation, with the integrated search.
- I already searched in Google “How to X in FastAPI” and didn’t find any information.
Description
So I have built a Tortoise ORM to Pydantic adaptor, and it's more or less stable, so I started profiling and found something interesting.
Pydantic will validate the data I fetch from the DB, which seems redundant as the DB content is already validated. So we are doing double validation. On further profiling I found that the majority of the time is spent by FastAPI preparing the data for serialisation, then validating it, and then actually serialising it. (This specific step is what https://github.com/tiangolo/fastapi/issues/1224#issuecomment-617243856 refers to.)
So I am doing essentially triple validation…
I then saw that there is the orjson integration, I tried that… and it made no difference that I could tell. (I’ll get to this later)
I did a few experiments (none of them properly tested, but just to get an idea) with a simple benchmark: (The database was populated with 200 junk user profiles generated by hypothesis, response is 45694 bytes)
Key:
R1 → Using FastAPI to serialise a List[User] model automatically (where User is a Pydantic model)
R2 → Using FastAPI to serialise a List[User] model automatically, but with the validation step in serialize_response disabled
R3 → Manually serialised the data using an ORJSONResponse
R4 → Using FastAPI to serialise a List[User] model automatically, bypassing the jsonable_encoder as I'm serialising with orjson
R5 → Using FastAPI to serialise a List[User] model automatically, bypassing both validation and jsonable_encoder
C1 → Use provided pydantic from_orm
C2 → Custom constructor that doesn't validate
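To make the C1/C2 distinction concrete, here is a minimal sketch of validated versus unvalidated model construction. It is not the adaptor's actual constructor: the `User` fields and the `build_unvalidated` helper are hypothetical, and the sketch assumes Pydantic's built-in `construct()` (renamed `model_construct()` in Pydantic v2) is an acceptable stand-in for a "constructor that doesn't validate". Ordinary validated construction stands in for `from_orm` here.

```python
from pydantic import BaseModel, ValidationError


class User(BaseModel):
    # Hypothetical fields, for illustration only.
    id: int
    name: str


def build_unvalidated(model_cls, **fields):
    """Create a model instance without running any validators.

    Pydantic v1 exposes this as construct(); v2 renamed it to
    model_construct(). Prefer the v2 name when it exists.
    """
    factory = getattr(model_cls, "model_construct", None) or model_cls.construct
    return factory(**fields)


# Validated path (comparable to C1): bad data is rejected.
try:
    User(id="not-an-int", name="alice")
    validated_ok = True
except ValidationError:
    validated_ok = False

# Unvalidated path (comparable to C2): values are stored exactly as given,
# so this is only safe for data that is already trusted, e.g. DB rows.
trusted = build_unvalidated(User, id=1, name="alice")
unchecked = build_unvalidated(User, id="not-an-int", name="alice")
```

The last line shows the trade-off: skipping validation is cheap, but nothing stops a wrongly typed value from ending up on the model.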
My results are:
R1 + C1 → 42req/s
R1 + C2 → 43req/s (Seems the 3 FastAPI steps overpower the validation overhead of from_orm)
R2 + C1 → 56req/s (Disabling the validation IN FastAPI has a much bigger impact?)
R2 + C2 → 63req/s (So, no extra validation)
R3 + C1 → 75req/s
R3 + C2 → 160req/s
R4 + C1 → 53req/s (This orjson-specific optimization gave us a 26% speedup here!)
R4 + C2 → 64req/s (This orjson-specific optimization gave us a 48% speedup here!)
R5 + C1 → 74req/s (So, almost as fast as bypassing the FastAPI response handler)
R5 + C2 → 147req/s
I was somewhat surprised by these results. It seems that disabling all validation AND skipping the FastAPI response handler gave me a nearly 4x improvement!!
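For reference, the relative speedups can be recomputed from the reported figures. The snippet below is only arithmetic on the numbers listed above, with R1 + C1 taken as the baseline; the `speedup` helper name is introduced here for illustration.

```python
# Requests per second, copied from the results list above.
RESULTS = {
    ("R1", "C1"): 42,
    ("R1", "C2"): 43,
    ("R2", "C1"): 56,
    ("R2", "C2"): 63,
    ("R3", "C1"): 75,
    ("R3", "C2"): 160,
    ("R4", "C1"): 53,
    ("R4", "C2"): 64,
    ("R5", "C1"): 74,
    ("R5", "C2"): 147,
}

# Fully automatic FastAPI handling with from_orm is the baseline.
BASELINE = RESULTS[("R1", "C1")]


def speedup(req_per_s: float, baseline: float = BASELINE) -> float:
    """Return throughput as a multiple of the baseline."""
    return req_per_s / baseline


# Print the combinations from slowest to fastest.
for (handler, ctor), rps in sorted(RESULTS.items(), key=lambda kv: kv[1]):
    print(f"{handler} + {ctor}: {rps:>3} req/s  ({speedup(rps):.2f}x)")
```

R3 + C2 works out to about 3.8 times the baseline, which is the "nearly 4x" figure.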
Outcomes:
- Doing an optimal build from ORM to Pydantic doesn’t help much by itself, but with an optimal response handler, it can fly!
- We should really consider what https://github.com/tiangolo/fastapi/issues/1224#issuecomment-617243856 proposed when using orjson, as it gives a big improvement by itself!
- Validation in the FastAPI response handler is a lot heavier than expected
Questions: How advisable/possible is it to have a way to disable validation in the FastAPI response handler? Is it dangerous to do so? Should it be conditionally bypassed? e.g. we specify a way to mark it as safe?
I’m just trying to get rid of some bottlenecks, and to understand the system.
Issue Analytics
- Created 3 years ago
- Reactions:22
- Comments:31 (6 by maintainers)
Top GitHub Comments
Based on discussions here and other issues I’ve decided to skip FastAPI’s response handler along with its validation and serialization.
jsonable_encoder was created before Pydantic's own .json() utility was released (source: https://github.com/tiangolo/fastapi/issues/1107#issuecomment-612963659), and jsonable_encoder seems to have worse performance. We would much rather use Pydantic's .json() utility, which also has support for custom JSON encoders and alternative JSON libraries such as ujson or orjson.
So we use a custom Pydantic BaseModel to leverage orjson and custom encoders:
And we use a custom FastAPI response class to give us the flexibility of returning either the Pydantic model itself or an already serialized model (if we want more control over alias, include, exclude etc.). Returning a response class also makes the data flow more explicit and easier to understand for outside eyes/new developers.
Just wanted to share my approach, thanks for reading.
Full example:
I do find output validation to be useful in production to ensure that the API format contracts are always respected (which can be difficult to prove otherwise, your data source may end up throwing a null value in a place you didn’t expect), so I don’t know about disabling it automatically, but I would agree that being able to disable it for routes that return very large payloads would be useful performance-wise.
As for skipping jsonable_encoder, I’m 100% with you that it should be skippable if the framework user knows the json response renderer can handle any configuration of datatypes returned by the given route.