[QUESTION] Merging large JSON fields in response schemas

Description

I am dealing with JSON data on the order of 10 MB (up to hundreds of MB), which I fetch directly from a Postgres instance where it is stored as JSONB. To fetch this much data without parsing it into a dictionary, I cast the column to a string:

items = db_session.query(models.Table.id, cast(models.Table.data, String)).filter_by(id=id).all()

Since I know this data was properly validated when it was inserted, I build the models with pydantic's construct() factory, which skips validation:

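from typing import Union
from pydantic import BaseModel

class Item(BaseModel):
    id: int
    data: Union[A, B, C, D]  # A-D are the payload schemas (not shown in the issue)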

built_items = [schemas.Item.construct(id=x[0], data=x[1]) for x in items]

Then, on the endpoint I directly return a response using:

starlette.responses.JSONResponse(content=jsonable_encoder(built_items))

I still declare the response_model as List[Item], because I need the documentation for this endpoint.

With this strategy I get really good response times, but the original JSON data is now encoded as a string rather than as an object in the response.

So the clients of the API have to decode the response several times: 1) once for the response body itself, and 2) once more for each JSON object retrieved.
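
The double decoding can be reproduced with the standard library alone. The following is a minimal sketch, not the code from the issue: the row values are made up for illustration, and json.dumps stands in for whatever encoder the endpoint uses.

```python
import json

# Hypothetical row as returned by the query: the JSONB column arrives as text.
row_id, row_data = 1, '{"kind": "A", "values": [1, 2, 3]}'

# Encoding the already-encoded text again embeds it as a JSON *string*.
body = json.dumps([{"id": row_id, "data": row_data}])

# Client side: the first decode handles the response body...
items = json.loads(body)
first = items[0]["data"]  # still a str, not a dict

# ...and a second decode is needed for every item.
second = json.loads(first)
```

Here `first` is still text, and only `second` is the nested object the client actually wants.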

Is there any good practice on how to tackle this problem?

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 14 (9 by maintainers)

Top GitHub Comments

sm-Fifteen commented, Dec 12, 2019 (2 reactions)

When fetching data that’s been pre-encoded in JSON, I generally create a custom response class to skip JSON encoding and validation completely. FastAPI will acknowledge response_model in your route for any subclass of JSONResponse, so all you need to do is this:

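from fastapi import FastAPI
from pydantic import BaseModel
from starlette.responses import JSONResponse

app = FastAPI()

class MyModel(BaseModel):
    raw: str

class RawJSONResponse(JSONResponse):
    def render(self, content) -> bytes:
        # Content is already serialized JSON, so skip json.dumps entirely.
        return content if isinstance(content, bytes) else content.encode("utf-8")

@app.get("/foo", response_model=MyModel, response_class=RawJSONResponse)
def foo():
    # Returning the response object directly bypasses response_model validation.
    return RawJSONResponse(content="""{"raw": "json"}""")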

The problem I’m seeing here is that you don’t get “raw” JSON from the DB; you get individual JSON elements that still need to be assembled together. What you could always do is use Postgres’ json_agg() function to aggregate a column of JSON elements into a JSON array. Since id and data are also stored separately, you’ll probably also need something like json_build_object('id', id, 'data', data), and then wrap that in json_agg().

SELECT json_agg(json_build_object('id', id, 'data', data)) FROM mytable WHERE id = 5;

I don’t know how hard that would be to achieve in ORM mode, though.
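
If aggregating in the database is awkward to express through the ORM, one alternative is to splice the pre-encoded rows together in Python without ever parsing them. This is a swapped-in technique, not something proposed in the thread. The sketch below uses a hypothetical helper, merge_encoded_rows, and assumes each data value is already valid JSON text (as it would be when cast from a JSONB column).

```python
import json
from typing import Iterable, Tuple


def merge_encoded_rows(rows: Iterable[Tuple[int, str]]) -> bytes:
    """Build a JSON array of {"id": ..., "data": ...} objects from rows whose
    data column is already JSON text, without parsing that text."""
    parts = [
        # json.dumps encodes the small id field; the payload is spliced in verbatim.
        '{"id": %s, "data": %s}' % (json.dumps(row_id), data)
        for row_id, data in rows
    ]
    return ("[" + ", ".join(parts) + "]").encode("utf-8")


# Made-up rows for illustration.
payload = merge_encoded_rows([(1, '{"a": 1}'), (2, "[true, null]")])
decoded = json.loads(payload)
```

The resulting bytes can be passed as the content of a response class whose render() returns its input unchanged, so the large payloads are never decoded on the server. The trade-off is that nothing checks the spliced text, so a single malformed row makes the whole response invalid.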

littlebrat commented, Feb 22, 2020 (1 reaction)

Sorry for the delay, yes it did!
