Performance of serializing nested collections is poor

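I wrote a quick test using nose's timed decorator.

import random
import string
import unittest

from nose.tools import timed

# User, Blog, UserSerializer and BlogSerializer are assumed to be importable
# from the project's test fixtures; they are not defined in this snippet.


class TestSerializerTime(unittest.TestCase):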

    def setUp(self):
        self.users = []
        self.blogs = []
        letters = list(string.ascii_letters)

        for i in range(500):
            self.users.append(User(''.join(random.sample(letters, 15)),
                email='jiod@fjios.com', age=random.randint(10, 50)))

        for i in range(500):
            self.blogs.append(Blog(''.join(random.sample(letters, 50)),
                user=random.choice(self.users)))

    @timed(.2)
    def test_small_blog_set(self):
        res = BlogSerializer(self.blogs[:20], many=True)

    @timed(.4)
    def test_medium_blog_set(self):
        res = BlogSerializer(self.blogs[:250], many=True)

    @timed(1)
    def test_large_blog_set(self):
        res = BlogSerializer(self.blogs, many=True)

    @timed(.1)
    def test_small_user_set(self):
        res = UserSerializer(self.users[:20], many=True)

    @timed(.2)
    def test_medium_user_set(self):
        res = UserSerializer(self.users[:250], many=True)

    @timed(.5)
    def test_large_user_set(self):
        res = UserSerializer(self.users, many=True)

The user tests all pass, but the medium and large blog tests do not. These limits are machine-dependent, so the blog tests could pass on faster hardware, but serializing blogs is still rather slow.
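nose is no longer maintained, so the test above may not run on a current Python setup. A minimal sketch of a stand-in for its timed decorator, assuming the same semantics (the function runs to completion, then the test fails if it took longer than the limit in seconds). The names below are illustrative and not part of any library.

```python
import functools
import time


class TimeExpired(AssertionError):
    """Raised when a @timed function runs longer than its limit."""


def timed(limit):
    """Fail if the wrapped callable takes longer than `limit` seconds."""
    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            result = func(*args, **kwargs)  # run to completion first
            elapsed = time.perf_counter() - start
            if elapsed > limit:
                raise TimeExpired(f"Time limit ({limit}) exceeded: {elapsed:.3f}s")
            return result
        return wrapper
    return decorate


@timed(0.5)
def fast():
    return sum(range(1000))


@timed(0.01)
def slow():
    time.sleep(0.05)  # deliberately exceeds the 0.01 s limit
```

Because TimeExpired subclasses AssertionError, unittest reports an over-limit test as a failure rather than an error.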

I profiled it further with profile: serializing the whole blog collection (500 blogs) took between 5 and 6 seconds.

The bottleneck appears to be the deepcopy operation in serializer.py, and it doesn't seem like that call can be removed or replaced with a pickle/unpickle round trip.
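A profile like the one described can be reproduced with the standard cProfile and pstats modules. This is a minimal sketch on a hypothetical ToySerializer (not the library's real class) that, as an assumption modelled on the report, deep-copies its declared fields every time it is instantiated; in the sorted output, copy.deepcopy should show up near the top by cumulative time.

```python
import copy
import cProfile
import pstats


class Field:
    """Hypothetical field object with some nested, mutable state."""
    def __init__(self, name):
        self.name = name
        self.options = {"required": False, "default": None, "validators": []}


DECLARED_FIELDS = {f"f{i}": Field(f"f{i}") for i in range(20)}


class ToySerializer:
    """Hypothetical serializer: deep-copies its declared fields per instance."""
    declared_fields = DECLARED_FIELDS

    def __init__(self, obj):
        self.fields = copy.deepcopy(self.declared_fields)  # the suspected hot spot
        self.data = {name: getattr(obj, name, None) for name in self.fields}


class Item:
    pass


def serialize_many(n):
    # One serializer per object, so the deepcopy runs n times.
    return [ToySerializer(Item()).data for _ in range(n)]


profiler = cProfile.Profile()
profiler.enable()
serialize_many(500)
profiler.disable()

stats = pstats.Stats(profiler).sort_stats("cumulative")
stats.print_stats(10)  # top 10 entries by cumulative time
```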

I’m going to keep digging to see what I can do. If you have any insight, I’d appreciate the help. Thanks!

Issue Analytics

  • State: closed
  • Created 10 years ago
  • Comments: 7 (4 by maintainers)

Top GitHub Comments

sloria commented, May 30, 2018 (3 reactions)

@mgd722 I haven't compared the two usages in a while, but they should be similar. If you're serializing ORM objects, I'd first look into your relationship loading technique and make sure you're not running into the N+1 query problem.
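The N+1 problem means one query fetches the parent rows and then one more query runs per row to load a related object, which lazy relationship loading can trigger when a serializer walks each blog's user. A minimal sketch using only the standard sqlite3 module; the tables, the CountingConnection wrapper and the helper functions are hypothetical and not part of any ORM. ORMs expose the same choice through eager-loading options, for example SQLAlchemy's joinedload or selectinload.

```python
import sqlite3


class CountingConnection:
    """Wraps a sqlite3 connection and counts the queries issued through it."""
    def __init__(self, conn):
        self.conn = conn
        self.queries = 0

    def fetchall(self, sql, params=()):
        self.queries += 1
        return self.conn.execute(sql, params).fetchall()


def make_db(n_users=50, n_blogs=500):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE blogs (id INTEGER PRIMARY KEY, title TEXT, user_id INTEGER)")
    conn.executemany("INSERT INTO users VALUES (?, ?)",
                     [(i, f"user{i}") for i in range(n_users)])
    conn.executemany("INSERT INTO blogs VALUES (?, ?, ?)",
                     [(i, f"blog{i}", i % n_users) for i in range(n_blogs)])
    return CountingConnection(conn)  # setup statements are not counted


def serialize_lazy(db):
    # N+1: one query for the blogs, then one query per blog for its author.
    out = []
    for title, user_id in db.fetchall("SELECT title, user_id FROM blogs ORDER BY id"):
        (name,) = db.fetchall("SELECT name FROM users WHERE id = ?", (user_id,))[0]
        out.append({"title": title, "user": {"name": name}})
    return out


def serialize_eager(db):
    # A single JOIN loads every blog together with its author.
    rows = db.fetchall(
        "SELECT b.title, u.name FROM blogs b JOIN users u ON u.id = b.user_id ORDER BY b.id"
    )
    return [{"title": title, "user": {"name": name}} for title, name in rows]
```

With 500 blogs, serialize_lazy issues 501 queries while serialize_eager issues one, and both produce the same output.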

sloria commented, Dec 29, 2013 (1 reaction)

The deepcopy operation is expensive, but necessary, so that serializers can store errors from nested serializers.

I did a little work with cProfile and your code above (the gist of the script is here) and found 2 significant speedups:

  • Passing an instance, not a class, into a Nested field.

Example:

collaborators = fields.Nested(UserSerializer(), many=True)

instead of

collaborators = fields.Nested(UserSerializer, many=True)

This avoids repeating the initialization code (including the deepcopy) for each collaborator. In the future, it’ll be better to cache the nested serializer object, or disallow passing classes altogether.

  • Overriding the __deepcopy__ method of field objects so they are only shallow-copied (9c0f062). Even though the declared_fields dictionary must be deep-copied, the field objects themselves don't need to be.
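The effect of passing a class versus an instance to a Nested field can be sketched with a counter. ToyUserSerializer and ToyNested below are hypothetical stand-ins, not the library's real classes; the cache=True option is an assumed illustration of the "cache the nested serializer object" idea mentioned in the comment.

```python
class ToyUserSerializer:
    """Hypothetical nested serializer that counts how often it is constructed."""
    instances_created = 0

    def __init__(self):
        # Stand-in for the costly per-instance setup (e.g. the deepcopy of
        # declared fields) described in the comment.
        ToyUserSerializer.instances_created += 1

    def dump(self, obj):
        return {"name": obj["name"]}


class ToyNested:
    """Hypothetical Nested field accepting a serializer class or instance."""

    def __init__(self, nested, cache=False):
        self.nested = nested
        self.cache = cache
        self._cached = None

    def _serializer(self):
        if not isinstance(self.nested, type):
            return self.nested                # an instance was passed: reuse it
        if self.cache:
            if self._cached is None:
                self._cached = self.nested()  # build once, then reuse
            return self._cached
        return self.nested()                  # a class was passed: new object per item

    def serialize_many(self, items):
        return [self._serializer().dump(item) for item in items]


def count_inits(make_field, items):
    """Reset the counter, serialize `items`, return how many serializers were built."""
    ToyUserSerializer.instances_created = 0
    make_field().serialize_many(items)
    return ToyUserSerializer.instances_created


collaborators = [{"name": f"user{i}"} for i in range(100)]
print(count_inits(lambda: ToyNested(ToyUserSerializer), collaborators))              # → 100
print(count_inits(lambda: ToyNested(ToyUserSerializer()), collaborators))            # → 1
print(count_inits(lambda: ToyNested(ToyUserSerializer, cache=True), collaborators))  # → 1
```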

These two modifications decreased the execution time of the above script by almost half.
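The shallow-copy idea from the second speedup can be sketched with Python's copy protocol: a field class that defines __deepcopy__ to return copy.copy(self), so deep-copying a fields dictionary still yields a new dictionary with new field objects, but their attributes are shared rather than recursively copied. DeepField and ShallowField are hypothetical names; this mirrors the idea described for 9c0f062, not that commit's actual code.

```python
import copy
import timeit


class DeepField:
    """Hypothetical field with default copy behaviour (fully deep-copied)."""
    def __init__(self, name):
        self.name = name
        self.options = {"required": False, "validators": [], "metadata": {}}


class ShallowField(DeepField):
    """Same field, but copy.deepcopy() falls back to a shallow copy of it."""
    def __deepcopy__(self, memo):
        # New field object whose attributes still reference the original
        # values; the recursive copy of self.options is skipped.
        return copy.copy(self)


deep_fields = {f"f{i}": DeepField(f"f{i}") for i in range(20)}
shallow_fields = {f"f{i}": ShallowField(f"f{i}") for i in range(20)}

# The dictionary is still deep-copied; only the field objects are copied shallowly.
copied = copy.deepcopy(shallow_fields)

t_deep = timeit.timeit(lambda: copy.deepcopy(deep_fields), number=200)
t_shallow = timeit.timeit(lambda: copy.deepcopy(shallow_fields), number=200)
print(f"deep fields: {t_deep:.3f}s, shallow fields: {t_shallow:.3f}s")
```

The trade-off is that any mutable state on a field is now shared between copies, which is only safe if per-instance state lives outside the field objects.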

Thanks for reporting this. I will continue to do more profiling and see where performance can be improved even further.
