Performance of serializing nested collections is poor

I worked up a quick test using the nose timed decorator.

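import random
import string
import unittest

from nose.tools import timed

# User, Blog, UserSerializer, and BlogSerializer are defined elsewhere
# (not shown in this issue).


class TestSerializerTime(unittest.TestCase):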

    def setUp(self):
        self.users = []
        self.blogs = []
        letters = list(string.ascii_letters)

        for i in range(500):
            self.users.append(User(''.join(random.sample(letters, 15)),
                email='jiod@fjios.com', age=random.randint(10, 50)))

        for i in range(500):
            self.blogs.append(Blog(''.join(random.sample(letters, 50)),
                user=random.choice(self.users)))

    @timed(.2)
    def test_small_blog_set(self):
        res = BlogSerializer(self.blogs[:20], many=True)

    @timed(.4)
    def test_medium_blog_set(self):
        res = BlogSerializer(self.blogs[:250], many=True)

    @timed(1)
    def test_large_blog_set(self):
        res = BlogSerializer(self.blogs, many=True)

    @timed(.1)
    def test_small_user_set(self):
        res = UserSerializer(self.users[:20], many=True)

    @timed(.2)
    def test_medium_user_set(self):
        res = UserSerializer(self.users[:250], many=True)

    @timed(.5)
    def test_large_user_set(self):
        res = UserSerializer(self.users, many=True)

The user tests all pass, but the medium and large blog tests do not. They might pass on a faster machine, but serialization is still rather slow.

I did a little more testing with the profile module. Serializing the whole blog collection took between 5 and 6 seconds.

It looks like the bottleneck is the deepcopy operation in serializer.py, and it doesn’t seem like the call can be removed or replaced with a pickle/unpickle operation.
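For anyone who wants to reproduce this kind of measurement, here is a minimal sketch using cProfile from the standard library. The serialize_all function is a hypothetical stand-in that just deep-copies each item; it is not the serializer code from this issue, so swap in the real call to profile it.

```python
import copy
import cProfile
import io
import pstats


def serialize_all(items):
    # Hypothetical stand-in: deep-copies every item, the way a serializer
    # that copies its state per object might.
    return [copy.deepcopy(item) for item in items]


def profile_call(func, *args):
    """Run func under cProfile; return (result, report sorted by cumulative time)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(10)  # top 10 entries only
    return result, stream.getvalue()


data = [{"title": "post %d" % i, "user": {"name": "u%d" % i}} for i in range(500)]
result, report = profile_call(serialize_all, data)
print(report)
```

Sorting by cumulative time puts the functions that dominate the run (here, the copy module's deepcopy helpers) near the top of the report.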

I’m going to keep digging to see what I can do. If you have any insight, I’d appreciate the help. Thanks!

Issue Analytics

Created 10 years ago
Comments: 7 (4 by maintainers)

Top GitHub Comments

sloria commented, May 30, 2018

@mgd722 I haven’t compared the two usages in a while, but they should be similar. If you’re serializing ORM objects, I’d first look into your relationship loading technique and make sure you’re not running into the n+1 problem.
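To illustrate the N+1 problem mentioned above, here is a rough sketch using only sqlite3 from the standard library. The users and blogs tables and the run helper are made up for this example and do not come from the issue. With an ORM, the usual fix is an eager-loading option rather than a hand-written JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE blogs (id INTEGER PRIMARY KEY, title TEXT, user_id INTEGER);
    """
)
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [(i, "user%d" % i) for i in range(1, 6)],
)
conn.executemany(
    "INSERT INTO blogs VALUES (?, ?, ?)",
    [(i, "blog%d" % i, i % 5 + 1) for i in range(1, 21)],
)
conn.commit()

counter = {"queries": 0}


def run(sql, params=()):
    """Execute one statement, count it, and return all rows."""
    counter["queries"] += 1
    return conn.execute(sql, params).fetchall()


def load_lazy():
    # One query for the blogs, then one more per blog for its user: N+1.
    blogs = run("SELECT title, user_id FROM blogs ORDER BY id")
    return [
        (title, run("SELECT name FROM users WHERE id = ?", (user_id,))[0][0])
        for title, user_id in blogs
    ]


def load_joined():
    # A single JOIN fetches every blog together with its user.
    return run(
        "SELECT b.title, u.name FROM blogs b "
        "JOIN users u ON u.id = b.user_id ORDER BY b.id"
    )


lazy_rows = load_lazy()
lazy_queries = counter["queries"]

counter["queries"] = 0
joined_rows = load_joined()
joined_queries = counter["queries"]

print(lazy_queries, joined_queries)
```

With 20 blogs, the lazy version issues 21 queries and the joined version issues 1, and both return the same rows. If the serializer triggers the lazy pattern, the time goes into the database round trips rather than into serialization.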

sloria commented, Dec 29, 2013

The deepcopy operation is expensive, but necessary, so that serializers can store errors from nested serializers.

I did a little work with cProfile and your code above (the gist of the script is here) and found 2 significant speedups:

1. Passing an instance, not a class, into a Nested field.

Example:

collaborators = fields.Nested(UserSerializer(), many=True)

instead of

collaborators = fields.Nested(UserSerializer, many=True)

This avoids repeating the initialization code (including the deepcopy) for each collaborator. In the future, it’ll be better to cache the nested serializer object, or disallow passing classes altogether.

2. Overriding the __deepcopy__ method of field objects so that they are only shallow-copied (9c0f062). Even though the declared_fields dictionary must be deep-copied, the field objects themselves don’t need to be.

These two modifications decreased the execution time of the above script by almost half.
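Both ideas can be sketched with plain Python stand-ins. The Field, Nested, and UserSerializer classes below are simplified, hypothetical versions written for illustration only; they are not the library's actual implementation, and the init_count attribute exists only to make the effect visible.

```python
import copy


class Field:
    """Stand-in for a serializer field (not the library's class)."""

    def __init__(self, name, options=None):
        self.name = name
        self.options = options or {}

    def __deepcopy__(self, memo):
        # Shallow-copy the field: the copy shares `options` with the original.
        return copy.copy(self)


class Nested(Field):
    """Stand-in nested field that accepts a serializer class or instance."""

    def __init__(self, name, nested):
        super().__init__(name)
        self.nested = nested

    def serializer(self):
        # A class is instantiated on every call; an instance is reused as-is.
        if isinstance(self.nested, type):
            return self.nested()
        return self.nested


class UserSerializer:
    init_count = 0
    declared_fields = {"name": Field("name"), "email": Field("email")}

    def __init__(self):
        type(self).init_count += 1
        # The dict is deep-copied, but each Field is only shallow-copied.
        self.fields = copy.deepcopy(self.declared_fields)


by_class = Nested("collaborators", UserSerializer)
for _ in range(100):
    by_class.serializer()
class_inits = UserSerializer.init_count

UserSerializer.init_count = 0
by_instance = Nested("collaborators", UserSerializer())
for _ in range(100):
    by_instance.serializer()
instance_inits = UserSerializer.init_count

print(class_inits, instance_inits)
```

In this sketch, passing the class runs the initializer (and its deepcopy) once per nested object, while passing an instance runs it once in total.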

Thanks for reporting this. I will continue to do more profiling and see where performance can be improved even further.
