Performance of serializing nested collections is poor
I worked up a quick test using the nose timed decorator.
import random
import string
import unittest

from nose.tools import timed

# User, Blog, UserSerializer and BlogSerializer come from the code under test.


class TestSerializerTime(unittest.TestCase):

    def setUp(self):
        # Build 500 users and 500 blogs, each blog owned by a random user.
        self.users = []
        self.blogs = []
        letters = list(string.ascii_letters)
        for _ in range(500):
            self.users.append(User(''.join(random.sample(letters, 15)),
                                   email='jiod@fjios.com', age=random.randint(10, 50)))
        for _ in range(500):
            self.blogs.append(Blog(''.join(random.sample(letters, 50)),
                                   user=random.choice(self.users)))

    # Each test fails if it runs longer than the limit (in seconds) given to @timed.
    @timed(.2)
    def test_small_blog_set(self):
        res = BlogSerializer(self.blogs[:20], many=True)

    @timed(.4)
    def test_medium_blog_set(self):
        res = BlogSerializer(self.blogs[:250], many=True)

    @timed(1)
    def test_large_blog_set(self):
        res = BlogSerializer(self.blogs, many=True)

    @timed(.1)
    def test_small_user_set(self):
        res = UserSerializer(self.users[:20], many=True)

    @timed(.2)
    def test_medium_user_set(self):
        res = UserSerializer(self.users[:250], many=True)

    @timed(.5)
    def test_large_user_set(self):
        res = UserSerializer(self.users, many=True)
The user tests all pass, but the medium and large blog tests do not. Obviously, these could pass on some machines, but it’s still rather slow.
I did a little bit more testing with profile. Serializing the whole blog collection was running between 5 and 6s.
It looks like the bottleneck is the deepcopy operation in serializer.py and it doesn’t seem like the call can be removed, or changed to a pickle/unpickle operation.
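For anyone who wants to reproduce this kind of measurement, here is a minimal sketch of a profiling run using the standard library's cProfile and pstats. The serialize_all function and its sample data are hypothetical stand-ins for the BlogSerializer(self.blogs, many=True) call above (a deepcopy per item is assumed only to imitate the suspected hot spot); the profile_call helper is likewise an illustration, not part of any library.

```python
import copy
import cProfile
import io
import pstats


def serialize_all(items):
    # Hypothetical stand-in for the serializer call: one deepcopy per item
    # imitates the suspected bottleneck.
    return [copy.deepcopy(item) for item in items]


def profile_call(func, *args):
    """Run func(*args) under cProfile and return (result, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Sort by cumulative time so expensive call trees show up first.
    stats.sort_stats("cumulative").print_stats(10)
    return result, stream.getvalue()


if __name__ == "__main__":
    data = [{"title": "post %d" % i, "user": {"name": "u%d" % i}}
            for i in range(500)]
    _, report = profile_call(serialize_all, data)
    print(report)
```

Sorting by cumulative time tends to surface a call like deepcopy near the top when it dominates the run, which is how a bottleneck of this kind would normally be spotted.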
I’m going to keep digging to see what I can do. If you have any insight, I’d appreciate the help. Thanks!
Issue Analytics
- State:
- Created 10 years ago
- Comments:7 (4 by maintainers)
Top GitHub Comments
@mgd722 I haven’t compared the two usages in a while, but they should be similar. If you’re serializing ORM objects, I’d first look into your relationship loading technique and make sure you’re not running into the n+1 problem.
The deepcopy operation is expensive, but necessary, so that serializers can store errors from nested serializers.
I did a little work with cProfile and your code above (the gist of the script is here) and found 2 significant speedups:
1. Passing an instance of the nested serializer to the nested field instead of the serializer class. This avoids repeating the initialization code (including the deepcopy) for each collaborator. In the future, it’ll be better to cache the nested serializer object, or disallow passing classes altogether.
2. Overriding the __deepcopy__ method of field objects so they are only shallow copied (9c0f062). Even though the declared_fields dictionary must be deep-copied, field objects themselves don’t need to be deep-copied.
These two modifications decreased the execution time of the above script by almost half.
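The shallow-copy idea for field objects can be sketched roughly as follows. This is not the code from commit 9c0f062: the Field class here is a hypothetical stand-in, and the sketch only shows how a __deepcopy__ override makes copy.deepcopy of a fields dictionary produce new field objects without recursing into their attributes.

```python
import copy


class Field:
    """Hypothetical stand-in for a serializer field object."""

    def __init__(self, default=None, metadata=None):
        self.default = default
        self.metadata = metadata if metadata is not None else {}

    def __deepcopy__(self, memo):
        # A "deep" copy of a field is deliberately only shallow: a new
        # Field object is created, but its attribute values are shared.
        return copy.copy(self)


# Assumed example data: a dictionary of declared fields, as on a serializer.
declared_fields = {
    "name": Field(metadata={"choices": list(range(1000))}),
    "email": Field(default="n/a"),
}

# The dictionary itself is still deep-copied, so each copy gets its own
# dict and its own Field objects, but large attributes are not duplicated.
fields_copy = copy.deepcopy(declared_fields)
```

Each copy can still rebind attributes on its own field objects without affecting the originals, while the potentially large values those attributes point to are shared rather than duplicated.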
Thanks for reporting this. I will continue to do more profiling and see where performance can be improved even further.