Performance regression in JSONSerializer.default() in v7.15.0
Elasticsearch version (bin/elasticsearch --version): 7.13.1
elasticsearch-py version (elasticsearch.__versionstr__): 7.15.0
Description of the problem including expected versus actual behavior: We’re a bit puzzled by this, but we’ve narrowed it down to our upgrade of the Elasticsearch package from 7.14.0 to 7.15.0.
Upon upgrading, the latency of all of our Elasticsearch calls rose significantly, to the point where it cascaded across all of our systems. It took us a few days to figure out what was going on, but in the end we simply downgraded the package to 7.14.0, and as seen from the graph here (taken from Elastic APM), it’s quite apparent that it had an effect:
Steps to reproduce: Honestly I’m not sure how to describe a reproducible flow. I’ll be very happy to help debugging this against our production environment in any way possible, if someone has ideas on what to look for.
To others who might be experiencing the same issue: We’ve pinned our projects to 7.14.0 for now, as this effectively solves the issue for us.
Issue Analytics
- Created 2 years ago
- Comments:5 (3 by maintainers)
Top GitHub Comments
I think I’ve figured out the issue, it’s caused by https://github.com/elastic/elasticsearch-py/pull/1716 which moved the attempt to serialize Pandas and Numpy types into JSONSerializer.default() but since it looks like you’re relying on .default() for all keys that’s where the problem lies. Basically it’s attempting to import numpy and pandas and failing per key which is quite a bit of overhead.

Based on this and a guess for what your keys look like (assuming either Promises or str) I wonder if changing your .default() implementation would fix the issue for you immediately:

Either way I’ll fix this issue and it’ll go out in a patch release of 7.15.
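For anyone who wants to see the effect without production traffic, here is a rough, stdlib-only sketch of the overhead described above. It is not the client's actual code: LazyText is a hypothetical stand-in for a Promise-like value, numpy_not_installed_example is a made-up module name assumed to be absent, and plain functions passed to json.dumps stand in for a JSONSerializer.default() override.

```python
import json
import timeit


class LazyText:
    """Hypothetical stand-in for a lazily evaluated string (e.g. a Promise)."""

    def __init__(self, text):
        self.text = text

    def __str__(self):
        return self.text


def default_import_each_call(obj):
    # Mimics the reported overhead: an optional import is attempted on every
    # call and fails every time. The module name is made up and assumed absent.
    try:
        import numpy_not_installed_example  # noqa: F401
    except ImportError:
        pass
    if isinstance(obj, LazyText):
        return str(obj)
    raise TypeError(f"Unable to serialize {obj!r}")


def default_known_types_first(obj):
    # Handle the types we expect before anything else, so the failing
    # import is never reached for them.
    if isinstance(obj, LazyText):
        return str(obj)
    try:
        import numpy_not_installed_example  # noqa: F401
    except ImportError:
        pass
    raise TypeError(f"Unable to serialize {obj!r}")


# json.dumps calls `default` once per value it cannot serialize natively.
doc = {"values": [LazyText(f"v{i}") for i in range(1000)]}

slow = timeit.timeit(
    lambda: json.dumps(doc, default=default_import_each_call), number=5
)
fast = timeit.timeit(
    lambda: json.dumps(doc, default=default_known_types_first), number=5
)
print(f"import attempted per call: {slow:.4f}s")
print(f"known types handled first: {fast:.4f}s")
```

Both variants produce the same JSON; only the time spent per default() call differs. A failed import is not cached, so every attempt searches the import path again, which is why the cost scales with the number of values routed through default().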
@sethmlarson Thanks for getting back to me so quick! Sorry for the late reply.
I’m not entirely sure how the HTTP Client logic works; we don’t explicitly define one ourselves, so I assume it selects one based on installed packages? I’ve attached a pip freeze - the project doesn’t use async:

There are no deprecation warnings as far as I can see, although we do get this warning:
ElasticsearchWarning: The client is unable to verify that the server is Elasticsearch due security privileges on the server side
We get that on 7.14.0 as well, though.
Regarding the APIs, it’s a bunch of somewhat complex search() calls. They contain a bunch of aggregations.

One thing I’d like to mention is that we use a custom JSONSerializer to account for some weirdness on our end - I’ve included it here just in case:
We use it like so:
We can do a bisect of the commits relating to 7.15.0 if need be, but I hope you’re able to see something I can’t instead. We’d have to put the code into our production system to have enough traffic to see the results, and it’s only fully visible in the APM after a few hours - so it’d take a while to find the offending commit. Plus the impact on our production systems, of course.
What’s really weird is that we use Elasticsearch across multiple projects, but this particular project is the only one we’ve seen the issue with. It is the only one of the projects with this massive amount of traffic though, which might be the reason.
We’re very puzzled.