
Performance regression in JSONSerializer.default() in v7.15.0


Elasticsearch version (bin/elasticsearch --version): 7.13.1

elasticsearch-py version (elasticsearch.__versionstr__): 7.15.0

Description of the problem including expected versus actual behavior: We’re a bit puzzled by this, but we’ve narrowed it down to our upgrade of the elasticsearch package from 7.14.0 to 7.15.0.

Upon upgrading, all of our Elasticsearch calls rose significantly in latency, to the point where it cascaded across all of our systems. It took us a few days to figure out what was going on, but in the end we simply downgraded the package to 7.14.0, and as seen from the graph here (taken from Elastic APM), it’s quite apparent that it had an effect:

[Screenshot from Elastic APM, 2021-10-07 14:11, showing the latency drop after downgrading to 7.14.0]

Steps to reproduce: Honestly, I’m not sure how to describe a reproducible flow. I’ll be very happy to help debug this against our production environment in any way possible if someone has ideas on what to look for.

To others who might be experiencing the same issue: we’ve pinned our projects to 7.14.0 for now, as this effectively solves the problem for us.
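For anyone applying the same workaround, the pin is just the usual requirements entry (in the same format as the pip freeze further down this thread):

elasticsearch==7.14.0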


Top GitHub Comments

sethmlarson commented, Oct 8, 2021 (1 reaction)

I think I’ve figured out the issue: it’s caused by https://github.com/elastic/elasticsearch-py/pull/1716, which moved the attempt to serialize Pandas and NumPy types into JSONSerializer.default(). Since it looks like you’re relying on .default() for all keys, that’s where the problem lies: it’s attempting to import numpy and pandas, and failing, once per key, which is quite a bit of overhead.
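As a minimal standalone sketch of why that is so costly (this is not the library’s actual code, just an illustration): Python does not cache failed imports, so every retry pays the full module-search cost, and numpy and pandas are indeed absent from the pip freeze later in this thread:

import timeit

def try_import_numpy():
    # Mirrors the per-key behavior described above: in an environment
    # without numpy installed, this import fails afresh on every call.
    try:
        import numpy  # noqa: F401
    except ImportError:
        pass

# Thousands of failed imports add measurable latency; multiply by the number
# of dict keys in a large search body to get a feel for the regression.
print(timeit.timeit(try_import_numpy, number=10_000))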

Based on this, and a guess at what your keys look like (assuming either Promise or str instances), I wonder if changing your .default() implementation as follows would fix the issue for you immediately:

    def default(self, data):
        # Short-circuit the common key types so the expensive fallback
        # (which retries the numpy/pandas imports in 7.15.0) is rarely hit.
        if isinstance(data, Promise):
            return force_str(data)
        if isinstance(data, str):
            # Plain str keys used to fall through and raise TypeError;
            # returning them directly avoids the per-key import overhead.
            return data
        return super().default(data)

Either way I’ll fix this issue and it’ll go out in a patch release of 7.15.

HenrikOssipoff commented, Oct 8, 2021 (1 reaction)

@sethmlarson Thanks for getting back to me so quickly! Sorry for the late reply.

I’m not entirely sure how the HTTP client logic works; we don’t explicitly define one ourselves, so I assume it selects one based on installed packages? I’ve attached a pip freeze; the project doesn’t use async:

amqp==5.0.6
anyio==3.3.2
argon2-cffi==21.1.0
asgiref==3.3.4
awesome-slugify==1.6.5
billiard==3.6.4.0
cachetools==4.2.2
celery==5.1.2
certifi==2020.12.5
cffi==1.14.6
chardet==4.0.0
charset-normalizer==2.0.6
click==7.1.2
click-didyoumean==0.0.3
click-plugins==1.1.1
click-repl==0.1.6
cool==3.1.24
coolshop-search-dsl==2.2.1
Django==3.2.8
django-healthz==0.0.5
django-redis==5.0.0
django-storages==1.11.1
djangorestframework==3.12.4
elastic-apm==6.5.0
elasticsearch==7.15.0
elasticsearch-dsl==7.4.0
google-api-core==2.1.0
google-auth==2.3.0
google-cloud-core==2.1.0
google-cloud-storage==1.42.3
google-crc32c==1.3.0
google-resumable-media==2.0.3
googleapis-common-protos==1.53.0
gunicorn==20.1.0
h11==0.12.0
h2==4.1.0
hpack==4.0.0
httpcore==0.13.7
httpx==0.19.0
hyperframe==6.0.1
idna==2.10
kombu==5.1.0
ldap3==2.5.2
limits==1.5.1
lxml==4.6.3
networkx==2.6.3
prompt-toolkit==3.0.18
protobuf==3.18.1
psycopg2==2.9.1
pyasn1==0.4.8
pyasn1-modules==0.2.8
pycparser==2.20
pylogbeat==2.0.0
pyodbc==4.0.32
pyparsing==2.4.7
python-dateutil==2.8.1
python-logstash-async==2.3.0
python-memcached==1.59
pytz==2021.3
redis==3.5.3
regex==2021.4.4
requests==2.25.1
rfc3986==1.5.0
rsa==4.7.2
sentry-sdk==1.4.3
six==1.15.0
sniffio==1.2.0
sqlparse==0.4.1
suds-jurko==0.6
Unidecode==0.4.21
urllib3==1.26.4
vine==5.0.0
wcwidth==0.2.5

There are no deprecation warnings as far as I can see, although we do get this warning: ElasticsearchWarning: The client is unable to verify that the server is Elasticsearch due security privileges on the server side

We get that on 7.14.0 as well, though.

Regarding the APIs, it’s a bunch of somewhat complex search() calls, each containing a number of aggregations.

One thing I’d like to mention is that we use a custom JSONSerializer to account for some weirdness on our end. I’ve included it here just in case:

from django.utils.encoding import force_str
from django.utils.functional import Promise
from elasticsearch.serializer import JSONSerializer


class CoolSearchJSONSerializer(JSONSerializer):
    def default(self, data):
        # Resolve Django lazy translation objects (Promises) to plain strings.
        if isinstance(data, Promise):
            return force_str(data)
        return super().default(data)

    def force_key_encoding(self, data):
        # json.dumps only applies default() to unserializable *values*, never
        # to dict keys, so coerce keys recursively before serializing.
        if isinstance(data, dict):

            def yield_key_value(d):
                for key, value in d.items():
                    try:
                        yield self.default(key), self.force_key_encoding(value)
                    except TypeError:
                        # The base default() raises TypeError for types it does
                        # not handle (e.g. plain str), so keep those keys as-is.
                        yield key, self.force_key_encoding(value)

            return dict(yield_key_value(data))
        else:
            return data

    def dumps(self, data):
        return super().dumps(self.force_key_encoding(data))
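For illustration, here is a hypothetical usage sketch of the class above (lazystr is Django’s helper for building a lazy str Promise without needing configured settings); note that every dict key is routed through .default(), which is exactly the code path that became slower in 7.15.0:

from django.utils.functional import lazystr  # builds a lazy str Promise

serializer = CoolSearchJSONSerializer()
body = {
    "size": 0,  # plain str key: base default() raises TypeError, key kept as-is
    lazystr("aggs"): {"terms": {"field": "tag"}},  # Promise key: forced to str
}
print(serializer.dumps(body))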

We use it like so:

import os

from elasticsearch_dsl import connections

connections.configure(
    default={
        "hosts": ES_HOSTS,  # defined elsewhere in our settings
        "serializer": CoolSearchJSONSerializer(),
        "retry_on_timeout": True,
        "max_retries": 5,
        "http_auth": (os.getenv("ELASTICSEARCH_USER"), os.getenv("ELASTICSEARCH_PASSWORD")),
    }
)

We can do a bisect of the commits relating to 7.15.0 if need be, but I hope you’re able to see something I can’t instead. We’d have to put the code into our production system to have enough traffic to see the results, and it’s only fully visible in the APM after a few hours, so it would take a while to find the offending commit. Plus the impact on our production systems, of course.

What’s really weird is that we use Elasticsearch across multiple projects, but this particular project is the only one we’ve seen the issue with. It is the only one of the projects with this massive amount of traffic, though, which might be the reason.

We’re very puzzled.
