Responses seem to be shared and jumbled between different requests
We’ve started to run into some oddities as we get higher and higher load on our application - we’re a bit unsure if it relates to Elasticsearch itself or elasticsearch-py, but we have a hunch it’s the Python client.
I’ll start by saying I unfortunately don’t have any unit tests that can reproduce the issue.
Bit of preface: We use the library in two rather large Django applications. We define the connection directly in our settings file (I don’t know if this is good practice or not, but it seemed fitting). The applications don’t run with any concurrency uWSGI-wise (they use separate processes); however, we do run a lot of Celery tasks on top.
However, we’ve started to see that the library simply shuffles responses from Elasticsearch around. One example is this:
product = es.get(index='pim', doc_type='products', id='AX62KY')
It can hardly go wrong, yet perhaps one call in 10,000 returns something else entirely. In one case, we received something that looks like a search response (Python dictionary):
{'_shards': {'failed': 0, 'successful': 1, 'total': 1}, 'hits': {'hits': [], 'max_score': None, 'total': 0}, 'timed_out': False, 'took': 44}
Clearly, not something you would ever expect from a GET.
Somewhat different, but related: we’ve also seen this with the following search query (JSON this time):
{"size": 15, "from": 0, "sort": [{"properties.sales_rank.da": "asc"}], "query": {"filtered": {"filter": {"and": {"filters": []}}, "query": {"bool": {"must": [{"multi_match": {"type": "phrase_prefix", "fields": ["document.title.en", "document.platform.en", "document.section.en", "document.type.en", "document.platform._meta.abbreviation.en", "_id", "document.sku.en"], "query": "144837"}}]}}}}}
This is sent as a search, but the response contains a scroll ID plus the results of that scan/scroll. The result is a serialization error.
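Until the root cause is found, the two mismatches described above could at least be detected before they reach application code. The following is a minimal sketch, not a fix: the helper names `is_get_response` and `is_plain_search_response` are hypothetical, and the checks only assume the usual response shapes (a GET response carries `_id`, a search response carries `hits`, and a scroll response additionally carries `_scroll_id`).

```python
def is_get_response(response, doc_id):
    """Return True if `response` looks like the GET result for `doc_id`."""
    return (
        isinstance(response, dict)
        and response.get("_id") == doc_id  # a GET echoes the requested id
        and "hits" not in response         # "hits" only appears in search results
    )


def is_plain_search_response(response):
    """Return True if `response` looks like a non-scroll search result."""
    return (
        isinstance(response, dict)
        and "hits" in response
        and "_scroll_id" not in response   # a scroll id means it answers another call
    )


# The dictionary from the report: a search result returned for a GET.
stray = {
    "_shards": {"failed": 0, "successful": 1, "total": 1},
    "hits": {"hits": [], "max_score": None, "total": 0},
    "timed_out": False,
    "took": 44,
}
print(is_get_response(stray, "AX62KY"))  # the stray search result is rejected
```

A caller could log and retry when a check fails, which would also give a count of how often the mix-up really happens.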
While I know this is very vague, I wanted to share the info in case there might actually be something in the library that’s not working as intended. Is it perhaps a result of us defining and opening the connection directly in our settings module? What is best practice on that?
Issue Analytics
- Created 9 years ago
- Comments: 6 (3 by maintainers)
Top GitHub Comments
@alexgarel I spoke with Honza at an event about a year or so ago.
We solved our issue by using the get_connection() feature of the elasticsearch_dsl library. Since switching all our connections to this, we have not had any issues.
We have a similar case here. I’ve seen that under the hood urllib3 is thread safe: it uses a connection pool from which a thread borrows a connection for each request and holds it until the response arrives. Our setup is multi-process Celery using Elasticsearch for quite long percolation requests, so fork is a good lead to follow. We will let you know if we find a way to reproduce it.
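If fork is indeed the culprit, one possible mitigation is to build the client lazily inside each process instead of at import time, so that a forked Celery worker never reuses sockets opened by its parent. A minimal sketch, assuming a hypothetical `PerProcessClient` helper (not part of elasticsearch-py) that wraps any zero-argument factory:

```python
import os


class PerProcessClient:
    """Lazily build one client per process and rebuild it after a fork."""

    def __init__(self, factory):
        self._factory = factory  # zero-argument callable returning a new client
        self._state = None       # (pid, client) for the process that built it

    def get(self):
        pid = os.getpid()
        state = self._state
        # A different pid means we are in a forked child: the cached client's
        # sockets belong to the parent, so create a fresh client here.
        if state is None or state[0] != pid:
            state = (pid, self._factory())
            self._state = state
        return state[1]


# Stand-in factory for illustration; in a real project it would construct
# the Elasticsearch client with the project's own connection settings.
created = []
holder = PerProcessClient(lambda: created.append(1) or object())
first = holder.get()
second = holder.get()  # same process, so the cached client is reused
```

The settings module would then hold only the `holder`, and callers would use `holder.get()` wherever they previously used the module-level client. Two threads racing on the first call may each build a client, which is wasteful but harmless; whether this removes the jumbled responses is unconfirmed.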