Bulk index connection timeout caused by ReadTimeoutError
See original GitHub issue
I’m trying to index using the bulk index API in Elasticsearch, but indexing seems to get slower and slower and eventually causes a connection timeout. Here is my Python code:
def bulk_index():
    chunk = []
    for count, doc in enumerate(doc_generator()):
        id = doc.get('_id')
        chunk.append({
            'index': {
                '_id': id
            }
        })
        chunk.append(doc)
        if (count + 1) % 10000 == 0:
            assert len(chunk) == 20000
            res = es.bulk(index='weibo', doc_type='user', body=chunk)
            assert res['errors'] is False
            print 'keke'
            chunk = []
    print 'docs total count: %s' % (count + 1)
Issue Analytics
- Created 8 years ago
- Comments: 9 (5 by maintainers)
Top Results From Across the Web
Bulk indexing raise read timeout error - Elasticsearch
You get read timeouts from the server because the client is misbehaving. Cluster power, chunk size, timeout length and API use are not ...
Read more >

Elasticsearch Bulk insert w/ Python - socket timeout error
The connection to elasticsearch has a configurable timeout, which by default is 10 seconds. So, if your elasticsearch server takes more than 10 ...
Read more >

Connection Timeout vs. Read Timeout for Java Sockets
From the client side, the “read timed out” error happens if the server is taking longer to respond and send information. This could...
Read more >

[Elasticsearch] ConnectionTimeout caused by - 솜씨좋은장씨
[Elasticsearch] ConnectionTimeout caused by - ReadTimeoutError (feat. bulk API + python). 솜씨좋은장씨 2020. 5. 8. 15:51.
Read more >

Timeouts during indexing: Too many commits! - SearchStax
Frequently, the root problem is too-frequent attempts to “commit” the new documents to the index. Solr supports rapid indexing by temporarily accumulating ...
Read more >
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Bulk indexing can be an expensive process, so it can indeed take a long time. If it takes more than 10 seconds the only solution is to raise the timeout parameter of the es client, either for the whole instance or by specifying request_timeout=30 as part of the es.bulk call. There is no way to speed it up from the client - either send smaller bulk requests or raise the timeout value.
Also note that you are replicating the chunking logic that we already have as part of the bulk helper: http://elasticsearch-py.readthedocs.org/en/latest/helpers.html#elasticsearch.helpers.bulk
Hope this helps…
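For illustration, a minimal sketch of both suggestions. The 'weibo'/'user' names, doc_generator() and the chunk list come from the question above; the 30-second timeout and the chunk_size value are only example numbers, and the doc_type argument matches the old client version used in the question.

from elasticsearch import Elasticsearch, helpers

# Raise the timeout for every request made by this client instance...
es = Elasticsearch(['http://localhost:9200'], timeout=30)

# ...or override it for a single call (chunk is the action/source list
# built in the question's loop).
res = es.bulk(index='weibo', doc_type='user', body=chunk, request_timeout=30)

# Alternatively, drop the hand-rolled chunking and let the bulk helper
# split the stream of actions into requests of chunk_size documents each.
def actions():
    for doc in doc_generator():
        yield {
            '_index': 'weibo',
            '_type': 'user',
            '_id': doc.get('_id'),
            '_source': doc,
        }

success, errors = helpers.bulk(es, actions(), chunk_size=5000, request_timeout=30)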
@vovavovavovavova as of right now we do not support tuple timeouts in elasticsearch-py. Though if that’s something you feel we should add you can create an issue and I’ll look into it.
Personally I’ve never used tuples as timeouts, but I do see that there’d be some value there on the requests side. urllib3 does not support this, and as I write this I’m not sure how much work it’d be to ensure that it works with requests and not with urllib3. But I am willing to investigate.
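For context on what a tuple timeout means in the requests library (a separate connect timeout and read timeout), here is a small sketch against a cluster health endpoint chosen purely for illustration; elasticsearch-py itself does not accept a tuple here.

import requests

# requests accepts timeout=(connect, read): the first value bounds the
# connection attempt, the second bounds each read from the socket.
resp = requests.get('http://localhost:9200/_cluster/health', timeout=(3.05, 27))
print(resp.json())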