Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging third-party libraries. It collects links to all the places you might be looking while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

Bulk index connection timeout caused by ReadTimeoutError

See original GitHub issue

I’m trying to index documents using the bulk API in Elasticsearch, but indexing seems to get slower and slower and eventually causes a connection timeout. Here is my Python code:

def bulk_index():
    # Assumes `es` is an Elasticsearch client instance and that
    # `doc_generator()` yields dicts, each containing an '_id' field.
    chunk = []
    for count, doc in enumerate(doc_generator()):
        doc_id = doc.get('_id')
        # The bulk body alternates action lines and document lines.
        chunk.append({
            'index': {
                '_id': doc_id
            }
        })
        chunk.append(doc)
        if (count + 1) % 10000 == 0:
            assert len(chunk) == 20000  # two entries per document
            res = es.bulk(index='weibo', doc_type='user', body=chunk)
            assert res['errors'] is False
            print 'keke'
            chunk = []
    # Flush the documents left over after the last full chunk.
    if chunk:
        res = es.bulk(index='weibo', doc_type='user', body=chunk)
        assert res['errors'] is False
    print 'docs total count: %s' % (count + 1)

Issue Analytics

  • State: closed
  • Created 8 years ago
  • Comments: 9 (5 by maintainers)

Top GitHub Comments

10 reactions
honzakral commented, May 8, 2015

Bulk indexing can be an expensive process, so it can indeed take a long time. If it takes more than 10 seconds, the only solution is to raise the timeout parameter of the es client, either for the whole instance or by passing request_timeout=30 as part of the es.bulk call. There is no way to speed it up from the client side: either send smaller bulk requests or raise the timeout value.
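
In elasticsearch-py of that era, the two options look roughly like this (a minimal sketch: the host, the 30-second value, and the reuse of `chunk` from the question are assumptions):

from elasticsearch import Elasticsearch

# Option 1: raise the timeout for every request made by this client.
es = Elasticsearch(['localhost:9200'], timeout=30)

# Option 2: raise it for a single bulk call only.
res = es.bulk(index='weibo', doc_type='user', body=chunk,
              request_timeout=30)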

Also note that you are replicating the chunking logic that we already have as part of the bulk helper: http://elasticsearch-py.readthedocs.org/en/latest/helpers.html#elasticsearch.helpers.bulk
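
A rough sketch of the loop from the question rewritten on top of that helper (doc_generator() and the index/type names carry over from the question; chunk_size and request_timeout are placeholder tunables):

from elasticsearch import Elasticsearch
from elasticsearch import helpers

es = Elasticsearch()

# The helper chunks the document stream itself and pulls '_id' out of
# each dict, so no manual action/source pairing is needed.
success, errors = helpers.bulk(
    es,
    doc_generator(),
    index='weibo',
    doc_type='user',
    chunk_size=1000,
    request_timeout=30,
)
print 'indexed %s docs' % success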

Hope this helps…

0 reactions
fxdgear commented, Jul 9, 2018

@vovavovavovavova as of right now we do not support tuple timeouts in elasticsearch-py. If that’s something you feel we should add, you can create an issue and I’ll look into it.

Personally I’ve never used tuples as timeouts, but I do see that there’d be some value there on the requests side. urllib3 does not support this, and as I write this I’m not sure how much work it would take to support it for the requests transport when urllib3 doesn’t offer it. But I am willing to investigate.
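
For context, this is what a tuple timeout looks like in requests itself (a plain requests call against a placeholder URL, not an existing elasticsearch-py option):

import requests

# (connect timeout, read timeout) in seconds: fail fast when the host
# is unreachable, but give a slow response body enough time to arrive.
resp = requests.get('http://localhost:9200/_cluster/health',
                    timeout=(3.05, 27))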


Top Results From Across the Web

Bulk indexing raise read timeout error - Elasticsearch
You get read timeouts from the server because the client is misbehaving. Cluster power, chunk size, timeout length and API use are not …

Elasticsearch Bulk insert w/ Python - socket timeout error
The connection to elasticsearch has a configurable timeout, which by default is 10 seconds. So, if your elasticsearch server takes more than 10 …

Connection Timeout vs. Read Timeout for Java Sockets
From the client side, the “read timed out” error happens if the server is taking longer to respond and send information. This could…

[Elasticsearch] ConnectionTimeout caused by - 솜씨좋은장씨
[Elasticsearch] ConnectionTimeout caused by ReadTimeoutError (feat. bulk API + python). 솜씨좋은장씨 2020. 5. 8. 15:51.

Timeouts during indexing: Too many commits! - SearchStax
Frequently, the root problem is too-frequent attempts to “commit” the new documents to the index. Solr supports rapid indexing by temporarily accumulating …
