Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging third-party libraries. It collects links to all the places you might be looking while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

Bulk index connection timeout caused by ReadTimeoutError

See original GitHub issue

I’m trying to index documents using the bulk API in Elasticsearch, but indexing seems to get slower and slower and eventually causes a connection timeout. Here is my Python code:

def bulk_index():
    # Assumes `es` is an Elasticsearch client instance and that
    # `doc_generator()` yields dicts, each containing an '_id' field.
    chunk = []
    for count, doc in enumerate(doc_generator()):
        doc_id = doc.get('_id')
        # The bulk body alternates action lines and document lines.
        chunk.append({
            'index': {
                '_id': doc_id
            }
        })
        chunk.append(doc)
        if (count + 1) % 10000 == 0:
            assert len(chunk) == 20000  # two entries per document
            res = es.bulk(index='weibo', doc_type='user', body=chunk)
            assert res['errors'] is False
            print 'keke'
            chunk = []
    # Flush the documents left over after the last full chunk.
    if chunk:
        res = es.bulk(index='weibo', doc_type='user', body=chunk)
        assert res['errors'] is False
    print 'docs total count: %s' % (count + 1)

Issue Analytics

  • State: closed
  • Created 8 years ago
  • Comments: 9 (5 by maintainers)

Top GitHub Comments

10 reactions
honzakral commented, May 8, 2015

Bulk indexing can be an expensive process, so it can indeed take a long time. If it takes more than 10 seconds, the only solution is to raise the timeout parameter of the es client, either for the whole instance or by passing request_timeout=30 as part of the es.bulk call. There is no way to speed it up from the client side: either send smaller bulk requests or raise the timeout value.
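
In elasticsearch-py of that era, the two options look roughly like this (a minimal sketch: the host, the 30-second value, and the reuse of `chunk` from the question are assumptions):

from elasticsearch import Elasticsearch

# Option 1: raise the timeout for every request made by this client.
es = Elasticsearch(['localhost:9200'], timeout=30)

# Option 2: raise it for a single bulk call only.
res = es.bulk(index='weibo', doc_type='user', body=chunk,
              request_timeout=30)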

Also note that you are replicating the chunking logic that we already have as part of the bulk helper: http://elasticsearch-py.readthedocs.org/en/latest/helpers.html#elasticsearch.helpers.bulk
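
A rough sketch of the loop from the question rewritten on top of that helper (doc_generator() and the index/type names carry over from the question; chunk_size and request_timeout are placeholder tunables):

from elasticsearch import Elasticsearch
from elasticsearch import helpers

es = Elasticsearch()

# The helper chunks the document stream itself and pulls '_id' out of
# each dict, so no manual action/source pairing is needed.
success, errors = helpers.bulk(
    es,
    doc_generator(),
    index='weibo',
    doc_type='user',
    chunk_size=1000,
    request_timeout=30,
)
print 'indexed %s docs' % success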

Hope this helps…

0 reactions
fxdgear commented, Jul 9, 2018

@vovavovavovavova as of right now we do not support tuple timeouts in elasticsearch-py. If that’s something you feel we should add, you can create an issue and I’ll look into it.

Personally I’ve never used tuples as timeouts, but I do see that there’d be some value there on the requests side. urllib3 does not support this, and as I write this I’m not sure how much work it would take to support it for the requests transport when urllib3 doesn’t offer it. But I am willing to investigate.
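
For context, this is what a tuple timeout looks like in requests itself (a plain requests call against a placeholder URL, not an existing elasticsearch-py option):

import requests

# (connect timeout, read timeout) in seconds: fail fast when the host
# is unreachable, but give a slow response body enough time to arrive.
resp = requests.get('http://localhost:9200/_cluster/health',
                    timeout=(3.05, 27))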


Top Results From Across the Web

Bulk indexing raise read timeout error - Elasticsearch
You get read timeouts from the server because the client is misbehaving. Cluster power, chunk size, timeout length and API use are not …

Elasticsearch Bulk insert w/ Python - socket timeout error
The connection to elasticsearch has a configurable timeout, which by default is 10 seconds. So, if your elasticsearch server takes more than 10 …

Connection Timeout vs. Read Timeout for Java Sockets
From the client side, the “read timed out” error happens if the server is taking longer to respond and send information. This could…

[Elasticsearch] ConnectionTimeout caused by - 솜씨좋은장씨
[Elasticsearch] ConnectionTimeout caused by ReadTimeoutError (feat. bulk API + python). 솜씨좋은장씨 2020. 5. 8. 15:51.

Timeouts during indexing: Too many commits! - SearchStax
Frequently, the root problem is too-frequent attempts to “commit” the new documents to the index. Solr supports rapid indexing by temporarily accumulating …
