
Users should be able to set the timeout and the chunk_size for Elasticsearch bulk requests.


While dumping huge amounts of data to Elasticsearch, mongo-connector often crashes with a connection timeout. The default timeout is 10 seconds and cannot be changed. It should be an option of the mongo-connector command so that users can change it when necessary.

2015-10-20 01:48:04,992 [CRITICAL] mongo_connector.oplog_manager:543 - Exception during collection dump
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/mongo_connector/oplog_manager.py", line 495, in do_dump
    upsert_all(dm)
  File "/usr/lib/python2.6/site-packages/mongo_connector/oplog_manager.py", line 479, in upsert_all
    dm.bulk_upsert(docs_to_dump(namespace), mapped_ns, long_ts)
  File "/usr/lib/python2.6/site-packages/mongo_connector/util.py", line 32, in wrapped
    return f(*args, **kwargs)
  File "/usr/lib/python2.6/site-packages/mongo_connector/doc_managers/elastic_doc_manager.py", line 190, in bulk_upsert
    for ok, resp in responses:
  File "/usr/lib/python2.6/site-packages/elasticsearch/helpers/__init__.py", line 138, in streaming_bulk
    raise e
ConnectionTimeout: ConnectionTimeout caused by - ReadTimeoutError(HTTPConnectionPool(host=u'172.31.1.254', port=9200): Read timed out. (read timeout=10))

Thanks,

Issue Analytics

  • State: closed
  • Created 8 years ago
  • Comments: 5

Top GitHub Comments

1 reaction
Jinshui commented, Oct 20, 2015

I made the following change (line 4) in elastic_doc_manager to increase the timeout to 60 seconds, and it solved my issue:

1           kw = {}
2           if self.chunk_size > 0:
3               kw['chunk_size'] = self.chunk_size
4+          kw['request_timeout'] = 60
5           responses = streaming_bulk(client=self.elastic,
6                                       actions=docs_to_upsert(),
7                                       **kw)

I hope request_timeout can become a parameter of the mongo-connector command, or at least that the default timeout is raised to a larger value; 10 seconds is too short.
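For reference, here is a minimal, self-contained sketch of the same idea outside mongo-connector, assuming a pre-8.x elasticsearch-py client (where extra keyword arguments to streaming_bulk are forwarded to each underlying bulk call); the host, index name, and toy document generator are placeholders, not mongo-connector code:

from elasticsearch import Elasticsearch
from elasticsearch.helpers import streaming_bulk

# Placeholder host; mongo-connector would build this from its targetURL.
es = Elasticsearch(hosts=["localhost:9200"])

def docs_to_upsert():
    # Placeholder actions; in mongo-connector these come from the MongoDB collection dump.
    for i in range(1000):
        yield {"_index": "test-index", "_id": i, "doc_num": i}

for ok, resp in streaming_bulk(
        client=es,
        actions=docs_to_upsert(),
        chunk_size=200,        # send smaller batches per bulk request
        request_timeout=60):   # allow 60s per bulk request instead of the 10s default
    if not ok:
        print("bulk item failed:", resp)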

0 reactions
Jinshui commented, Oct 21, 2015

Please close this issue; both the timeout and the chunk_size can be configured through the configuration file, as follows:

{
  "docManagers": [
    {
      "docManager": "elastic_doc_manager",
      "targetURL": "localhost:9200",
      "bulkSize": 200,
      "autoCommitInterval": 0,
      "args": {
        "clientOptions": {"timeout": 60}
      }
    }
  ]
}

Thanks!
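For anyone wondering what that clientOptions block maps to: assuming the elastic doc manager passes those options straight through as keyword arguments to the Elasticsearch client constructor (and that bulkSize is what it uses as its chunk_size), the configuration above is roughly equivalent to this sketch, with the host as a placeholder:

from elasticsearch import Elasticsearch

# timeout=60 becomes the client's default read timeout for every request,
# including the bulk calls, replacing the 10-second default from the issue above.
es = Elasticsearch(hosts=["localhost:9200"], timeout=60)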

