Elasticsearch IndexerBolt not being acked correctly causing failures in spout

Together with @jcruzmartini, we found that the Elasticsearch IndexerBolt acks tuples before emitting them in its afterBulk method, which causes ack failures in the spout once the timeout set in the topology expires.

The proposed solution is to change the order of emit / ack in com.digitalpebble.stormcrawler.elasticsearch.bolt.IndexerBolt:

    if (!failed) {
        acked++;
        // emit the anchored status tuple first ...
        _collector.emit(StatusStreamName, t, new Values(u,
                metadata, Status.FETCHED));
        // ... and only then ack the input tuple
        _collector.ack(t);
    } else {
    ...
    ...

After migrating from 1.13 to 1.16 we noticed poor performance in our crawler, along with a lot of failures in the spout. After adding a copy of the IndexerBolt class with that modification to our project, it started working correctly with great performance.
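To illustrate why the order matters, here is a minimal, self-contained sketch. It does not use Storm at all: `ToyTracker` and its methods are hypothetical names, and the XOR bookkeeping is only a simplified model of how an anchored child tuple gets registered against its parent while the parent is still un-acked. Real Storm has more moving parts, so treat this as an illustration of the ordering problem rather than a description of the actual implementation.

```java
import java.util.HashMap;
import java.util.Map;

public class AckOrderDemo {

    /** Hypothetical toy tracker, loosely modelled on XOR-based tuple tracking. */
    static final class ToyTracker {
        // "Acker" side: XOR of every id reported so far; 0 means everything balances.
        private long checksum;
        // "Bolt" side: un-acked input id -> XOR of the child ids anchored to it.
        private final Map<Long, Long> open = new HashMap<>();
        private long nextId = 1;

        /** The spout emits a tuple; the bolt now holds it as an open input. */
        long spoutEmit() {
            long id = nextId++;
            checksum ^= id;
            open.put(id, 0L);
            return id;
        }

        /** Anchored emit: the child is recorded only while the anchor is still open. */
        long emitAnchored(long anchor) {
            long child = nextId++;
            open.computeIfPresent(anchor, (k, v) -> v ^ child);
            return child;
        }

        /** Ack: report the tuple id plus any children registered so far, then close it. */
        void ack(long id) {
            Long children = open.remove(id);
            checksum ^= id ^ (children == null ? 0L : children);
        }

        boolean balanced() {
            return checksum == 0L;
        }
    }

    /** Runs one input tuple through the toy bolt in the given order. */
    static boolean run(boolean emitBeforeAck) {
        ToyTracker t = new ToyTracker();
        long input = t.spoutEmit();
        long child;
        if (emitBeforeAck) {
            child = t.emitAnchored(input); // child registered against the open input
            t.ack(input);
        } else {
            t.ack(input);                  // input closed first ...
            child = t.emitAnchored(input); // ... so the child is never registered
        }
        t.ack(child); // the downstream bolt acks the child tuple
        return t.balanced();
    }

    public static void main(String[] args) {
        System.out.println("emit then ack balanced: " + run(true));
        System.out.println("ack then emit balanced: " + run(false));
    }
}
```

In this toy model only the emit-then-ack order ends with a balanced checksum. With ack-then-emit, the child's id is reported once by the downstream ack but never by its parent, so the bookkeeping cannot return to zero. That is consistent with the spout failures after the timeout described above.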

@jnioche we can create a pull request with a simple change in that class if you want

Thanks! Matias

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Reactions: 1
  • Comments: 5 (5 by maintainers)

Top GitHub Comments

1 reaction
jnioche commented, May 29, 2020

Changed the title a bit, as it is not about the anchoring as such.

@jcruzmartini it wasn’t an easy one to spot, but you and @matiascrespof have great detective skills 😉

Thanks again for reporting it and submitting a PR. I’ll go through all the acks to see if this happens anywhere else

0 reactions
jnioche commented, May 29, 2020

Fixed by #801
