ES IndexerBolt - Fix behaviour of afterBulk

Hi @jnioche,

I was looking into https://github.com/DigitalPebble/storm-crawler/pull/989#discussion_r918581042 and reviewed the old code to make sure that I get the intended behaviour (see https://github.com/FelixEngl/storm-crawler/blob/834347e53f79376d3a79f125a6203c91d062e04f/external/elasticsearch/src/main/java/com/digitalpebble/stormcrawler/elasticsearch/bolt/IndexerBolt.java).

Now I am wondering: shouldn’t it be enough to process only the first BulkResponseElement encountered for a given id, and for any later element with the same id just emit the required log events and update the counters accordingly?
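A minimal sketch of that idea in plain Java (16+, for records), under simplifying assumptions: a `HashMap` stands in for the `waitAck` cache, tuples are plain strings, and the hypothetical `BulkItem` record stands in for a `BulkResponseElement`. The counter fields and log messages are illustrative only, not the bolt's actual metrics. The first element for an id decides whether the waiting tuples are acked or failed; later elements for the same id are only logged and counted.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of "first element per id wins"; names are illustrative, not storm-crawler's.
public class FirstEncounterAfterBulk {

    // Hypothetical stand-in for a BulkResponseElement: document id plus outcome
    record BulkItem(String id, boolean success) {}

    // Hypothetical counters standing in for the bolt's metrics
    int successEvents = 0;
    int failureEvents = 0;
    final List<String> acked = new ArrayList<>();
    final List<String> failed = new ArrayList<>();
    final List<String> log = new ArrayList<>();

    void afterBulk(Map<String, List<String>> waitAck, List<BulkItem> response) {
        Set<String> handled = new HashSet<>();
        for (BulkItem item : response) {
            // Counters are updated for every element, repeated ids included
            if (item.success()) {
                successEvents++;
            } else {
                failureEvents++;
            }
            if (!handled.add(item.id())) {
                // Not the first element for this id: log only, never touch the tuples again
                log.add("ignoring extra response for " + item.id() + ", success=" + item.success());
                continue;
            }
            List<String> tuples = waitAck.remove(item.id());
            if (tuples == null) {
                log.add("could not find unacked tuple for " + item.id());
                continue;
            }
            // Only the first element decides the fate of all tuples waiting on this id
            (item.success() ? acked : failed).addAll(tuples);
        }
    }

    public static void main(String[] args) {
        Map<String, List<String>> waitAck = new HashMap<>();
        waitAck.put("A", new ArrayList<>(List.of("tuple1", "tuple3")));
        waitAck.put("B", new ArrayList<>(List.of("tuple2")));

        FirstEncounterAfterBulk bolt = new FirstEncounterAfterBulk();
        bolt.afterBulk(waitAck, List.of(
                new BulkItem("A", true),
                new BulkItem("B", true),
                new BulkItem("A", false)));

        System.out.println("acked=" + bolt.acked + " failed=" + bolt.failed);
        System.out.println("successEvents=" + bolt.successEvents + " failureEvents=" + bolt.failureEvents);
        bolt.log.forEach(System.out::println);
    }
}
```

With this rule, a FAILURE that arrives after a SUCCESS for the same id is only logged and counted, so `tuple3` is still acked; whether that is acceptable is exactly the question above.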

I ask because, if I understand it correctly, the old code worked like this:

:START afterBulk

:ITERATION 1
+ waitAck ---------------+
| "A" | [tuple1, tuple3] |
| "B" | [tuple2]         |
+------------------------+

+ bulk_response ---------------+
| 1. (id: "A", state: SUCCESS) |
| 2. (id: "B", state: SUCCESS) |
| 3. (id: "A", state: FAILURE) |
+------------------------------+

response = bulk_response.removeFirst() : (id: "A", state: SUCCESS)
tuples = waitAck.getIfPresent(response.id) : [tuple1, tuple3]
for(tuple in tuples){
    // process all tuples as state: SUCCESS
    ...
}
waitAck.invalidate(response.id) // Immediate removal
:ITERATION 1

:ITERATION 2
+ waitAck -------+
| "B" | [tuple2] |
+----------------+

+ bulk_response ---------------+
| 2. (id: "B", state: SUCCESS) |
| 3. (id: "A", state: FAILURE) |
+------------------------------+

response = bulk_response.removeFirst() : (id: "B", state: SUCCESS)
tuples = waitAck.getIfPresent(response.id) : [tuple2]
for(tuple in tuples){
    // process all tuples as state: SUCCESS
    ...
}
waitAck.invalidate(response.id) // Immediate removal
:ITERATION 2

:ITERATION 3
+ waitAck -------+
+----------------+

+ bulk_response ---------------+
| 3. (id: "A", state: FAILURE) |
+------------------------------+

response = bulk_response.removeFirst() : (id: "A", state: FAILURE)
tuples = waitAck.getIfPresent(response.id) : null
LOG.warn("could not find unacked tuple for A")
:ITERATION 3

:STOP afterBulk
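The trace above can be reproduced with a small, self-contained Java (16+) simulation. It is a sketch under simplifying assumptions, not the actual IndexerBolt code: a `HashMap` replaces the `waitAck` cache (`remove` plays the role of `getIfPresent` followed by `invalidate`), tuples are strings, and `BulkItem` is a hypothetical stand-in for `BulkResponseElement`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical simulation of the old afterBulk trace; not the actual IndexerBolt code.
public class OldAfterBulkTrace {

    // Hypothetical stand-in for a BulkResponseElement: document id plus outcome
    record BulkItem(String id, boolean success) {}

    // Processes the bulk response in order and returns one log line per action
    static List<String> afterBulk(Map<String, List<String>> waitAck, List<BulkItem> response) {
        List<String> log = new ArrayList<>();
        for (BulkItem item : response) {
            // remove() = waitAck.getIfPresent(id) followed by waitAck.invalidate(id)
            List<String> tuples = waitAck.remove(item.id());
            if (tuples == null) {
                // A later element for an id whose tuples were already handled ends up here
                log.add("could not find unacked tuple for " + item.id());
                continue;
            }
            // All tuples waiting on this id get the outcome of the first element seen for it
            for (String tuple : tuples) {
                log.add((item.success() ? "ack " : "fail ") + tuple);
            }
        }
        return log;
    }

    public static void main(String[] args) {
        Map<String, List<String>> waitAck = new HashMap<>();
        waitAck.put("A", new ArrayList<>(List.of("tuple1", "tuple3")));
        waitAck.put("B", new ArrayList<>(List.of("tuple2")));

        // Same bulk response as in the trace: A succeeds, B succeeds, A fails
        List<BulkItem> response = List.of(
                new BulkItem("A", true),
                new BulkItem("B", true),
                new BulkItem("A", false));

        afterBulk(waitAck, response).forEach(System.out::println);
    }
}
```

Running `main` prints `ack tuple1`, `ack tuple3`, `ack tuple2` and then `could not find unacked tuple for A`: the FAILURE for "A" never reaches `tuple3`, which matches ITERATION 3 of the trace.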

Best Regards

Felix

Issue Analytics

  • State: open
  • Created a year ago
  • Comments: 6 (6 by maintainers)

Top GitHub Comments

1 reaction
FelixEngl commented, Jul 18, 2022

> Hi @FelixEngl, I have to admit that some of your recent changes have made the code slightly more complicated than it was. Are you trying to simplify the current code?

Yes. To be honest, right now I am not happy with the design I used in the ES part.

It's working fine, but it looks too “wild” and may become hard to support in the future.

So I am currently working on a redesign, but first I want to make sure that I have understood the ACK logic.

0 reactions
FelixEngl commented, Jul 19, 2022

> What about reverting to the previous version and simply adding the lock logic?

That won't work, because the lock would span too much logic, so I have to rewrite the whole thing anyway.
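One way to keep a lock from spanning too much logic is to let the critical section cover only the shared `waitAck` state, and to do acking, logging and metrics after the lock is released. The following is a hypothetical illustration of that idea with a `ReentrantLock`, not the redesign discussed in this issue; the class and method names (`NarrowLockWaitAck`, `addWaiting`, `claim`) are made up for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch: the lock guards only the waitAck map, nothing else.
public class NarrowLockWaitAck {

    private final Map<String, List<String>> waitAck = new HashMap<>();
    private final ReentrantLock waitAckLock = new ReentrantLock();

    // Register a tuple as waiting for the bulk response of docId
    void addWaiting(String docId, String tuple) {
        waitAckLock.lock();
        try {
            waitAck.computeIfAbsent(docId, k -> new ArrayList<>()).add(tuple);
        } finally {
            waitAckLock.unlock();
        }
    }

    // Atomically claim the tuples for docId; returns null if nothing is waiting (already claimed)
    List<String> claim(String docId) {
        waitAckLock.lock();
        try {
            return waitAck.remove(docId);
        } finally {
            waitAckLock.unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        NarrowLockWaitAck state = new NarrowLockWaitAck();
        state.addWaiting("A", "tuple1");
        state.addWaiting("A", "tuple3");

        List<String> claims = Collections.synchronizedList(new ArrayList<>());
        Runnable worker = () -> {
            List<String> tuples = state.claim("A");
            // Outside the lock: ack/fail, logging and counters would happen here
            if (tuples != null) {
                claims.add(Thread.currentThread().getName() + " " + tuples);
            }
        };
        Thread t1 = new Thread(worker, "t1");
        Thread t2 = new Thread(worker, "t2");
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        // Only one of the two threads ends up with the tuples for "A"
        System.out.println(claims);
    }
}
```

Because `claim` removes the entry while holding the lock, only one caller can obtain the tuples for a given id, and the slower per-tuple work never runs inside the critical section.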
