ES IndexerBolt - Fix behaviour of afterBulk
Hi @jnioche,
I was looking into https://github.com/DigitalPebble/storm-crawler/pull/989#discussion_r918581042 and reviewed the old code to make sure that I get the intended behaviour (see https://github.com/FelixEngl/storm-crawler/blob/834347e53f79376d3a79f125a6203c91d062e04f/external/elasticsearch/src/main/java/com/digitalpebble/stormcrawler/elasticsearch/bolt/IndexerBolt.java).
Now I am wondering: shouldn't it be enough to process only the first occurrence of a BulkResponseElement with a given id, and for any later occurrences just emit the required LOG events and update the counters accordingly?
If I read it correctly, the old code worked like this:
:START afterBulk
:ITERATION 1
+ waitAck ---------------+
| "A" | [tuple1, tuple3] |
| "B" | [tuple2] |
+------------------------+
+ bulk_response ---------------+
| 1. (id: "A", state: SUCCESS) |
| 2. (id: "B", state: SUCCESS) |
| 3. (id: "A", state: FAILURE) |
+------------------------------+
response = bulk_response.removeFirst() : (id: "A", state: SUCCESS)
tuples = waitAck.getIfPresent(response.id) : [tuple1, tuple3]
for(tuple in tuples){
// process all tuples as state: SUCCESS
...
}
waitAck.invalidate(response.id) // Immediate removal
:ITERATION 1
:ITERATION 2
+ waitAck -------+
| "B" | [tuple2] |
+----------------+
+ bulk_response ---------------+
| 2. (id: "B", state: SUCCESS) |
| 3. (id: "A", state: FAILURE) |
+------------------------------+
response = bulk_response.removeFirst() : (id: "B", state: SUCCESS)
tuples = waitAck.getIfPresent(response.id) : [tuple2]
for(tuple in tuples){
// process all tuples as state: SUCCESS
...
}
waitAck.invalidate(response.id) // Immediate removal
:ITERATION 2
:ITERATION 3
+ waitAck -------+
+----------------+
+ bulk_response ---------------+
| 3. (id: "A", state: FAILURE) |
+------------------------------+
response = bulk_response.removeFirst() : (id: "A", state: FAILURE)
tuples = waitAck.getIfPresent(response.id) : null
LOG.warn("could not find unacked tuple for A")
:ITERATION 3
:STOP afterBulk
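To double-check the trace, here is a minimal Java model of the afterBulk loop as described above. It is a hypothetical sketch, not the actual IndexerBolt code: tuples are plain strings, waitAck is a plain HashMap instead of a cache (so `remove` stands in for `getIfPresent` followed by `invalidate`), and the class name, the acked/failed lists and the `duplicateResponses` counter are made up for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.logging.Logger;

/** Hypothetical, simplified model of afterBulk as traced above (not storm-crawler code). */
public class AfterBulkSketch {

    enum State { SUCCESS, FAILURE }

    /** One element of the bulk response: document id plus outcome. */
    static final class BulkItem {
        final String id;
        final State state;
        BulkItem(String id, State state) { this.id = id; this.state = state; }
    }

    private static final Logger LOG = Logger.getLogger("AfterBulkSketch");

    // Stand-in for waitAck: tuples waiting for an ack, keyed by document id
    final Map<String, List<String>> waitAck = new HashMap<>();
    final List<String> acked = new ArrayList<>();
    final List<String> failed = new ArrayList<>();
    int duplicateResponses = 0; // hypothetical counter for later occurrences of an id

    void afterBulk(Deque<BulkItem> bulkResponse) {
        while (!bulkResponse.isEmpty()) {
            BulkItem response = bulkResponse.removeFirst();
            // remove() returns the pending tuples and drops the entry in one step
            List<String> tuples = waitAck.remove(response.id);
            if (tuples == null) {
                // id already handled by an earlier element: only log and count
                LOG.warning("could not find unacked tuple for " + response.id);
                duplicateResponses++;
                continue;
            }
            // every tuple for this id gets the state of the first element seen
            for (String tuple : tuples) {
                if (response.state == State.SUCCESS) {
                    acked.add(tuple);
                } else {
                    failed.add(tuple);
                }
            }
        }
    }

    /** Replays the three-element example from the trace. */
    static AfterBulkSketch replay() {
        AfterBulkSketch bolt = new AfterBulkSketch();
        bolt.waitAck.put("A", new ArrayList<>(List.of("tuple1", "tuple3")));
        bolt.waitAck.put("B", new ArrayList<>(List.of("tuple2")));
        Deque<BulkItem> response = new ArrayDeque<>(List.of(
                new BulkItem("A", State.SUCCESS),
                new BulkItem("B", State.SUCCESS),
                new BulkItem("A", State.FAILURE)));
        bolt.afterBulk(response);
        return bolt;
    }

    public static void main(String[] args) {
        AfterBulkSketch bolt = replay();
        // prints: acked=[tuple1, tuple3, tuple2] failed=[] duplicates=1
        System.out.println("acked=" + bolt.acked + " failed=" + bolt.failed
                + " duplicates=" + bolt.duplicateResponses);
    }
}
```

Replaying the example gives the same result as the trace: tuple1 and tuple3 are acked by the first "A" element, tuple2 by "B", and the later FAILURE for "A" only produces the warning and bumps the counter, so tuple3 is never failed.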
Best Regards
Felix
Issue Analytics
- Issue #992, opened 2022-07-16 by FelixEngl
- Comments: 6 (6 by maintainers)
Top GitHub Comments
Yes. To be honest, right now I am not happy with the design I used in the ES part.
It's working fine, but it looks too "wild" and may become problematic to support in the future.
So I am currently working on a redesign, but first I want to make sure that I have understood the ACK logic correctly.
That won't do it, because the lock would span too much logic. So I have to rewrite the whole thing anyway.
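One general way to keep a lock from spanning too much logic is to hold it only while detaching the pending tuples from waitAck, and to ack or fail them after releasing it. The sketch below is hypothetical: the class, the `Item` type, `addPending` and the use of a `ReentrantLock` are assumptions for illustration, not what IndexerBolt does. As in the trace, only the first response element for an id finds any tuples.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical sketch: short critical section, slow work done outside the lock. */
public class NarrowLockSketch {

    /** Hypothetical bulk response element: document id plus outcome. */
    static final class Item {
        final String id;
        final boolean success;
        Item(String id, boolean success) { this.id = id; this.success = success; }
    }

    private final ReentrantLock waitAckLock = new ReentrantLock();
    private final Map<String, List<String>> waitAck = new HashMap<>();
    final List<String> acked = new ArrayList<>();
    final List<String> failed = new ArrayList<>();

    /** Registers a tuple as waiting for the bulk response of document id. */
    void addPending(String id, String tuple) {
        waitAckLock.lock();
        try {
            waitAck.computeIfAbsent(id, k -> new ArrayList<>()).add(tuple);
        } finally {
            waitAckLock.unlock();
        }
    }

    void afterBulk(List<Item> items) {
        // Phase 1 (locked): only detach entries; later occurrences of an id find nothing
        List<Map.Entry<Item, List<String>>> detached = new ArrayList<>();
        waitAckLock.lock();
        try {
            for (Item item : items) {
                List<String> tuples = waitAck.remove(item.id);
                if (tuples != null) {
                    detached.add(Map.entry(item, tuples));
                }
            }
        } finally {
            waitAckLock.unlock();
        }
        // Phase 2 (unlocked): ack/fail, logging and metrics would go here
        for (Map.Entry<Item, List<String>> e : detached) {
            (e.getKey().success ? acked : failed).addAll(e.getValue());
        }
    }

    public static void main(String[] args) {
        NarrowLockSketch bolt = new NarrowLockSketch();
        bolt.addPending("A", "tuple1");
        bolt.addPending("B", "tuple2");
        bolt.addPending("A", "tuple3");
        bolt.afterBulk(List.of(new Item("A", true), new Item("B", true), new Item("A", false)));
        // prints: acked=[tuple1, tuple3, tuple2] failed=[]
        System.out.println("acked=" + bolt.acked + " failed=" + bolt.failed);
    }
}
```

The trade-off is that other threads only wait for the map removals, not for acking; whether that fits the real bolt depends on what else the original lock protects.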