[URLFrontier] URLFrontier not returning an ID prevents status ACKs, making crawling impossible
Hello @jnioche,
we switched our URL and status handling from a custom bolt to URLFrontier. However, I noticed that the status bolt does not ack any tuples. After going through the code, adding some log events, cleaning up the async code and improving the state management, my unit test shows the following log:
12:35:05.187 [Time-limited test] INFO c.d.s.u.StatusUpdaterBolt - Initialisation of connection to URLFrontier service on localhost:53770
12:35:05.187 [Time-limited test] INFO c.d.s.u.StatusUpdaterBolt - Allowing up to 100000 message in flight
12:35:05.194 [Time-limited test] ERROR c.d.s.u.PartitionUtil - Unknown partition mode : null - forcing to byHost
12:35:05.194 [Time-limited test] INFO c.d.s.u.URLPartitioner - Using partition mode : QUEUE_MODE_HOST
12:35:05.263 [Time-limited test] TRACE c.d.s.u.StatusUpdaterBolt - Added to waitAck https://www.url.net/something with ID https://www.url.net/something total 1 - sent to localhost:53770
12:35:05.751 [grpc-default-executor-1] WARN c.d.s.u.StatusUpdaterBolt - Could not find unacked tuple for blank id ``. (Ack: )
12:35:05.752 [grpc-default-executor-1] TRACE c.d.s.u.StatusUpdaterBolt - Trace for unpacked tuple for blank id:
12:35:10.787 [Time-limited test] INFO c.d.s.u.ChannelManager - Shutting down channel ManagedChannelOrphanWrapper{delegate=ManagedChannelImpl{logId=1, target=localhost:53770}}
It looks like URLFrontier does not provide an ID when responding to a put; this is an error that cannot be fixed on the StormCrawler side. Without the ID, the status bolt is unable to ACK a single tuple, making crawling basically impossible.
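For context, this is roughly how the ack matching works: the bolt keeps its in-flight tuples in a map keyed by the ID it sent to the frontier (here, the URL), and each ack coming back is looked up by that ID. A minimal sketch of why a blank ID breaks this (simplified types, not the actual StormCrawler code):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Simplified sketch of the ack bookkeeping in a status updater bolt.
// "Tuple" is stood in for by a plain Object; the real bolt uses Storm tuples.
public class AckBookkeeping {

    // in-flight tuples, keyed by the ID sent to the frontier (here: the URL)
    private final Map<String, Object> waitAck = new ConcurrentHashMap<>();

    // called when a URL is sent to the frontier
    public void sent(String id, Object tuple) {
        waitAck.put(id, tuple);
    }

    // called from the gRPC response observer when the frontier acks a put
    public void onAck(String id) {
        Object tuple = waitAck.remove(id);
        if (tuple == null) {
            // this is exactly what happens when the frontier returns a blank ID:
            // the lookup key "" matches nothing, so the tuple is never acked
            System.err.printf("Could not find unacked tuple for blank id `%s`%n", id);
            return;
        }
        // in the real bolt: collector.ack(tuple)
        System.out.printf("acked %s%n", id);
    }

    public static void main(String[] args) {
        AckBookkeeping b = new AckBookkeeping();
        b.sent("https://www.url.net/something", new Object());
        b.onAck(""); // blank ID from the frontier -> tuple stays unacked forever
    }
}
```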
I added the unit tests etc. in this PR: https://github.com/DigitalPebble/storm-crawler/pull/980
Best Regards
Felix
Top GitHub Comments
Will close this issue as the underlying problem has been fixed already
I think the underlying issue has already been fixed in URLFrontier
https://github.com/crawler-commons/url-frontier/commit/ced0150d3a516ba8c8ad94b362fb8960ab2b35d6
I will release a new version of URLFrontier shortly.
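For readers hitting the same symptom, here is a rough before/after sketch of what such a fix changes, under the assumption that it amounts to the frontier echoing the URL's ID back in the ack (hypothetical simplified types standing in for the generated gRPC/protobuf classes, not the actual URLFrontier code):

```java
// Hypothetical simplified stand-ins for the generated protobuf classes;
// the real URLFrontier service streams ack messages back over gRPC.
record UrlItem(String id, String url) {}
record Ack(String id, String status) {}

public class AckFix {

    // before the fix: the ack is built without the item's ID, so the client
    // receives a blank ID and cannot match it to any in-flight tuple
    static Ack ackBefore(UrlItem item) {
        return new Ack("", "OK");
    }

    // after the fix: the incoming item's ID is echoed back, letting the
    // client remove the matching entry from its waitAck map and ack the tuple
    static Ack ackAfter(UrlItem item) {
        return new Ack(item.id(), "OK");
    }

    public static void main(String[] args) {
        UrlItem item = new UrlItem("https://www.url.net/something",
                                   "https://www.url.net/something");
        System.out.println(ackBefore(item)); // Ack[id=, status=OK]
        System.out.println(ackAfter(item));  // Ack[id=https://www.url.net/something, status=OK]
    }
}
```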