Kafka indexing service duplicate entry exception in druid_pendingSegments
After upgrading to Druid 0.16.0-incubating, I am receiving a MySQLIntegrityConstraintViolationException complaining about:
“Duplicate entry XXX for key 'PRIMARY' [statement:"INSERT INTO druid_pendingSegments (id, dataSource…”
This results in the Kafka indexing tasks not being able to complete and the eventual failure of the coordinator/overlord nodes. This scenario only seems to happen after I drop some segments from Druid and then push in new data for the time period that was dropped. The only way I have found to fix this has been to force-stop all of my Kafka indexing supervisors & tasks and manually delete all of the entries in the druid_pendingSegments table. After I do that, I no longer receive the SQL exception and corresponding duplicate entry error message. Any thoughts on this would be greatly appreciated!
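For reference, the manual cleanup amounts to roughly the following (just a sketch of what I mean, not exact commands; the overlord host/port, the supervisor ID my_supervisor, the datasource my_datasource, and the MySQL database/credentials are placeholders for your own environment):
# Hard-stop a supervisor and its tasks
curl -X POST http://overlord-host:8090/druid/indexer/v1/supervisor/my_supervisor/terminate
# Then remove the stale rows from the metadata store
mysql -u druid -p druid -e "DELETE FROM druid_pendingSegments WHERE dataSource = 'my_datasource';"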
How to Reproduce:
- Suspend a Kafka indexing supervisor for a given data source and wait for the indexing task(s) to complete.
- Drop segments for a certain time period from the given data source and wait for the segments to be unloaded from the historical nodes.
- Resume the Kafka indexing supervisor for the given data source.
- Push new data through Kafka for the same time period that was previously dropped on the given data source.
- Check the indexing logs for the Kafka indexing tasks to see them complaining about duplicate primary key errors (a rough curl walkthrough of these steps is sketched below).
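In terms of API calls, the steps above look roughly like this (a sketch only; hosts, ports, the supervisor ID, the datasource, and the drop interval are placeholders, and you may be dropping segments a different way):
# 1. Suspend the supervisor and wait for its indexing tasks to finish
curl -X POST http://overlord-host:8090/druid/indexer/v1/supervisor/my_supervisor/suspend
# 2. Mark the segments in the interval as unused so they get unloaded from the historicals (coordinator API)
curl -X DELETE http://coordinator-host:8081/druid/coordinator/v1/datasources/my_datasource/intervals/2019-10-01_2019-10-02
# 3. Resume the supervisor
curl -X POST http://overlord-host:8090/druid/indexer/v1/supervisor/my_supervisor/resume
# 4. Produce new events into Kafka for the dropped time period, then check the task logs for the duplicate primary key errors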
Other Notables:
- The druid_pendingSegments table doesn't seem to get cleaned up once a Kafka indexing supervisor is suspended. Entries are still left in this table for the given data source despite all of the segments having been published to deep storage / historical nodes (a quick way to check what is left over is sketched below). I do have druid.coordinator.kill.pendingSegments.on=true enabled, so maybe this is normal?
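For anyone who wants to check what is actually left over, something like the following lists the remaining rows (a sketch; the MySQL database/user "druid" and the datasource name are placeholders):
# List leftover pending-segment rows for a datasource
mysql -u druid -p druid -e "SELECT id, created_date FROM druid_pendingSegments WHERE dataSource = 'my_datasource';"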
At least in 0.17.0 you can delete entries from that table by using the overlord API:
curl -X DELETE -H 'Accept: application/json, text/plain, */*' "http://[yourhost]:[yourport]/druid/indexer/v1/pendingSegments/[datasource]?interval=1000/3000"
(The interval 1000/3000 is an ISO-8601 interval from year 1000 to year 3000, i.e. it covers effectively all pending segments for that datasource.)
I’m seeing this issue in v0.20.2 as well. I’ve got the same flow that @teeram described in the original post.
The coordinator/overlord node was unresponsive in this state when we tried to delete pending segments using the API endpoint above. We resorted to manually deleting rows from the druid_pendingsegments table, like @alphaxo described.