Kafka indexing service duplicate entry exception in druid_pendingSegments
After upgrading to Druid 0.16.0-incubating, I am receiving a MySQLIntegrityConstraintViolationException complaining about:
“Duplicate entry XXX for key 'PRIMARY' [statement:"INSERT INTO druid_pendingSegments (id, dataSource…”
This results in the Kafka indexing tasks not being able to complete and the eventual failure of the coordinator/overlord nodes. This scenario only seems to happen after I drop some segments from Druid and then push in new data for the time period that was dropped. The only way I have found to fix this has been to force-stop all of my Kafka indexing supervisors & tasks and manually delete all of the entries in the druid_pendingSegments table. After I do that, I no longer receive the SQL exception and corresponding duplicate entry error message. Any thoughts on this would be greatly appreciated!
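For reference, the manual cleanup amounts to roughly the following (just a sketch of what I mean, not exact commands; the overlord host/port, the supervisor ID my_supervisor, the datasource my_datasource, and the MySQL database/credentials are placeholders for your own environment):
# Hard-stop a supervisor and its tasks
curl -X POST http://overlord-host:8090/druid/indexer/v1/supervisor/my_supervisor/terminate
# Then remove the stale rows from the metadata store
mysql -u druid -p druid -e "DELETE FROM druid_pendingSegments WHERE dataSource = 'my_datasource';"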
How to Reproduce:
- Suspend a Kafka indexing supervisor for a given data source and wait for the indexing task(s) to complete.
- Drop segments for a certain time period from the given data source and wait for the segments to be unloaded from the historical nodes.
- Resume the Kafka indexing supervisor for the given data source.
- Push new data through Kafka for the same time period that was previously dropped on the given data source.
- Check the indexing logs for the Kafka indexing tasks to see them complaining about duplicate primary key errors (a rough curl walkthrough of these steps is sketched below).
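In terms of API calls, the steps above look roughly like this (a sketch only; hosts, ports, the supervisor ID, the datasource, and the drop interval are placeholders, and you may be dropping segments a different way):
# 1. Suspend the supervisor and wait for its indexing tasks to finish
curl -X POST http://overlord-host:8090/druid/indexer/v1/supervisor/my_supervisor/suspend
# 2. Mark the segments in the interval as unused so they get unloaded from the historicals (coordinator API)
curl -X DELETE http://coordinator-host:8081/druid/coordinator/v1/datasources/my_datasource/intervals/2019-10-01_2019-10-02
# 3. Resume the supervisor
curl -X POST http://overlord-host:8090/druid/indexer/v1/supervisor/my_supervisor/resume
# 4. Produce new events into Kafka for the dropped time period, then check the task logs for the duplicate primary key errors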
Other Notables:
- The druid_pendingSegments table doesn't seem to get cleaned up once a Kafka indexing supervisor is suspended. Entries are still left in this table for the given data source despite all of the segments having been published to deep storage / historical nodes (a quick way to check what is left over is sketched below). I do have druid.coordinator.kill.pendingSegments.on=true enabled, so maybe this is normal?
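For anyone who wants to check what is actually left over, something like the following lists the remaining rows (a sketch; the MySQL database/user "druid" and the datasource name are placeholders):
# List leftover pending-segment rows for a datasource
mysql -u druid -p druid -e "SELECT id, created_date FROM druid_pendingSegments WHERE dataSource = 'my_datasource';"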
At least in 0.17.0 you can delete entries from that table by using the overlord API:
curl -X DELETE -H 'Accept: application/json, text/plain, */*' "http://[yourhost]:[yourport]/druid/indexer/v1/pendingSegments/[datasource]?interval=1000/3000"
(The interval 1000/3000 is an ISO-8601 interval from year 1000 to year 3000, i.e. it covers effectively all pending segments for that datasource.)
I’m seeing this issue in v0.20.2 as well. I’ve got the same flow that @teeram described in the original post.
The coordinator/overlord node was unresponsive in this state when we tried to delete pending segments using the API endpoint above. We resorted to manually deleting rows from the druid_pendingsegments table, like @alphaxo described.