Delays in partition EOF being sent to additional instances
Description
I have an application that consumes a large amount of data from a topic, roughly 1500 messages a second. The topic has 9 partitions.
When my application starts, I join my consumer group and consume all messages until I encounter a partition EOF for all assigned partitions or until there are no messages available. This works without any problem. We need to do this because the messages contain market data used for trading: we need to see all the updates that have happened since we last started, but we also want to keep receiving the latest messages as soon as they arrive.
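The startup logic described above can be sketched roughly as follows. This is a minimal, library-independent illustration in Python rather than the actual C# application code; the class `CatchUpTracker` and its method names are hypothetical, and the caller is assumed to feed it assignment, message, and partition EOF events from whatever Kafka client it uses.

```python
from typing import Iterable, Set


class CatchUpTracker:
    """Tracks which assigned partitions have reported partition EOF.

    Hypothetical helper, not part of any Kafka client library.
    """

    def __init__(self) -> None:
        self._assigned: Set[int] = set()
        self._at_eof: Set[int] = set()

    def on_assigned(self, partitions: Iterable[int]) -> None:
        # Conservative choice: a new assignment resets all EOF state, so this
        # tracker requires every partition to report EOF again.
        self._assigned = set(partitions)
        self._at_eof = set()

    def on_message(self, partition: int) -> None:
        # A message after EOF means the partition is no longer at its end.
        self._at_eof.discard(partition)

    def on_eof(self, partition: int) -> None:
        # Ignore EOF events for partitions that are not currently assigned.
        if partition in self._assigned:
            self._at_eof.add(partition)

    def pending(self) -> Set[int]:
        # Assigned partitions that have not yet reported EOF.
        return self._assigned - self._at_eof

    def caught_up(self) -> bool:
        # An empty assignment is deliberately not treated as caught up.
        return bool(self._assigned) and not self.pending()


tracker = CatchUpTracker()
tracker.on_assigned([0, 1, 2, 3, 4])
for p in (0, 1, 3, 4):
    tracker.on_eof(p)
print(sorted(tracker.pending()))  # partitions still waiting for EOF
print(tracker.caught_up())
```

In a real consumer loop, the "no messages available" condition would need a separate idle timeout alongside this tracker, since a partition that never reports EOF would otherwise block startup indefinitely.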
However, when I start a second instance of the application, something strange happens. The partitions are rebalanced so that typically Inst-1 has partitions 5-8 and the new instance (Inst-2) gets partitions 0-4. Inst-2 then begins its startup logic of consuming all messages until partition EOF or until no more messages are available. The strange thing is that on some of the partitions it takes a long time to encounter a partition EOF (or no messages), sometimes up to 10 minutes! Also, 95% of the time this happens on partition #2.
Bizarrely, if I shut down Inst-1 during this phase, the partitions are all assigned to Inst-2 and it does get the partition EOF messages for each partition in a timely manner.
It feels like this is some sort of consumer group leader issue. However, I’m wondering if my expectations around partition EOF are wrong and I shouldn’t expect it to behave this way. If so, what is the best way to ensure you consume all messages up to the EOF (or equivalent)?
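One commonly used alternative to waiting for partition EOF events is to snapshot each assigned partition's high watermark at startup and treat a partition as caught up once the consumer's position reaches that snapshot. The sketch below shows only the bookkeeping, in Python; `WatermarkCatchUp` and its parameters are hypothetical, and fetching the watermarks and starting positions from the client is assumed to happen elsewhere.

```python
from typing import Dict, Set


class WatermarkCatchUp:
    """Hypothetical helper: a partition is caught up once the consumer has
    read up to the high watermark that was snapshotted at startup."""

    def __init__(self, high_watermarks: Dict[int, int],
                 start_positions: Dict[int, int]) -> None:
        # high_watermarks: partition -> offset of the next message to be
        # written at snapshot time (assumed to be supplied by the caller).
        # start_positions: partition -> offset the consumer resumes from.
        self._targets = dict(high_watermarks)
        self._pending: Set[int] = {
            p for p, hw in self._targets.items()
            if start_positions.get(p, 0) < hw
        }

    def on_message(self, partition: int, offset: int) -> None:
        # After consuming `offset`, the next position is offset + 1.
        target = self._targets.get(partition)
        if target is not None and offset + 1 >= target:
            self._pending.discard(partition)

    def pending(self) -> Set[int]:
        return set(self._pending)

    def caught_up(self) -> bool:
        return not self._pending


# Partition 0 is already at its snapshot, partition 2 is empty,
# partition 1 still has offsets 10..49 to read.
catch_up = WatermarkCatchUp(
    high_watermarks={0: 100, 1: 50, 2: 0},
    start_positions={0: 100, 1: 10, 2: 0},
)
catch_up.on_message(1, 48)
print(catch_up.caught_up())
catch_up.on_message(1, 49)
print(catch_up.caught_up())
```

Because this approach does not depend on when EOF events are delivered, it sidesteps the delay described above, but it may still need a timeout fallback: offsets can have gaps (for example on transactional topics), in which case the message at the final offset might never arrive.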
I’m using version 1.7 of the C# Kafka libraries. We’re not able to use version 1.8 due to an issue around SSL authentication with the library.
Top GitHub Comments
what is the SSL authentication issue?
this may be an issue and may be resolved by changes in v1.9.0-RC10, i’d need to look over the debug logs to work it out. first thing to do is try the latest version.
@mhowlett when you say next release, are you referring to v1.9.3 or the version after that? If there is a road map with this info, I apologize as I didn’t see it (or overlooked it).