Delays in partition EOF being sent to additional instances

Description

I have an application that consumes a large amount of data from a topic, roughly 1,500 messages a second. The topic has 9 partitions.

When my application starts, it joins its consumer group and consumes all messages until it encounters a partition EOF for every assigned partition, or until no more messages are available. This works without any problem. We need to do this because the messages contain market data used for trading: we need to see every update that has happened since we last started, and we also want to keep receiving the latest messages as soon as they arrive.
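The startup catch-up loop described above can be sketched as follows. This is a minimal, broker-free illustration, not the author's code. `Event` is a hypothetical stand-in for one consume result (in Confluent.Kafka .NET, a result with `IsPartitionEOF` set plays that role when `EnablePartitionEof` is enabled), and `None` models a consume call that timed out with no message.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass
class Event:
    """Hypothetical stand-in for one consume() result."""
    partition: int
    offset: int
    is_partition_eof: bool = False


def catch_up(events: Iterable[Optional[Event]], assigned: Set[int]) -> List[Event]:
    """Consume until every assigned partition reports EOF, or a poll times out.

    `None` in `events` models a consume() call that returned nothing within
    its timeout ("no messages available").
    """
    remaining = set(assigned)          # partitions that have not reached EOF yet
    processed: List[Event] = []
    for ev in events:
        if ev is None:
            break                      # nothing arrived within the poll timeout
        if ev.is_partition_eof:
            remaining.discard(ev.partition)
            if not remaining:
                break                  # every assigned partition has reached its end
            continue
        processed.append(ev)           # a real app would apply the market-data update here
    return processed
```

After `catch_up` returns, the application would switch to its normal steady-state consume loop.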

However, when I start a second instance of the application, something strange happens. The partitions are rebalanced so that Inst-1 typically keeps partitions 5-8 and the new instance (Inst-2) gets partitions 0-4. Inst-2 then runs its startup logic, consuming all messages until it sees a partition EOF or no more messages are available. The strange thing is that on some partitions it takes a long time, sometimes up to 10 minutes, to encounter a partition EOF (or to run out of messages). Also, 95% of the time this happens on partition #2.
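When partitions move between instances, the catch-up state has to follow the assignment. The sketch below is a hypothetical helper, assumed to be driven from the consumer's partitions-assigned and partitions-revoked callbacks (`SetPartitionsAssignedHandler` / `SetPartitionsRevokedHandler` on the .NET `ConsumerBuilder`). It also reports which partitions are still waiting for EOF, which makes a stuck partition such as #2 easy to log.

```python
from typing import Dict, Iterable, List


class RebalanceAwareEofTracker:
    """Hypothetical helper that keeps per-partition EOF state across rebalances."""

    def __init__(self) -> None:
        # partition -> True once EOF has been seen since the partition was assigned
        self.eof: Dict[int, bool] = {}

    def on_assigned(self, partitions: Iterable[int]) -> None:
        # A newly assigned partition must be caught up from its starting position.
        for p in partitions:
            self.eof[p] = False

    def on_revoked(self, partitions: Iterable[int]) -> None:
        # Forget partitions that were handed to another instance.
        for p in partitions:
            self.eof.pop(p, None)

    def mark_eof(self, partition: int) -> None:
        # EOF events for partitions we no longer own are ignored.
        if partition in self.eof:
            self.eof[partition] = True

    def pending(self) -> List[int]:
        # Partitions still waiting for EOF; useful to log while catching up.
        return sorted(p for p, done in self.eof.items() if not done)

    def caught_up(self) -> bool:
        return bool(self.eof) and all(self.eof.values())
```

In the scenario above, Inst-2 would call `on_assigned([0, 1, 2, 3, 4])`, and `pending()` would show which partitions are still outstanding.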

Bizarrely, if I shut down Inst-1 during this phase, all the partitions are assigned to Inst-2, and it does receive the partition EOF events for each partition in a timely manner.

It feels like some sort of consumer group leader issue. However, I'm wondering whether my expectations around partition EOF are wrong and I shouldn't expect it to behave this way. If so, what is the best way to ensure I consume all messages up to the EOF (or equivalent)?
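One possible "equivalent" is sketched here as an assumption, not as the library's recommended approach. The idea is to record each partition's high watermark when the partition is assigned, then treat the partition as caught up once the consumer has passed that offset. The watermark can be queried with `QueryWatermarkOffsets` in Confluent.Kafka .NET or `get_watermark_offsets` in confluent-kafka-python. Because the target is fixed at assignment time, a steady stream of new messages cannot keep moving the finish line. It is plausible, though unconfirmed, that a moving finish line explains why an EOF-based check can take long on a busy partition. The helper name and its `high`/`start` inputs are hypothetical. The sketch ignores transactional control records, which occupy offsets but are never delivered, so on a transactional topic the last delivered offset can stay below the high watermark.

```python
from typing import Dict, Set


class WatermarkCatchUp:
    """Hypothetical catch-up check based on high watermarks snapshotted at assignment.

    `high` maps partition -> high watermark (offset the next produced message gets).
    `start` maps partition -> position the consumer starts reading from.
    """

    def __init__(self, high: Dict[int, int], start: Dict[int, int]) -> None:
        # Fixed targets: messages produced after assignment do not move them.
        self.target: Dict[int, int] = dict(high)
        # Partitions that start at (or past) their high watermark have nothing to replay.
        self.done: Set[int] = {p for p, h in high.items() if start.get(p, 0) >= h}

    def on_message(self, partition: int, offset: int) -> None:
        target = self.target.get(partition)
        # offset + 1 is the next position; reaching the target means the backlog is read.
        if target is not None and offset + 1 >= target:
            self.done.add(partition)

    def caught_up(self) -> bool:
        return self.done >= set(self.target)
```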

I'm using version 1.7 of the C# Kafka libraries. We're not able to use version 1.8 because of an issue with SSL authentication in the library.

Issue Analytics

  • State: open
  • Created a year ago
  • Comments: 6 (3 by maintainers)

Top GitHub Comments

1 reaction
mhowlett commented, Jun 6, 2022

What is the SSL authentication issue?

This may be an issue, and it may be resolved by changes in v1.9.0-RC10. I'd need to look over the debug logs to work it out. The first thing to do is try the latest version.
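A possible way to collect the debug logs mentioned above is shown in the sketch below, written as a plain configuration dictionary. The keys are librdkafka configuration properties, which Confluent.Kafka .NET exposes on `ConsumerConfig` as `Debug` and `EnablePartitionEof`. The broker address and group id are placeholders, and the choice of debug contexts is an assumption aimed at group rebalances and fetching.

```python
# Hypothetical consumer configuration for collecting rebalance/fetch debug logs.
consumer_config = {
    "bootstrap.servers": "broker1:9092",   # placeholder broker address
    "group.id": "market-data-consumer",    # placeholder consumer group
    "enable.partition.eof": True,          # emit partition EOF events
    # librdkafka debug contexts: consumer group (cgrp), fetcher, topic, consumer
    "debug": "consumer,cgrp,topic,fetch",
}
```

These debug contexts can be verbose at 1,500 messages a second, so they are probably best enabled only while reproducing the delay.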

0 reactions
RedTail72 commented, Oct 27, 2022

@mhowlett When you say next release, are you referring to v1.9.3 or the version after that? If there is a roadmap with this information, I apologize; I didn't see it (or overlooked it).


