
Idle downstream actors until confirmation is sent

See original GitHub issue

Reading the KafkaConsumerActor API, I understand that the only way to process N message batches from the same topic concurrently is to create N KafkaConsumerActor instances with the same groupId, each with its own downstream actor. From the KafkaConsumerActor docs:

Before the actor continues pulling more data from Kafka, the receiver of the data must confirm the batches by sending back a [[KafkaConsumerActor.Confirm]] message that contains the offsets from the received batch.
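To make the flow concrete, here is a sketch of a downstream receiver that confirms a batch only after an asynchronous write succeeds. It assumes the cakesolutions scala-kafka-client API (`ConsumerRecords.extractor`, `KafkaConsumerActor.Confirm`); names and signatures should be checked against the library version in use, and `writeToStore` is a hypothetical stand-in for a real persistence call.

```scala
import akka.actor.Actor
import akka.pattern.pipe
import cakesolutions.kafka.akka.{ConsumerRecords, KafkaConsumerActor}

class BatchWriter extends Actor {
  import context.dispatcher

  // Extractor that matches incoming batches of the expected key/value types.
  val extractor = ConsumerRecords.extractor[String, String]

  def receive: Receive = {
    case extractor(records) =>
      // Process the whole batch asynchronously; only send Confirm once the
      // write has succeeded, which is what gives at-least-once semantics.
      writeToStore(records.values)
        .map(_ => KafkaConsumerActor.Confirm(records.offsets, commit = true))
        .pipeTo(sender())
      // Until this Confirm reaches the KafkaConsumerActor, no further
      // batches are delivered -- the idleness this issue is about.
  }

  // Hypothetical async persistence call (e.g. a Mongo driver write).
  def writeToStore(values: Seq[String]): scala.concurrent.Future[Unit] = ???
}
```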

If we want to achieve at-least-once semantics, we must wait until each batch has been processed successfully before sending the Confirm message with its offsets. So, although the downstream actor is not blocked (assuming we process the batch asynchronously, e.g. in a Future), it won't receive more batches until the Confirm(offsets) is sent, and I see this as equivalent to blocking the actor.

Suppose the first batch contains the messages first message, second message, third message. I write these messages to MongoDB in a Future and send a confirmation message to the KafkaConsumerActor when that Future succeeds. If new messages arrive (fourth message, fifth message), I would like to be able to process them even if the first write to Mongo hasn't finished. Until Mongo confirms the writes, the downstream actor sits idle, so why couldn't the KafkaConsumerActor keep sending new batches? I understand that this works as a mechanism to avoid overwhelming the downstream actor:

This mechanism allows the receiver to control the maximum rate of messages it will receive.

But couldn't we configure a threshold of batches that may be sent without confirmation? Something like splitting the unconfirmed state into unconfirmedThresholdUnreached and unconfirmedThresholdReached states.
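The proposed behaviour could be modelled as a small windowing state machine. This is purely hypothetical and is not part of the real KafkaConsumerActor; `maxUnconfirmed`, `WindowedState`, and both state names are illustration only.

```scala
// Hypothetical model of the proposal: keep delivering batches until
// `maxUnconfirmed` are in flight, then pause until a Confirm arrives.
case class WindowedState(inFlight: Int, maxUnconfirmed: Int) {

  // Corresponds to the proposed "unconfirmedThresholdReached" state.
  def thresholdReached: Boolean = inFlight >= maxUnconfirmed

  // A new batch may be sent downstream only while below the threshold.
  def sendBatch: WindowedState =
    if (thresholdReached) this else copy(inFlight = inFlight + 1)

  // Each Confirm frees one slot, possibly re-entering the
  // "unconfirmedThresholdUnreached" state.
  def confirm: WindowedState = copy(inFlight = math.max(0, inFlight - 1))
}
```

With `maxUnconfirmed = 1` this degenerates to the current one-batch-at-a-time behaviour, so the existing semantics would be a special case of the proposal.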

If creating N KafkaConsumerActor + downstream actor pairs (with the same groupId) is the way to go, must I choose a fixed number of actors?

I hope I’ve made myself clear. Thanks!

Issue Analytics

  • State: closed
  • Created 7 years ago
  • Comments:5 (3 by maintainers)

Top GitHub Comments

simonsouter commented, Jun 11, 2016

Great discussion. I think the multiple batch idea proposed by @gabrielgiussi would certainly improve the performance of a single stream in the scenario as described. There is a clear latency introduced in awaiting the downstream system’s (mongo) confirmation of the batch before getting the next one.

Providing a capability to process multiple batches concurrently does introduce some complexity, however, which is described in the Kafka documentation's "Decouple Consumption and Processing" section. Specifically, it becomes tricky to keep the commit position consistent, and ordering guarantees are lost.

It would seem more reasonable to me to go with the original suggestion of using multiple KafkaConsumerActors with the same “groupId” to achieve the performance optimisation (downstream batch parallelism), rather than add the additional configuration and implementation complexity to the KafkaConsumerActor. Since the ordering of the stream could not easily be guaranteed using the multiple batch technique, it makes sense to break up the stream into multiple ones and lean on the group capabilities already provided by the underlying driver.

gabrielgiussi commented, Jun 13, 2016

Good point @simonsouter.

Decouple Consumption and Processing

  • CON: Guaranteeing order across the processors requires particular care, as the threads will execute independently; an earlier chunk of data may actually be processed after a later chunk of data just due to the luck of thread execution timing. For processing that has no ordering requirements this is not a problem.
  • CON: Manually committing the position becomes harder as it requires that all threads co-ordinate to ensure that processing is complete for that partition.
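The second CON can be illustrated with a small, self-contained model (not library code): when batches complete out of order, the commit position may only advance across a contiguous run of completed offsets, otherwise a crash would skip unprocessed data.

```scala
// Model of safe manual commits under out-of-order batch completion:
// `committed` is the highest offset up to which everything is done.
case class CommitTracker(committed: Long, done: Set[Long]) {

  // Record that the batch at `offset` finished, then advance the commit
  // position across any now-contiguous completed offsets.
  def complete(offset: Long): CommitTracker = {
    val d = done + offset
    var c = committed
    while (d.contains(c + 1)) c += 1 // only contiguous progress is committable
    CommitTracker(c, d.filter(_ > c))
  }
}
```

If offset 2 completes before offset 1, the tracker holds the commit at 0; once offset 1 completes, it can jump straight to 2. This co-ordination is exactly the bookkeeping the Kafka documentation warns about.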

So, the next thing I have to decide is whether to use a fixed number of KafkaConsumerActors created at application startup, or to create KafkaConsumerActors as needed. The latter requires some mechanism that lets me know when my ReceiverActors are overwhelmed (could KafkaConsumerActors constantly entering the bufferFull state act as a signal of that?). I'm thinking about elasticity here, e.g. being capable of handling a peak load of Kafka messages, but maybe I'm going too far.

