Slow manual commits

Description

We need to expose manual synchronous commits to our clients, but their performance is very poor compared with asynchronous commits plus callbacks.

Attached are client logs with “debug”: “all” for the sync and async versions: sync_commit.txt, async_commit.txt.

How to reproduce

The commit request itself seems quick, but the delay between the request being enqueued and actually being sent to the broker can be long (~1 s), and I can see quite a few FetchRequests between commits, even though our consumer's flow is roughly:

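private Message<string, byte[]> ConsumeMessageSync()
{
    Message<string, byte[]> kafkaMessage;
    // Consume returns false if no message arrived within the 100 ms timeout
    if (!_consumer.Consume(out kafkaMessage, 100))
    {
        return null;
    }
    return kafkaMessage;
}

// process() and emitMessageToClient() are our own application code
var msg = ConsumeMessageSync();
if (msg != null)
{
    var clientReadyMsg = process(msg);
    emitMessageToClient(clientReadyMsg);
}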

Then the client subscribes and commits after each emission.

What I don’t understand is why fetch requests are sent to the broker after the commit request is enqueued, while we wait for the commit result to come back. I played around with fetch.wait.max.ms, but that only changes the number of fetch requests sent in between.

Additionally, there are some odd PROTOERR-level messages like this:

7|2018-02-06 12:01:01.765|rdkafka#consumer-1|PROTOERR| [thrd:lonrs08346.my-domain.net:2182/bootstrap]: lonrs08346.my-domain.net:2182/2: Protocol parse failure at 1048332/1048648 (rd_kafka_msgset_reader_msg_v0_1:464) (incorrect broker.version.fallback?)

Probably not related, but worth pointing out. Is there something I am missing? Thanks in advance!

Checklist

Please provide the following information:

Confluent.Kafka nuget version: 0.11.3
Apache Kafka version: 0.10.0.1
Client configuration:
{"enable.auto.commit", "false"}, {"auto.offset.reset", "earliest"}
Operating system: Win7x64
Provide logs (with “debug” : “…” as necessary in configuration)
Provide broker log excerpts
Critical issue

Top GitHub Comments

edenhill commented, Apr 27, 2018

Thank you all for your patience.

I’ve now identified the issue: https://github.com/edenhill/librdkafka/blob/master/src/rdkafka_broker.c#L3187

When committing to a broker that we’re not fetching messages from there is a high probability that queued ops (such as a Commit) will be delayed up to 1000ms before being sent, regardless of socket.blocking.max.ms.
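
The "up to 1000ms" bound can be reasoned about with a deliberately simplified model. This is not librdkafka's code: it assumes the broker thread services its op queue only when a blocking wait of fixed length (1000 ms here) expires, and that nothing wakes it early when a Commit is enqueued. `OpDelayModel`, `SendDelayMs` and its parameters are hypothetical names; the `wakeOnEnqueue` flag models a hypothetical alternative in which enqueueing wakes the thread at once, which is not necessarily how the actual fix works.

```csharp
using System;

// Simplified, hypothetical model of a broker thread that only looks at its
// op queue when a fixed-length blocking wait expires (not librdkafka code).
public static class OpDelayModel
{
    // Returns how many milliseconds an op enqueued at enqueueAtMs waits
    // before being picked up, assuming wakeups at 0, wakeIntervalMs,
    // 2 * wakeIntervalMs, ... When wakeOnEnqueue is true, enqueueing is
    // assumed to wake the thread immediately, so the wait is zero.
    public static long SendDelayMs(long enqueueAtMs, long wakeIntervalMs, bool wakeOnEnqueue)
    {
        if (enqueueAtMs < 0)
            throw new ArgumentOutOfRangeException(nameof(enqueueAtMs));
        if (wakeIntervalMs <= 0)
            throw new ArgumentOutOfRangeException(nameof(wakeIntervalMs));

        if (wakeOnEnqueue)
            return 0;

        // Time elapsed since the most recent scheduled wakeup.
        long sinceLastWake = enqueueAtMs % wakeIntervalMs;

        // An op enqueued exactly on a wakeup boundary is treated as picked up
        // at once; otherwise it waits for the next boundary.
        return sinceLastWake == 0 ? 0 : wakeIntervalMs - sinceLastWake;
    }
}
```

In this model a Commit enqueued 250 ms after the last wakeup waits 750 ms, and one enqueued 1 ms after waits 999 ms, so the delay is bounded by the wait length rather than by any socket setting.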

I have a fix in place which I’ll test and then commit to master.

There is no workaround.

librdkafka issue: https://github.com/edenhill/librdkafka/issues/1787

GarrettDavis commented, Apr 9, 2018

@mhowlett I would suggest making it just a synchronous call, because that’s what it is. I would leave it up to the consumers of this library whether or not they want to wrap it in a Task.Run() or offload it onto another thread. To avoid making this a breaking change, you could implement a CommitAsync(this Consumer consumer, ...) extension method that wraps the synchronous Consumer.Commit() in a Task.Run(). Of course, since it’s a major release, it’s OK to make a breaking change.
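
The extension-method approach suggested above might look roughly like this. Because the comment describes a proposed API, the sketch uses a hypothetical `ISyncCommitter` interface as a stand-in for a consumer whose `Commit()` blocks until the broker replies; the real method's name, overloads and return type may differ. `FakeCommitter` is likewise a hypothetical test double.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical stand-in for a consumer whose Commit() blocks until the
// broker acknowledges the commit. Not a Confluent.Kafka type.
public interface ISyncCommitter
{
    void Commit();
}

public static class CommitExtensions
{
    // Offers an awaitable wrapper while the core Commit() stays synchronous:
    // the blocking call is simply moved onto a thread-pool thread.
    public static Task CommitAsync(this ISyncCommitter consumer)
    {
        if (consumer == null)
            throw new ArgumentNullException(nameof(consumer));
        return Task.Run(() => consumer.Commit());
    }
}

// Hypothetical fake used to exercise the extension method.
public sealed class FakeCommitter : ISyncCommitter
{
    public int Commits { get; private set; }

    public void Commit() => Commits++;
}
```

Callers that already run a dedicated consumer loop can call `Commit()` directly, while code that needs an awaitable can use `await consumer.CommitAsync()`, which is the split of responsibility the comment argues for.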