Reliability use case

Description

I am using the ProduceAsync method to produce messages. To get better reliability, I created the topic with the following configuration:

.\bin\windows\kafka-topics.bat --zookeeper localhost:2181 --create --topic Account --partitions 3 --replication-factor 3 --config min.insync.replicas=2

I am setting the producer properties as follows:

    var config = new Dictionary<string, object>
    {
        { "bootstrap.servers", brokerList },
        { "batch.num.messages", 1 },
        { "request.timeout.ms", 2000 },
        { "queue.buffering.max.messages", 100 },
        { "default.topic.config", new Dictionary<string, object> { { "acks", "all" } } }
    };

My producer code looks like this:

`
            using (var producer = new Producer<String, Account>(config, new StringSerializer(Encoding.UTF32), new TypeSerializer<Account>()))
            {
                Console.WriteLine("\n-----------------------------------------------------------------------");
                Console.WriteLine($"Producer {producer.Name} producing on topic {topicName}.");
                Console.WriteLine("-----------------------------------------------------------------------");
                Console.WriteLine("To create a kafka message with UTF-8 encoded key/value message:");
                Console.WriteLine("> key value<Enter>");
                Console.WriteLine("To create a kafka message with empty key and UTF-8 encoded value:");
                Console.WriteLine("> value<enter>");
                Console.WriteLine("Ctrl-C to quit.\n");

                var cancelled = false;
                Console.CancelKeyPress += (_, e) =>
                {
                    e.Cancel = true; // prevent the process from terminating.
                    cancelled = true;
                };
                int acct = 0;
                DateTime time = DateTime.Now;
                Log("Start Time" + time , logfile);
                // Stop early on Ctrl-C; otherwise send messages until acct exceeds 500.
                while (!cancelled && acct <= 500)
                {
                    Console.Write("> ");

                    Account text = new Account();
                    try
                    {

                        text.AccountNumber =  acct.ToString();
                        text.age = "34";
                        text.FirstName = "Test first name  ";
                        text.Lastname = "Test last name";                                                       

                        
                        acct = acct + 1;
                    }
                    catch (IOException)
                    {
                        // IO exception is thrown when ConsoleCancelEventArgs.Cancel == true.
                        break;
                    }
                    if (text == null)
                    {
                        // Console returned null before 
                        // the CancelKeyPress was treated
                        break;
                    }                        
                    var val = text;
                    try
                    {
                        //int ptnum= GetPartition();
                        var dh = new ResultHandler();                          

                        producer.ProduceAsync(topicName, acct.ToString(), val, true, dh);
                       
                        producer.Flush(-1);
                        
                        var Result = dh.GetMessage();

                        Console.WriteLine($"Key: {acct}, Partition: {Result.Partition}, Offset: {Result.Offset}, Account number: {Result.Value.AccountNumber}, Status: {Result.Error}, Message number: {acct}");
                        Log("Msg Details: " + $"Key: {acct}, Status: {Result.Error}, Partition: {Result.Partition}, Account number: {Result.Value.AccountNumber}, Offset: {Result.Offset}", logfile);

                       
                    }
                    catch (Exception ex)
                    {
                        Log("Exception: " + ex, logfile);
                    }
                }
                Log("End Time" + DateTime.Now.ToString(), logfile);
                // Elapsed time is end minus start; the reverse order yields a negative TimeSpan.
                Log("Elapsed Time" + (DateTime.Now - time).ToString(), logfile);
              
            }

`

Questions:

1> I always get an acknowledgment status of “Success” (the value of the Result.Error object). I tried to provoke errors in different ways by shutting down brokers, but the Error object was never filled with error details. How can I receive the error details?

2> I am using producer.Flush(-1). Is that the correct way? I would also like to know whether there is any way to increase throughput, or how to improve performance while keeping reliability.

How to reproduce

Checklist

Please provide the following information:

Confluent.Kafka nuget version: Confluent.Kafka.0.11.0.nupkg.sha512
Apache Kafka version: kafka_2.11-0.11.0.0
Client configuration:
Operating system: Windows 7
Provide logs (with “debug” : “…” as necessary in configuration)
Provide broker log excerpts
Critical issue

Issue Analytics

State:
Created 6 years ago
Reactions:1
Comments:11 (6 by maintainers)

Top GitHub Comments

treziac commented, Aug 22, 2017

What exactly do you mean by reliability? All messages acked, failed messages properly logged, messages in order? Some remarks on your code:

See http://docs.confluent.io/current/clients/producer.html and search for the part about ordering. You don’t necessarily need a sync producer to achieve ordering (just make sure you never have more than one in-flight request at a time), but it really depends on your needs. That seems to be broken at the moment, though (https://github.com/edenhill/librdkafka/issues/1092), so for now continue using your sync producer.
If you use a delivery handler, you may want to use a single delivery handler for all ProduceAsync calls instead of creating one each time. Also, could you post your code for ResultHandler? I can’t see what your GetMessage does, so I cannot explain why you always get success.
Did you try shutting down all brokers for more than five minutes to get an error? By default, librdkafka retries failed sends. If you want to maximize the ‘chances’ of receiving failures, set retries to 0 and message.timeout.ms to 10000, for example.
I would discourage calling Flush(-1): if there are network issues, it will block until all your messages time out, which takes 5 minutes by default. Call Flush(TimeSpan.FromSeconds(10)), for example, check that the result is 0 (it returns the number of messages that have not been sent yet), and log if necessary. You can call Flush in a while loop if you want; this has the same result, but at least you can monitor it.
request.timeout.ms is the time to wait for the ack once data has been sent to the broker, not the time to wait for the data to be sent (which is message.timeout.ms). So if brokers are down, you will wait at least this long.
What are you trying to achieve by setting batch.num.messages to 1? I assume you want to send each message right away, in which case it’s OK, but you can also set queue.buffering.max.ms to 1.
Set socket.blocking.max.ms to 1, or you may wait up to 1s between requests.
As you are on Windows, disable Nagle’s algorithm by setting socket.nagle.disable to true.
The new exactly-once semantics would help you, but they are not available in librdkafka yet. It’s on the roadmap!
If you want full reliability, you may also want to check CRC32 checksums (check.crcs).
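
The settings and the bounded flush suggested in the list above could be sketched roughly as follows. This is only an illustration, not the library's documented usage. The names ReliabilitySketch, BuildConfig and FlushWithMonitoring are hypothetical, the values are the examples from the comment, and placing message.timeout.ms under default.topic.config next to acks is an assumption. The flush helper takes a delegate so that it does not depend on a particular client version; it assumes the delegate returns the number of messages still pending, as the comment describes for Flush.

```csharp
using System;
using System.Collections.Generic;

static class ReliabilitySketch
{
    // Producer settings from the comment above, in the same dictionary
    // form the question uses. All values are illustrative examples.
    public static Dictionary<string, object> BuildConfig(string brokerList)
    {
        return new Dictionary<string, object>
        {
            { "bootstrap.servers", brokerList },
            { "retries", 0 },                  // do not retry, so failures surface sooner
            { "queue.buffering.max.ms", 1 },   // hand messages to the broker almost immediately
            { "socket.blocking.max.ms", 1 },   // avoid waiting up to 1s between requests
            { "socket.nagle.disable", true },  // disable Nagle's algorithm
            {
                "default.topic.config", new Dictionary<string, object>
                {
                    { "acks", "all" },
                    // Assumption: set as a topic-level property, like acks.
                    { "message.timeout.ms", 10000 }
                }
            }
        };
    }

    // Calls flush(timeout) up to maxAttempts times and logs what is still
    // pending after each unsuccessful attempt. Returns the number of
    // messages that were still undelivered after the last call.
    public static int FlushWithMonitoring(Func<TimeSpan, int> flush, TimeSpan timeout, int maxAttempts)
    {
        int remaining = flush(timeout);
        for (int attempt = 1; remaining > 0 && attempt < maxAttempts; attempt++)
        {
            Console.WriteLine($"Flush attempt {attempt}: {remaining} message(s) still pending");
            remaining = flush(timeout);
        }
        return remaining;
    }
}
```

With the producer from the question, a call might look like `int pending = ReliabilitySketch.FlushWithMonitoring(producer.Flush, TimeSpan.FromSeconds(10), 3);`, assuming the client exposes the Flush(TimeSpan) overload mentioned above. A non-zero result means some messages were not confirmed within the allotted attempts and should be logged or handled.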

mhowlett commented, Jul 31, 2018

We plan to make an in-depth example + docs highlighting reliability features. Tracking that task internally.
