Reliability use case


Description

I am using the ProduceAsync method to produce messages. To get better reliability, I created the topic with the following configuration:

.\bin\windows\kafka-topics.bat --zookeeper localhost:2181 --create --topic Account --partitions 3 --replication-factor 3 --config min.insync.replicas=2

My producer properties are:

    var config = new Dictionary<string, object>
    {
        { "bootstrap.servers", brokerList },
        { "batch.num.messages", 1 },
        { "request.timeout.ms", 2000 },
        { "queue.buffering.max.messages", 100 },
        { "default.topic.config", new Dictionary<string, object> { { "acks", "all" } } }
    };

My producer code looks like this:

    using (var producer = new Producer<String, Account>(config, new StringSerializer(Encoding.UTF32), new TypeSerializer<Account>()))
    {
        Console.WriteLine("\n-----------------------------------------------------------------------");
        Console.WriteLine($"Producer {producer.Name} producing on topic {topicName}.");
        Console.WriteLine("-----------------------------------------------------------------------");
        Console.WriteLine("To create a kafka message with UTF-8 encoded key/value message:");
        Console.WriteLine("> key value<Enter>");
        Console.WriteLine("To create a kafka message with empty key and UTF-8 encoded value:");
        Console.WriteLine("> value<enter>");
        Console.WriteLine("Ctrl-C to quit.\n");

        var cancelled = false;
        Console.CancelKeyPress += (_, e) =>
        {
            e.Cancel = true; // prevent the process from terminating.
            cancelled = true;
        };
        int acct = 0;
        DateTime time = DateTime.Now;
        Log("Start Time" + time, logfile);
        while (acct <= 500)
        {
            Console.Write("> ");

            Account text = new Account();
            try
            {
                text.AccountNumber = acct.ToString();
                text.age = "34";
                text.FirstName = "Test first name  ";
                text.Lastname = "Test last name";

                acct = acct + 1;
            }
            catch (IOException)
            {
                // IO exception is thrown when ConsoleCancelEventArgs.Cancel == true.
                break;
            }
            if (text == null)
            {
                // Console returned null before
                // the CancelKeyPress was treated
                break;
            }
            var val = text;
            try
            {
                //int ptnum = GetPartition();
                var dh = new ResultHandler();

                // Produce one message, then block until it has been delivered.
                producer.ProduceAsync(topicName, acct.ToString(), val, true, dh);

                producer.Flush(-1);

                var Result = dh.GetMessage();

                Console.WriteLine($"Key: {acct.ToString()}, Partition: {Result.Partition}, Offset: {Result.Offset}, Account Number: {Result.Value.AccountNumber}, Status: {Result.Error}, Message Number: {acct}");
                Log("Msg Details:" + $"Key: {acct.ToString()}, Status: {Result.Error}, Partition: {Result.Partition}, Account Number: {Result.Value.AccountNumber.ToString()}, Offset: {Result.Offset}", logfile);
            }
            catch (Exception ex)
            {
                Log("Exception" + ex, logfile);
            }
        }
        Log("End Time" + DateTime.Now.ToString(), logfile);
        // Elapsed time is end minus start (the operands were reversed before).
        Log("Elapsed Time" + (DateTime.Now - time).ToString(), logfile);
    }

Questions:

1. I always get an acknowledgment status of “Success” (i.e. the value of the Result.Error object). I tried different ways to provoke an error by shutting down brokers, but never received an Error object filled with error details. How can I receive the error details?

2. I am using producer.Flush(-1). Is that the correct way? I also want to know whether there is any way to increase throughput, or how we can increase performance while keeping reliability.

How to reproduce

Checklist

Please provide the following information:

  • Confluent.Kafka nuget version:Confluent.Kafka.0.11.0.nupkg.sha512
  • Apache Kafka version: kafka_2.11-0.11.0.0
  • Client configuration:
  • Operating system: Windows 7
  • Provide logs (with “debug” : “…” as necessary in configuration)
  • Provide broker log excerpts
  • Critical issue

Issue Analytics

  • State:closed
  • Created 6 years ago
  • Reactions:1
  • Comments:11 (6 by maintainers)

Top GitHub Comments

treziac commented, Aug 22, 2017 (1 reaction)

What exactly do you mean by reliability? All messages acked, failed messages properly logged, messages in order? Some remarks on your code:

  • See http://docs.confluent.io/current/clients/producer.html and look for the part about ordering. You don’t necessarily need a sync producer to achieve ordering (just make sure you don’t have more than one in-flight request at a time), but it really depends on your needs. This seems broken at the moment, though (https://github.com/edenhill/librdkafka/issues/1092), so for now keep using your sync producer.

  • If you use a delivery handler, you may want to use a single delivery handler for all ProduceAsync calls rather than creating one each time. Also, could you post your code for ResultHandler? I can’t see what your GetMessage does, so I cannot explain why you always get success.

  • Did you try shutting down all brokers for more than five minutes to get an error? By default, librdkafka retries failed sends. To maximize the ‘chances’ of receiving failures, set retries to 0 and message.timeout.ms to 10000, for example.

  • I would discourage calling Flush(-1): if there are network issues, it will block until all your messages time out, which by default takes 5 minutes. Call Flush(TimeSpan.FromSeconds(10)) instead, for example, check that the result is 0 (it returns the number of messages that were not sent yet), and log if necessary. You can call Flush in a while loop if you want; this has the same result, but at least you can monitor it.

  • request.timeout.ms is the time to wait for the ack once data has been sent to the broker, not the time to wait to send data (which is message.timeout.ms). So if brokers are down, you will wait at least this time.

  • What are you trying to achieve by setting batch.num.messages to 1? If you want to send each message right away, that’s OK, but you can also use queue.buffering.max.ms and set it to 1.

  • Set socket.blocking.max.ms to 1, or you may wait up to 1s between each request.

  • As you are on Windows, disable Nagle by setting socket.nagle.disable to true.

  • The new exactly-once semantics would help you, but they are not available yet in librdkafka. It’s on the roadmap!

  • If you want full reliability, you may also want to check CRC32 checksums (check.crcs).
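
The settings and the bounded Flush loop suggested above could be sketched roughly as follows. This is only an illustration, not code from the thread: the names ReliabilitySketch, BuildConfig, FlushWithMonitoring and fakeFlush are hypothetical, the values are the examples given in the comment, and placing message.timeout.ms under default.topic.config assumes it is a topic-level property in the librdkafka version in use. check.crcs is left out because, as far as I know, it applies to consumers rather than producers. The flush call is passed in as a delegate so the sketch runs without a broker; with a real producer one would presumably pass producer.Flush.

```csharp
using System;
using System.Collections.Generic;

static class ReliabilitySketch
{
    // Producer settings named in the comment above (values are its examples).
    public static Dictionary<string, object> BuildConfig(string brokerList)
    {
        return new Dictionary<string, object>
        {
            { "bootstrap.servers", brokerList },
            { "retries", 0 },                 // report failures instead of retrying
            { "queue.buffering.max.ms", 1 },  // send almost immediately
            { "socket.blocking.max.ms", 1 },  // avoid waiting up to 1s between requests
            { "socket.nagle.disable", true }, // disable Nagle's algorithm
            {
                "default.topic.config", new Dictionary<string, object>
                {
                    { "acks", "all" },
                    // Assumption: topic-level property in this client version.
                    { "message.timeout.ms", 10000 }
                }
            }
        };
    }

    // Calls flush with a bounded timeout until nothing is pending or the
    // attempts run out, logging each incomplete attempt.
    // Returns the number of messages still pending (0 means all were sent).
    public static int FlushWithMonitoring(
        Func<TimeSpan, int> flush, TimeSpan perAttempt, int maxAttempts, Action<string> log)
    {
        if (maxAttempts < 1)
            throw new ArgumentOutOfRangeException(nameof(maxAttempts));

        int remaining = 0;
        for (int attempt = 1; attempt <= maxAttempts; attempt++)
        {
            remaining = flush(perAttempt);
            if (remaining == 0)
                return 0;
            log($"Flush attempt {attempt}: {remaining} message(s) still pending");
        }
        return remaining;
    }

    static void Main()
    {
        var config = BuildConfig("localhost:9092");
        Console.WriteLine($"Configured {config.Count} top-level properties");

        // Stand-in for producer.Flush: pretends two messages drain per call.
        int pending = 3;
        Func<TimeSpan, int> fakeFlush = _ => pending = Math.Max(0, pending - 2);

        int left = FlushWithMonitoring(fakeFlush, TimeSpan.FromSeconds(10), 5, Console.WriteLine);
        Console.WriteLine($"Remaining after flush: {left}");
    }
}
```

Returning the remaining count instead of throwing leaves it to the caller to decide whether undelivered messages should be logged, retried, or treated as a failure.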

mhowlett commented, Jul 31, 2018 (0 reactions)

We plan to make an in-depth example + docs highlighting reliability features. Tracking that task internally.
