
Issue when a broker is down

Hi,

My setup is as follows: I have two instances of Kafka running in Docker containers, plus a producer script and a consumer script. When I kill the first broker and restart the producer / consumer script, they time out and the “ready” event is never emitted: {"message":"Local: Timed out","code":-185,"errno":-185,"origin":"kafka"}

The issue is that when rd_kafka_metadata is called, there is a moment when neither broker’s state is UP. The function that decides which broker to talk to, rd_kafka_broker_any, returns the first broker, so rd_kafka_metadata returns a timeout error (because that broker is down). In the JS code’s connect method we then return and never emit the “ready” event.

This error seems to be fixed on librdkafka’s side in the current master version, as the rd_kafka_metadata function was partly rewritten.

Did you observe this behavior during your tests, or am I doing something wrong?
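
One possible application-side workaround for the timeout described above is to retry the connection with a fresh client until the metadata request succeeds. The following is only a minimal sketch under stated assumptions: `connectWithRetry` and `createClient` are hypothetical names, the client is assumed to expose `connect(metadataOptions, callback)` in the style of node-rdkafka, and the `timeout` metadata option is an assumption that may not exist in every version.

```javascript
// Sketch only. `createClient` is a hypothetical factory that returns a fresh
// client exposing connect(metadataOptions, callback), for example a
// node-rdkafka Producer or KafkaConsumer (assumed, not verified here).
function connectWithRetry(createClient, options = {}) {
  const { retries = 5, delayMs = 1000, metadataTimeoutMs = 5000 } = options;

  return new Promise((resolve, reject) => {
    const attempt = (remaining) => {
      const client = createClient();
      // `timeout` is an assumed metadata option; drop it if the client
      // in use does not support it.
      client.connect({ timeout: metadataTimeoutMs }, (err) => {
        if (!err) {
          resolve(client); // the metadata request succeeded
          return;
        }
        if (remaining <= 0) {
          reject(err); // out of retries: surface the last error
          return;
        }
        // Wait, then try again with a new client instance.
        setTimeout(() => attempt(remaining - 1), delayMs);
      });
    };
    attempt(retries);
  });
}
```

With `retries: 5`, the sketch makes at most six connection attempts before rejecting with the last error. It does not clean up clients whose connection attempt failed; whether that is needed depends on the client library.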

Issue Analytics

  • State: closed
  • Created 7 years ago
  • Comments: 12 (5 by maintainers)

Top GitHub Comments

2 reactions
ArtemisMucaj commented, Sep 19, 2016

Hey,

I just stumbled upon a new case. I’m not sure whether it is related to this issue at all, but it kinda happens in the same context, so I didn’t want to create a new issue!

  1. Provide node-rdkafka with two brokers in a comma-separated list. Both are up before startup.
  2. Call .connect; the ready event is emitted.
  3. Shut down both brokers. In some cases an unassignment event is triggered right after the {"origin":"local","message":"all broker connections are down","code":-1,"errno":-1,"stack":"Error: Local: All broker connections are down\n at Error (native)"} error message.
  4. Restart one or both brokers. If the unassignment event was emitted, we never receive a reassignment event; otherwise everything works just fine!

This is a real bummer as I need to restart my consumer script by hand / shutdown when I get an “all brokers are down” error!

Any advice?
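
The manual restart described above could in principle be automated by watching for the “all broker connections are down” error. This is a rough sketch, not a confirmed fix: `isAllBrokersDown` and `watchForBrokerOutage` are hypothetical helpers, `restart` is a callback the application would have to supply, and the client is assumed to emit errors on an `event.error` event. Because the reported error object carries code -1, the check matches on the message text rather than on the code.

```javascript
// Sketch only. Detects the "all broker connections are down" error from the
// report above. The reported object has code -1, so match on the message.
function isAllBrokersDown(err) {
  const message = String((err && err.message) || '');
  return /all broker connections are down/i.test(message);
}

// `client` is assumed to be an event emitter that reports errors on
// 'event.error'. `restart` is a hypothetical application callback, for
// example one that disconnects and reconnects the consumer.
function watchForBrokerOutage(client, restart) {
  let restarting = false; // avoid overlapping restarts

  client.on('event.error', (err) => {
    if (restarting || !isAllBrokersDown(err)) return;
    restarting = true;
    Promise.resolve()
      .then(() => restart(client))
      .catch((e) => console.error('restart failed:', e))
      .then(() => { restarting = false; });
  });
}
```

What `restart` should do is left open here: it might disconnect and reconnect the consumer, or exit the process so that a supervisor restarts it.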

1 reaction
webmakersteve commented, Sep 15, 2016

Ready event is for JS level connections only. It will only ever get emitted one time - that is, when you make the initial connection, unless I’m misunderstanding what you’re trying to test.

Librdkafka manages connections internally. A timeout is a recoverable error, and librdkafka will do its best to reconnect you. It batches and queues messages so you can continue writing with no problem, and when the connection to Kafka is restored it will send off what you gave it, or continue reading if you are consuming.

What .connect does, on the node side, is just tell librdkafka to go ahead and open a socket. .disconnect tells it to close the socket and stop working.

It’s a little hard to follow exactly what you’re doing to get this to happen though. Can you give me a sequential list so I can see whether it’s working as intended, a bug in the node.js code, or a bug in librdkafka?

Thanks!
