Issue when a broker is down
Hi,
My setup is as follows: I have two Kafka instances running in Docker containers, plus a producer script and a consumer script.
When I kill the first broker and restart the producer/consumer scripts, they time out and the “ready” event is never emitted:
{"message":"Local: Timed out","code":-185,"errno":-185,"origin":"kafka"}
The issue is that when rd_kafka_metadata is called, there is a moment when neither broker is in the UP state. At that moment the function that decides which broker to talk to, rd_kafka_broker_any, returns the first broker, so rd_kafka_metadata returns a timeout error (because that broker is down). In the JS code’s connect method, we return on that error and never emit the “ready” event.
This error seems to be fixed on librdkafka’s side in the current master version, since rd_kafka_metadata was partly rewritten.
Did you observe this behavior during your tests, or am I doing something wrong?
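To make the race above concrete, here is a hypothetical, simplified JavaScript model of the broker selection described in this report. It is not librdkafka’s code: the state names, broker names, and both selection functions are assumptions based only on the description above. pickBrokerReported mimics the reported behavior (fall back to the first broker when none is UP), while pickBrokerUpOnly is an illustrative alternative that returns null so the caller can wait and retry instead of timing out against a dead broker.

```javascript
// Hypothetical, simplified model of the race described in this report.
// This is NOT librdkafka's implementation; states and rules are assumptions.
const BrokerState = Object.freeze({
  DOWN: 'DOWN',
  CONNECTING: 'CONNECTING',
  UP: 'UP',
});

// Reported behavior (assumed): prefer a broker that is UP, but if none is,
// fall back to the first configured broker even if it is down.
function pickBrokerReported(brokers) {
  return brokers.find((b) => b.state === BrokerState.UP) || brokers[0];
}

// Illustrative alternative: only return a broker that is UP, otherwise
// null, so the caller knows to wait and retry.
function pickBrokerUpOnly(brokers) {
  return brokers.find((b) => b.state === BrokerState.UP) || null;
}

// A metadata request only succeeds if it is sent to a broker that is UP.
function requestMetadata(pick, brokers) {
  const broker = pick(brokers);
  if (broker == null) {
    return { ok: false, reason: 'no broker UP yet, retry later' };
  }
  if (broker.state !== BrokerState.UP) {
    // Simulated timeout: the request went to a broker that cannot answer.
    return { ok: false, reason: 'Local: Timed out' };
  }
  return { ok: true, broker: broker.name };
}

// Right after restarting the script: broker 1 was killed,
// broker 2 has not finished connecting yet.
const brokers = [
  { name: 'kafka1:9092', state: BrokerState.DOWN },
  { name: 'kafka2:9092', state: BrokerState.CONNECTING },
];

console.log(requestMetadata(pickBrokerReported, brokers).reason); // Local: Timed out
console.log(requestMetadata(pickBrokerUpOnly, brokers).reason);   // no broker UP yet, retry later

// Once broker 2 is UP, both strategies reach it.
brokers[1].state = BrokerState.UP;
console.log(requestMetadata(pickBrokerReported, brokers).broker); // kafka2:9092
```

In this model both strategies fail at the unlucky moment; the difference is that the first one reports a hard timeout (which, per the report, makes connect return without emitting “ready”), while the second tells the caller that simply waiting would help.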
Issue Analytics
- Created 7 years ago
- Comments: 12 (5 by maintainers)
Top GitHub Comments
Hey,
I just stumbled upon a new case. I’m not sure it’s related to this issue at all, but it happens in the same context, so I didn’t want to create a new issue:
{"origin":"local","message":"all broker connections are down","code":-1,"errno":-1,"stack":"Error: Local: All broker connections are down\n at Error (native)"}
This is a real bummer, as I have to restart or shut down my consumer script by hand whenever I get an “all brokers are down” error!
Any advice?
The “ready” event is for JS-level connections only. It is only ever emitted once, when you make the initial connection, unless I’m misunderstanding what you’re trying to test.
librdkafka manages connections internally. A timeout is a recoverable error, and librdkafka will do its best to reconnect you. It batches and queues messages, so you can keep writing with no problem; when the connection to Kafka comes back, it will send off what you gave it, or continue reading if you are consuming.
What .connect does on the node side is just tell librdkafka to go ahead and open a socket. .disconnect tells it to close the socket and stop working.
It’s a little hard to follow exactly what you’re doing to get this to happen, though. Can you give me a sequential list of steps so I can see whether it’s working as intended, a bug in the node.js code, or a bug in librdkafka?
Thanks!
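One way to act on the maintainer’s explanation in client code is sketched below, assuming the error objects look like the ones quoted in this thread. The constants -185 (RD_KAFKA_RESP_ERR__TIMED_OUT) and -187 (RD_KAFKA_RESP_ERR__ALL_BROKERS_DOWN) are librdkafka error codes; isTransient, backoffDelays, and handleClientError are hypothetical helpers, not part of node-rdkafka. Transient errors are logged instead of killing the process, and only a failed initial connect (before “ready” has fired) leads to calling .connect again with backoff.

```javascript
// Hypothetical client-side policy based on the maintainer's explanation.
// These helpers are illustrative; they are not part of node-rdkafka.
const ERR_TIMED_OUT = -185;        // librdkafka RD_KAFKA_RESP_ERR__TIMED_OUT
const ERR_ALL_BROKERS_DOWN = -187; // librdkafka RD_KAFKA_RESP_ERR__ALL_BROKERS_DOWN

// Transient errors: librdkafka keeps reconnecting in the background,
// so these should be logged rather than used as a reason to exit.
function isTransient(err) {
  if (!err) return false;
  if (err.code === ERR_TIMED_OUT || err.code === ERR_ALL_BROKERS_DOWN) return true;
  // The error quoted in this thread carried code -1, so also match the message.
  return /all broker connections are down/i.test(err.message || '');
}

// Exponential backoff delays (ms) for retrying the initial .connect call.
function backoffDelays(attempts, baseMs = 500, maxMs = 8000) {
  const delays = [];
  for (let i = 0; i < attempts; i++) {
    delays.push(Math.min(maxMs, baseMs * 2 ** i));
  }
  return delays;
}

// 'ready' fires only once, so the right reaction depends on whether it has fired.
function handleClientError(err, readySeen) {
  if (!isTransient(err)) return 'fatal'; // surface real failures
  return readySeen ? 'log-and-wait' : 'retry-connect';
}

console.log(handleClientError({ code: -185, message: 'Local: Timed out' }, false));
// retry-connect
console.log(handleClientError({ code: -1, message: 'all broker connections are down' }, true));
// log-and-wait
console.log(backoffDelays(5)); // [ 500, 1000, 2000, 4000, 8000 ]
```

The design choice here is to leave reconnection to librdkafka once the client is ready and to retry only the JS-level connect step, which matches the point that “ready” is emitted just once.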