poll() never returns - "socket disconnected" message in logs
I keep running into an issue where a running consumer's calls to poll() stop returning, and kafka-python repeats log messages like the ones below indefinitely.
I suspect (without much evidence, honestly) that a slight disruption in the brokers, or perhaps in general network connectivity, kicks off the issue. Whatever the cause, the existing consumers never recover and have to be forcibly killed and restarted. This is made harder by the fact that poll() never returns, so I cannot programmatically kill the offending consumer thread.
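Because poll() itself never returns, the hang can only be noticed from outside the consumer thread. Below is a minimal sketch of a watchdog, assuming a kafka-python KafkaConsumer of which only poll(timeout_ms=...) is used. PollWatchdog, consume_forever and monitor are hypothetical helper names and are not part of kafka-python. The sketch does not fix the hang. It only makes the hang observable. CPython has no way to kill a thread, so on_stall would typically exit the whole process (for example with os._exit(1)) and rely on a process supervisor to restart it.

```python
import threading
import time


class PollWatchdog:
    """Hypothetical helper: tracks when poll() last returned.

    The consumer thread calls beat() after every poll(); any other thread
    can call is_stalled() to check whether the loop has gone quiet.
    """

    def __init__(self, max_silence_s):
        self.max_silence_s = max_silence_s
        self._lock = threading.Lock()
        self._last_beat = time.monotonic()

    def beat(self):
        # Record that poll() has just returned.
        with self._lock:
            self._last_beat = time.monotonic()

    def is_stalled(self):
        # True if poll() has not returned for longer than max_silence_s.
        with self._lock:
            return time.monotonic() - self._last_beat > self.max_silence_s


def consume_forever(consumer, watchdog, handle, stop_event):
    """Hypothetical consumer loop; `consumer` is assumed to be a KafkaConsumer."""
    while not stop_event.is_set():
        # poll(timeout_ms=1000) is expected to come back after about a second
        # even when no records arrive, so a much longer silence suggests
        # that poll() itself is stuck.
        records = consumer.poll(timeout_ms=1000)
        watchdog.beat()
        for messages in records.values():
            for message in messages:
                handle(message)


def monitor(watchdog, on_stall, check_every_s=5.0):
    """Hypothetical monitor loop, meant to run in its own daemon thread."""
    while not watchdog.is_stalled():
        time.sleep(check_every_s)
    on_stall()
```

In practice max_silence_s would be set well above the poll timeout (say, a minute), so that slow but healthy polls are not mistaken for the hang.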
If I could venture a theory, something inside poll() seems to get into an infinite loop while trying to recover connectivity.
2016-11-17 14:16:52,342 [INFO]: <BrokerConnection host=kafka01-prod01.messagehub.services.us-south.bluemix.net/23.246.202.51 port=9093>: Authenticated as XXXXX
2016-11-17 14:16:52,352 [WARNING]: Coordinator unknown during heartbeat -- will retry
2016-11-17 14:16:52,352 [WARNING]: Heartbeat failed ([Error 15] GroupCoordinatorNotAvailableError: ); retrying
2016-11-17 14:16:52,387 [WARNING]: Coordinator unknown during heartbeat -- will retry
2016-11-17 14:16:52,388 [WARNING]: Heartbeat failed ([Error 15] GroupCoordinatorNotAvailableError: ); retrying
2016-11-17 14:16:52,437 [ERROR]: <BrokerConnection host=kafka01-prod01.messagehub.services.us-south.bluemix.net/23.246.202.51 port=9093>: socket disconnected
2016-11-17 14:16:52,438 [WARNING]: Node 0 connection failed -- refreshing metadata
2016-11-17 14:16:52,438 [ERROR]: Error sending GroupCoordinatorRequest_v0 to node 0 [NodeNotReadyError: 0]
2016-11-17 14:16:52,447 [INFO]: <BrokerConnection host=kafka01-prod01.messagehub.services.us-south.bluemix.net/23.246.202.51 port=9093>: Authenticated as XXXXX
2016-11-17 14:16:52,452 [WARNING]: Coordinator unknown during heartbeat -- will retry
2016-11-17 14:16:52,452 [WARNING]: Heartbeat failed ([Error 15] GroupCoordinatorNotAvailableError: ); retrying
2016-11-17 14:16:52,488 [ERROR]: <BrokerConnection host=kafka05-prod01.messagehub.services.us-south.bluemix.net/23.246.202.55 port=9093>: socket disconnected
2016-11-17 14:16:52,488 [WARNING]: Node 4 connection failed -- refreshing metadata
2016-11-17 14:16:52,488 [ERROR]: Fetch to node 4 failed: ConnectionError: socket disconnected
2016-11-17 14:16:52,488 [WARNING]: Coordinator unknown during heartbeat -- will retry
2016-11-17 14:16:52,489 [WARNING]: Heartbeat failed ([Error 15] GroupCoordinatorNotAvailableError: ); retrying
2016-11-17 14:16:52,497 [INFO]: <BrokerConnection host=kafka05-prod01.messagehub.services.us-south.bluemix.net/23.246.202.55 port=9093>: Authenticated as XXXXX
2016-11-17 14:16:52,537 [ERROR]: <BrokerConnection host=kafka05-prod01.messagehub.services.us-south.bluemix.net/23.246.202.55 port=9093>: socket disconnected
2016-11-17 14:16:52,537 [WARNING]: Node 4 connection failed -- refreshing metadata
2016-11-17 14:16:52,538 [ERROR]: Error sending GroupCoordinatorRequest_v0 to node 4 [NodeNotReadyError: 4]
2016-11-17 14:16:52,546 [INFO]: <BrokerConnection host=kafka05-prod01.messagehub.services.us-south.bluemix.net/23.246.202.55 port=9093>: Authenticated as XXXXX
2016-11-17 14:16:52,553 [WARNING]: Coordinator unknown during heartbeat -- will retry
2016-11-17 14:16:52,553 [WARNING]: Heartbeat failed ([Error 15] GroupCoordinatorNotAvailableError: ); retrying
2016-11-17 14:16:52,589 [WARNING]: Coordinator unknown during heartbeat -- will retry
2016-11-17 14:16:52,589 [WARNING]: Heartbeat failed ([Error 15] GroupCoordinatorNotAvailableError: ); retrying
2016-11-17 14:16:52,600 [ERROR]: <BrokerConnection host=kafka02-prod01.messagehub.services.us-south.bluemix.net/23.246.202.52 port=9093>: socket disconnected
2016-11-17 14:16:52,601 [WARNING]: Node 1 connection failed -- refreshing metadata
2016-11-17 14:16:52,601 [WARNING]: Marking the coordinator dead (node 1) for group /XXXXXX@us.ibm.com_dev/messageHubTrigger: None.
2016-11-17 14:16:52,639 [ERROR]: <BrokerConnection host=kafka01-prod01.messagehub.services.us-south.bluemix.net/23.246.202.51 port=9093>: socket disconnected
2016-11-17 14:16:52,639 [WARNING]: Node 0 connection failed -- refreshing metadata
2016-11-17 14:16:52,640 [ERROR]: Error sending GroupCoordinatorRequest_v0 to node 3 [NodeNotReadyError: 3]
Please note that the logs above may include interleaved output of several consumers running in separate threads.
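The reconnect loop suspected above shows up in the log: within a fraction of a second the same brokers go through "socket disconnected" and "Authenticated as" again and again. Below is a minimal sketch that counts those cycles per broker host. It assumes the log-line format shown above. reconnect_cycles is a hypothetical diagnostic helper and is not part of kafka-python. Because the log interleaves several consumers, a disconnect from one consumer may be paired with a re-authentication from another, so the counts are only a rough signal.

```python
import re
from collections import Counter

# Matches BrokerConnection lines in the format shown above, e.g.
# "<BrokerConnection host=kafka01-.../23.246.202.51 port=9093>: socket disconnected"
CONN_EVENT_RE = re.compile(
    r"<BrokerConnection host=(?P<host>[^/\s>]+)\S* port=\d+>: "
    r"(?P<event>socket disconnected|Authenticated as)"
)


def reconnect_cycles(lines):
    """Count disconnect -> re-authenticate cycles per broker host.

    Hypothetical helper: a cycle is a "socket disconnected" line followed
    later by an "Authenticated as" line for the same host.
    """
    pending = set()  # hosts whose most recent event was a disconnect
    cycles = Counter()
    for line in lines:
        match = CONN_EVENT_RE.search(line)
        if not match:
            continue  # heartbeat, metadata and fetch lines are ignored
        host = match.group("host")
        if match.group("event") == "socket disconnected":
            pending.add(host)
        elif host in pending:
            pending.discard(host)
            cycles[host] += 1
    return cycles
```

Run over the excerpt above, it counts two cycles for kafka05 and one for kafka01. The final disconnects of kafka02 and kafka01 are not counted because no re-authentication follows them in the excerpt.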
Thanks to declantraynor for an awesome repro setup. Fix in PR #1003
I have also encountered this issue. One detail I can add is that the issue only manifests when the consumer connects to Kafka with authentication, i.e. when a SASL username and password are provided; the connection itself does not need to be secure. With this extra information, I have been able to put together a small test case that reliably reproduces the issue.
The example uses a containerised Kafka, but I have also reproduced the issue on a production Kafka cluster.
I hope this is helpful in further debugging the issue.
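To make the trigger condition concrete: in kafka-python, SASL authentication is enabled through KafkaConsumer keyword arguments such as security_protocol, sasl_mechanism, sasl_plain_username and sasl_plain_password. Below is a minimal sketch that assumes the SASL/PLAIN mechanism, which the comment above does not state. sasl_consumer_config is a hypothetical helper. Setting use_ssl=False selects SASL_PLAINTEXT, an authenticated but unencrypted connection, which according to the comment above is already enough to reproduce the hang.

```python
def sasl_consumer_config(bootstrap_servers, group_id, username, password,
                         use_ssl=True):
    """Hypothetical helper: KafkaConsumer keyword arguments for SASL/PLAIN.

    use_ssl=False gives SASL_PLAINTEXT (authenticated, not encrypted);
    use_ssl=True gives SASL_SSL (authenticated and encrypted).
    """
    return {
        "bootstrap_servers": bootstrap_servers,
        "group_id": group_id,
        "security_protocol": "SASL_SSL" if use_ssl else "SASL_PLAINTEXT",
        "sasl_mechanism": "PLAIN",  # assumption: mechanism not given in the report
        "sasl_plain_username": username,
        "sasl_plain_password": password,
    }
```

The resulting dict would be passed as KafkaConsumer("my-topic", **cfg). The topic name, group and credentials are placeholders, not values from the report.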