poll() never returns - "socket disconnected" message in logs

I keep running into issues where a running consumer no longer completes calls to poll() and kafka-python indefinitely repeats log messages like the ones below.

I suspect (without much evidence, honestly) that something like a slight disruption in the brokers or perhaps in general network connectivity kicks off the issue. However, regardless of the cause, existing consumers never recover and have to be forcibly killed and restarted. This is made more difficult by the fact that poll() never returns, and so I cannot programmatically kill the offending consumer thread.

If I could venture a theory, it seems as though something inside poll() gets into an infinite loop trying to recover connectivity.
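One way to at least detect the hang from outside the stuck thread is a heartbeat watchdog. The sketch below is an assumption-laden workaround, not a fix for the underlying problem: `PollWatchdog`, `run_poll_loop`, and `start_watched_loop` are hypothetical helper names, and `poll_once` stands in for whatever wraps the real `poll()` call. Python cannot forcibly stop a thread, so the polling thread is marked as a daemon and the supervising code decides what to do once a stall is detected.

```python
import threading
import time


class PollWatchdog:
    """Records when the polling loop last completed an iteration."""

    def __init__(self, stall_after_s):
        self.stall_after_s = stall_after_s
        self._lock = threading.Lock()
        self._last_beat = time.monotonic()

    def beat(self):
        """Call after every poll call that returns."""
        with self._lock:
            self._last_beat = time.monotonic()

    def stalled(self):
        """True if no poll call has returned within stall_after_s seconds."""
        with self._lock:
            return time.monotonic() - self._last_beat > self.stall_after_s


def run_poll_loop(poll_once, handle, watchdog, stop_event):
    """Call poll_once() until stop_event is set, beating the watchdog each time."""
    while not stop_event.is_set():
        handle(poll_once())
        watchdog.beat()


def start_watched_loop(poll_once, handle, stall_after_s):
    """Start the loop in a daemon thread and return the objects used to supervise it."""
    watchdog = PollWatchdog(stall_after_s)
    stop_event = threading.Event()
    # daemon=True: a thread stuck inside poll_once() cannot keep the process alive.
    thread = threading.Thread(
        target=run_poll_loop,
        args=(poll_once, handle, watchdog, stop_event),
        daemon=True,
    )
    thread.start()
    return watchdog, stop_event, thread
```

With kafka-python, `poll_once` could be something like `lambda: consumer.poll(timeout_ms=1000)`, with `stall_after_s` set well above that timeout. The supervising thread would check `watchdog.stalled()` periodically and, if it returns true, log the condition and exit the process so an external supervisor can restart it.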

2016-11-17 14:16:52,342 [INFO]: <BrokerConnection host=kafka01-prod01.messagehub.services.us-south.bluemix.net/23.246.202.51 port=9093>: Authenticated as XXXXX
2016-11-17 14:16:52,352 [WARNING]: Coordinator unknown during heartbeat -- will retry
2016-11-17 14:16:52,352 [WARNING]: Heartbeat failed ([Error 15] GroupCoordinatorNotAvailableError: ); retrying
2016-11-17 14:16:52,387 [WARNING]: Coordinator unknown during heartbeat -- will retry
2016-11-17 14:16:52,388 [WARNING]: Heartbeat failed ([Error 15] GroupCoordinatorNotAvailableError: ); retrying
2016-11-17 14:16:52,437 [ERROR]: <BrokerConnection host=kafka01-prod01.messagehub.services.us-south.bluemix.net/23.246.202.51 port=9093>: socket disconnected
2016-11-17 14:16:52,438 [WARNING]: Node 0 connection failed -- refreshing metadata
2016-11-17 14:16:52,438 [ERROR]: Error sending GroupCoordinatorRequest_v0 to node 0 [NodeNotReadyError: 0]
2016-11-17 14:16:52,447 [INFO]: <BrokerConnection host=kafka01-prod01.messagehub.services.us-south.bluemix.net/23.246.202.51 port=9093>: Authenticated as XXXXX
2016-11-17 14:16:52,452 [WARNING]: Coordinator unknown during heartbeat -- will retry
2016-11-17 14:16:52,452 [WARNING]: Heartbeat failed ([Error 15] GroupCoordinatorNotAvailableError: ); retrying
2016-11-17 14:16:52,488 [ERROR]: <BrokerConnection host=kafka05-prod01.messagehub.services.us-south.bluemix.net/23.246.202.55 port=9093>: socket disconnected
2016-11-17 14:16:52,488 [WARNING]: Node 4 connection failed -- refreshing metadata
2016-11-17 14:16:52,488 [ERROR]: Fetch to node 4 failed: ConnectionError: socket disconnected
2016-11-17 14:16:52,488 [WARNING]: Coordinator unknown during heartbeat -- will retry
2016-11-17 14:16:52,489 [WARNING]: Heartbeat failed ([Error 15] GroupCoordinatorNotAvailableError: ); retrying
2016-11-17 14:16:52,497 [INFO]: <BrokerConnection host=kafka05-prod01.messagehub.services.us-south.bluemix.net/23.246.202.55 port=9093>: Authenticated as XXXXX
2016-11-17 14:16:52,537 [ERROR]: <BrokerConnection host=kafka05-prod01.messagehub.services.us-south.bluemix.net/23.246.202.55 port=9093>: socket disconnected
2016-11-17 14:16:52,537 [WARNING]: Node 4 connection failed -- refreshing metadata
2016-11-17 14:16:52,538 [ERROR]: Error sending GroupCoordinatorRequest_v0 to node 4 [NodeNotReadyError: 4]
2016-11-17 14:16:52,546 [INFO]: <BrokerConnection host=kafka05-prod01.messagehub.services.us-south.bluemix.net/23.246.202.55 port=9093>: Authenticated as XXXXX
2016-11-17 14:16:52,553 [WARNING]: Coordinator unknown during heartbeat -- will retry
2016-11-17 14:16:52,553 [WARNING]: Heartbeat failed ([Error 15] GroupCoordinatorNotAvailableError: ); retrying
2016-11-17 14:16:52,589 [WARNING]: Coordinator unknown during heartbeat -- will retry
2016-11-17 14:16:52,589 [WARNING]: Heartbeat failed ([Error 15] GroupCoordinatorNotAvailableError: ); retrying
2016-11-17 14:16:52,600 [ERROR]: <BrokerConnection host=kafka02-prod01.messagehub.services.us-south.bluemix.net/23.246.202.52 port=9093>: socket disconnected
2016-11-17 14:16:52,601 [WARNING]: Node 1 connection failed -- refreshing metadata
2016-11-17 14:16:52,601 [WARNING]: Marking the coordinator dead (node 1) for group /XXXXXX@us.ibm.com_dev/messageHubTrigger: None.
2016-11-17 14:16:52,639 [ERROR]: <BrokerConnection host=kafka01-prod01.messagehub.services.us-south.bluemix.net/23.246.202.51 port=9093>: socket disconnected
2016-11-17 14:16:52,639 [WARNING]: Node 0 connection failed -- refreshing metadata
2016-11-17 14:16:52,640 [ERROR]: Error sending GroupCoordinatorRequest_v0 to node 3 [NodeNotReadyError: 3]

Please note that the logs above may include interleaved output of several consumers running in separate threads.
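Since the output of several consumer threads is interleaved, it may help to include the thread name in each log line. The following is a minimal sketch using only the standard `logging` module; `LOG_FORMAT` and `configure_logging` are illustrative names, and the format is only assumed to resemble the one that produced the lines above.

```python
import logging

# Similar layout to the log lines above, plus the emitting thread's name.
LOG_FORMAT = "%(asctime)s [%(levelname)s] [%(threadName)s]: %(message)s"


def configure_logging(stream=None, level=logging.INFO):
    """Attach a handler to the root logger that tags each record with its thread."""
    handler = logging.StreamHandler(stream)  # stream=None means sys.stderr
    handler.setFormatter(logging.Formatter(LOG_FORMAT))
    root = logging.getLogger()
    root.addHandler(handler)
    root.setLevel(level)
    return handler
```

If each consumer thread is created with a distinct `name=` argument, messages from kafka-python's loggers (which live under the `kafka` namespace and propagate to the root logger) can then be attributed to a specific consumer.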

Issue Analytics

  • State: closed
  • Created 7 years ago
  • Comments: 10 (3 by maintainers)

Top GitHub Comments

dpkp commented, Mar 3, 2017 (2 reactions)

Thanks to declantraynor for an awesome repro setup. Fix in PR #1003

declantraynor commented, Feb 22, 2017 (1 reaction)

I have also encountered this issue. One detail I can add is that the issue only manifests when the consumer has an authenticated connection to Kafka (providing a SASL username and password, the connection does not necessarily have to be secure). With this extra information, I have been able to put together a small test case that reliably reproduces the issue.

While the example uses a containerised Kafka, it should be noted that I have also reproduced this on a production Kafka cluster.

I hope this is helpful in further debugging the issue.
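For readers who want to try the authenticated setup described in this comment, here is a rough sketch of a SASL consumer configuration for kafka-python. It is not the test case referred to above: the helper names `sasl_consumer_config` and `make_consumer` and all argument values are placeholders, and it assumes a broker that accepts SASL/PLAIN over an unencrypted connection.

```python
def sasl_consumer_config(bootstrap_servers, username, password, group_id):
    """Build keyword arguments for a SASL/PLAIN-authenticated KafkaConsumer."""
    return {
        "bootstrap_servers": bootstrap_servers,
        "group_id": group_id,
        # SASL without TLS; use "SASL_SSL" for an encrypted connection.
        "security_protocol": "SASL_PLAINTEXT",
        "sasl_mechanism": "PLAIN",
        "sasl_plain_username": username,
        "sasl_plain_password": password,
    }


def make_consumer(topic, config):
    """Create the consumer; imported lazily so the config can be built without kafka-python."""
    from kafka import KafkaConsumer

    return KafkaConsumer(topic, **config)
```

A consumer created this way and polled in a loop while the broker connection is disrupted would be one way to check whether the hang appears in a given environment.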
