[Core][Bug] gRPC poller error message showed up in the logs
Search before asking
- I searched the issues and found no similar issues.
Ray Component
Ray Core
What happened + What you expected to happen
One of our users is seeing the following gRPC-related errors in the worker log.
([...] pid=15697, ip=10.120.158.253) E1129 03:37:56.802084512 18007 backup_poller.cc:134] Run client channel backup poller: {"created":"@1638157076.802038566","description":"pollset_work","file":"src/core/lib/iomgr/ev_epollex_linux.cc","file_line":320,"referenced_errors":[{"created":"@1638157076.802034167","description":"Bad file descriptor","errno":9,"file":"src/core/lib/iomgr/ev_epollex_linux.cc","file_line":950,"os_error":"Bad file descriptor","syscall":"epoll_wait"}]}
([...] pid=58163, ip=10.123.160.166) E1129 03:37:56.747796001 60484 backup_poller.cc:134] Run client channel backup poller: {"created":"@1638157076.747742130","description":"pollset_work","file":"src/core/lib/iomgr/ev_epollex_linux.cc","file_line":320,"referenced_errors":[{"created":"@1638157076.747735658","description":"Bad file descriptor","errno":9,"file":"src/core/lib/iomgr/ev_epollex_linux.cc","file_line":950,"os_error":"Bad file descriptor","syscall":"epoll_wait"}]}
more context https://anyscaleteam.slack.com/archives/C027L220V0V/p1638157108298400
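For triage, the JSON payload at the end of each log line can be unpacked to show which syscall failed and with which errno. The following is only a sketch: the helper names `parse_grpc_error` and `leaf_errors` are hypothetical and not part of Ray or gRPC, and it assumes the payload starts at the first `{` on the line, as in the lines above.

```python
import errno
import json


def parse_grpc_error(line: str) -> dict:
    """Parse the JSON payload that starts at the first '{' in a log line."""
    return json.loads(line[line.index("{"):])


def leaf_errors(err: dict):
    """Yield the innermost errors nested under 'referenced_errors'."""
    children = err.get("referenced_errors", [])
    if not children:
        yield err
    for child in children:
        yield from leaf_errors(child)


# First log line from this report, copied verbatim.
sample = (
    '([...] pid=15697, ip=10.120.158.253) E1129 03:37:56.802084512 18007 '
    'backup_poller.cc:134] Run client channel backup poller: '
    '{"created":"@1638157076.802038566","description":"pollset_work",'
    '"file":"src/core/lib/iomgr/ev_epollex_linux.cc","file_line":320,'
    '"referenced_errors":[{"created":"@1638157076.802034167",'
    '"description":"Bad file descriptor","errno":9,'
    '"file":"src/core/lib/iomgr/ev_epollex_linux.cc","file_line":950,'
    '"os_error":"Bad file descriptor","syscall":"epoll_wait"}]}'
)

root = parse_grpc_error(sample)
leaf = next(leaf_errors(root))

# Map the numeric errno to its symbolic name (9 is EBADF).
print(root["description"], leaf["syscall"], errno.errorcode.get(leaf["errno"]))
```

Here the innermost error is `epoll_wait` failing with errno 9 (`EBADF`), which means the epoll file descriptor used by the backup poller was no longer valid when it was polled. The payload alone does not say what closed it.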
Versions / Dependencies
latest ray
Reproduction script
N/A
Anything else
No response
Are you willing to submit a PR?
- Yes I am willing to submit a PR!
Issue Analytics
- State:
- Created 2 years ago
- Comments:11 (5 by maintainers)
Top GitHub Comments
I don’t have much context here yet. The next step seems to be to find a repro. cc @rkooo567
I’m experiencing a similar issue with Ray v1.13.0. On my local computer everything works correctly, but when I run my code, which uses trainer.train(), on a cluster (with Slurm), training takes longer with each iteration and I get this kind of message on some iterations:
E0402 17:39:02.274615934 3860901 backup_poller.cc:134] Run client channel backup poller: {"created":"@1648921142.274527595","description":"pollset_work","file":"src/core/lib/iomgr/ev_epollex_linux.cc","file_line":320,"referenced_errors":[{"created":"@1648921142.274524502","description":"Bad file descriptor","errno":9,"file":"src/core/lib/iomgr/ev_epollex_linux.cc","file_line":950,"os_error":"Bad file descriptor","syscall":"epoll_wait"}]}