[Core][Bug] gRPC poller error message showed up in the logs

Search before asking

  • I searched the issues and found no similar issues.

Ray Component

Ray Core

What happened + What you expected to happen

One of our users is seeing the following gRPC-related errors in the worker logs:

([...] pid=15697, ip=10.120.158.253) E1129 03:37:56.802084512   18007 backup_poller.cc:134]       Run client channel backup poller: {"created":"@1638157076.802038566","description":"pollset_work","file":"src/core/lib/iomgr/ev_epollex_linux.cc","file_line":320,"referenced_errors":[{"created":"@1638157076.802034167","description":"Bad file descriptor","errno":9,"file":"src/core/lib/iomgr/ev_epollex_linux.cc","file_line":950,"os_error":"Bad file descriptor","syscall":"epoll_wait"}]}
([...] pid=58163, ip=10.123.160.166) E1129 03:37:56.747796001   60484 backup_poller.cc:134]       Run client channel backup poller: {"created":"@1638157076.747742130","description":"pollset_work","file":"src/core/lib/iomgr/ev_epollex_linux.cc","file_line":320,"referenced_errors":[{"created":"@1638157076.747735658","description":"Bad file descriptor","errno":9,"file":"src/core/lib/iomgr/ev_epollex_linux.cc","file_line":950,"os_error":"Bad file descriptor","syscall":"epoll_wait"}]}
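Each of these lines ends with a JSON payload whose nested `referenced_errors` entry carries the underlying OS error (`errno` 9, `syscall` `epoll_wait`). When triaging many worker logs, it may help to pull those fields out programmatically. The following is a minimal sketch, assuming each error sits on a single line as in the excerpt above. `parse_backup_poller_error` is a hypothetical helper, not part of Ray or gRPC.

```python
import json
from typing import Optional

# Text that precedes the JSON payload in gRPC's backup poller error lines.
_PREFIX = "Run client channel backup poller: "

# One of the log lines from this issue, used as sample input.
SAMPLE = (
    '([...] pid=15697, ip=10.120.158.253) E1129 03:37:56.802084512   18007 '
    'backup_poller.cc:134]       Run client channel backup poller: '
    '{"created":"@1638157076.802038566","description":"pollset_work",'
    '"file":"src/core/lib/iomgr/ev_epollex_linux.cc","file_line":320,'
    '"referenced_errors":[{"created":"@1638157076.802034167",'
    '"description":"Bad file descriptor","errno":9,'
    '"file":"src/core/lib/iomgr/ev_epollex_linux.cc","file_line":950,'
    '"os_error":"Bad file descriptor","syscall":"epoll_wait"}]}'
)


def parse_backup_poller_error(line: str) -> Optional[dict]:
    """Hypothetical helper: return the innermost error fields, or None if no match."""
    idx = line.find(_PREFIX)
    if idx == -1:
        return None
    err = json.loads(line[idx + len(_PREFIX):].rstrip())
    # Follow the first referenced error down to the innermost (OS-level) entry.
    while err.get("referenced_errors"):
        err = err["referenced_errors"][0]
    return {
        "description": err.get("description"),
        "errno": err.get("errno"),
        "syscall": err.get("syscall"),
        "file": err.get("file"),
        "file_line": err.get("file_line"),
    }


if __name__ == "__main__":
    print(parse_backup_poller_error(SAMPLE))
```

Grouping the parsed results by `pid` or host could show whether the errors cluster on particular workers. That is only a triage aid and does not identify the root cause.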

More context: https://anyscaleteam.slack.com/archives/C027L220V0V/p1638157108298400

Versions / Dependencies

Latest Ray

Reproduction script

N/A

Anything else

No response

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 11 (5 by maintainers)

Top GitHub Comments

mwtian commented, Jan 21, 2022 (1 reaction)

I don’t have much context here yet. The next step seems to be to find a repro. cc @rkooo567

sarahperrin commented, Jul 25, 2022 (0 reactions)

I’m experiencing a similar issue with Ray 1.13.0. On my local computer everything works correctly, but when I run my code, which uses trainer.train(), on a cluster (with Slurm), training takes longer with each iteration and I get this kind of message at some iterations:

E0402 17:39:02.274615934 3860901 backup_poller.cc:134] Run client channel backup poller: {"created":"@1648921142.274527595","description":"pollset_work","file":"src/core/lib/iomgr/ev_epollex_linux.cc","file_line":320,"referenced_errors":[{"created":"@1648921142.274524502","description":"Bad file descriptor","errno":9,"file":"src/core/lib/iomgr/ev_epollex_linux.cc","file_line":950,"os_error":"Bad file descriptor","syscall":"epoll_wait"}]}
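In every log line in this thread, the inner error is `errno` 9 (`EBADF`, "Bad file descriptor") from the `epoll_wait` syscall. That errno means the descriptor passed to `epoll_wait()` was not open at the time of the call. The following Linux-only sketch reproduces that condition outside of gRPC and Ray by calling libc through `ctypes` on an epoll descriptor that has already been closed. It assumes glibc-style symbol lookup via `ctypes.CDLL(None)`. It does not reproduce the Ray issue itself, and this thread does not establish what closes the descriptor inside the Ray workers.

```python
import ctypes
import errno
import os

# Linux-only: resolve libc symbols already loaded into the interpreter.
_libc = ctypes.CDLL(None, use_errno=True)
_libc.epoll_create1.argtypes = [ctypes.c_int]
_libc.epoll_create1.restype = ctypes.c_int
_libc.epoll_wait.argtypes = [ctypes.c_int, ctypes.c_void_p, ctypes.c_int, ctypes.c_int]
_libc.epoll_wait.restype = ctypes.c_int


def epoll_wait_after_close() -> int:
    """Call epoll_wait() on an already-closed epoll fd and return the resulting errno."""
    epfd = _libc.epoll_create1(0)
    if epfd < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    # Close the descriptor while a "poller" would still be using it.
    os.close(epfd)
    # Room for one struct epoll_event (12 or 16 bytes depending on architecture).
    events = ctypes.create_string_buffer(16)
    rc = _libc.epoll_wait(epfd, events, 1, 0)  # timeout 0: do not block
    return ctypes.get_errno() if rc < 0 else 0


if __name__ == "__main__":
    err = epoll_wait_after_close()
    print(err, errno.errorcode.get(err), os.strerror(err))
```

On a typical Linux system this should report `EBADF`, which matches the `"errno":9` in the logs. If another thread happens to reuse the descriptor number between `os.close` and `epoll_wait`, the call could instead fail with a different error such as `EINVAL`.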
