[BUG] [FlyteAdmin] Notifications SQS subscriber stops processing messages when "connection reset by peer"
Describe the bug The notifications SQS subscriber stopped processing messages.
Expected behavior The subscriber should reconnect gracefully for as long as the application is running.
Flyte component
- Overall
- Flyte Setup and Installation scripts
- Flyte Documentation
- Flyte communication (slack/email etc)
- FlytePropeller
- FlyteIDL (Flyte specification language)
- Flytekit (Python SDK)
- FlyteAdmin (Control Plane service)
- FlytePlugins
- DataCatalog
- FlyteStdlib (common libraries)
- FlyteConsole (UI)
- Other
To Reproduce Steps to reproduce the behavior:
- Run Flyte with notifications enabled
- Wait until the "connection reset by peer" error occurs
Environment Flyte component
- Sandbox (local or on one machine)
- Cloud hosted
- AWS
- GCP
- Azure
- Baremetal
- Other
Additional context Logs:
{"json":{"src":"base.go:103"},"level":"error","msg":"error with starting processor err: [RequestError: send request failed\ncaused by: Post https://sqs.us-east-1.amazonaws.com/: read tcp 10.200.8.116:59882-\u003e52.46.137.144:443: read: connection reset by peer] ","ts":"2020-07-03T10:35:10Z"}
{"json":{"src":"processor.go:113"},"level":"warning","msg":"The stream for the subscriber channel closed with err: RequestError: send request failed\ncaused by: Post https://sqs.us-east-1.amazonaws.com/: read tcp 10.200.8.116:59882-\u003e52.46.137.144:443: read: connection reset by peer","ts":"2020-07-03T10:35:10Z"}
I guess the solution will be similar to this one: https://github.com/lyft/flyteadmin/pull/92
Issue Analytics
- State:
- Created 3 years ago
- Comments: 8 (8 by maintainers)
@rstanevich I think we found a very good way of solving this problem. @katrogan will merge the PR soon. Thank you for raising the issue.
It is merged and will be part of the next release.