Production - [Alerting] DotNetEng Status Failed Requests/Hour alert
💔 Metric state changed to alerting
The number of failed DotNetEng Status requests per hour is above 20. This may indicate a systemic problem that needs to be investigated. To initially investigate prod, run the following query in DotNetEng-Status-Prod; to investigate staging, run the same query in DotNetEng-Status-Staging:
union exceptions, traces | project timestamp, operation_Name, customDimensions, message, problemId, details | order by timestamp asc
- failuresCount 22
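As a possible follow-up to the query above, a sketch like the following could show which hours crossed the threshold of 20. It assumes the alert counts rows in the Application Insights `exceptions` table, which the alert text does not confirm. The 24-hour lookback is an arbitrary choice, and `failuresCount` is simply reused from the alert output as a column name.

```kusto
// Sketch only: assumes failures are recorded as rows in the exceptions table.
exceptions
| where timestamp > ago(24h)                                  // arbitrary lookback window
| summarize failuresCount = count() by bin(timestamp, 1h)     // one row per hour
| where failuresCount > 20                                    // threshold stated in the alert
| order by timestamp asc
```

Adding `problemId` or `operation_Name` to the `by` clause (and dropping the threshold filter) would break each hour down by failure type, which may help tell a single recurring error apart from many unrelated ones.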
@dotnet/dnceng, please investigate
Automation information below, do not change
Grafana-Automated-Alert-Id-d2dd705a6c724ed68fcf6955561c06dd
Issue Analytics
- State:
- Created 6 months ago
- Comments:9 (4 by maintainers)
Top GitHub Comments
I tested the scenario that exercises the failing code path (editing a pull request title or body, and handling any mentions of the dnceng team). Things are now working as expected. Given that the alert cleared, I’m starting to think this was a problem with the data in the GitHub payload. We failed to process a lot of PRs while this was happening, and things cleared up by themselves. Going to close this and keep an eye out for another alert to see if it makes sense to harden against this somehow.
@dotnet/dnceng This is a test, please disregard.