[RFC] Improved autoscaler log messages
The current autoscaler output is quite difficult to interpret due to its verbosity and low-level details. This is a proposal to clean it up by periodically emitting the following summary table:
======== Autoscaler status 2020-11-20 23:14:36,653 ========
Node status
------------------------------------------------------------
Healthy:
 2 p3.2xlarge (2 active)
 20 m4.4xlarge (18 active, 2 idle)
Pending:
 34.5.234.51: m4.4xlarge, launching
 34.5.234.52: m4.4xlarge, launching
 34.5.234.53: m4.4xlarge, waiting for ssh
 34.5.234.54: m4.4xlarge, waiting for ssh
 34.5.234.55: m4.4xlarge, starting ray, /tmp/ray/setup-10.log
 34.5.234.56: m4.4xlarge, setting up, /tmp/ray/setup-11.log
 34.5.234.57: m4.4xlarge, setting up, /tmp/ray/setup-12.log
Recent failures:
 172.24.25.33: m4.4xlarge, /tmp/ray/setup-8.log
 35.4.235.11: p3.2xlarge, /tmp/ray/setup-9.log

Resources
------------------------------------------------------------
Usage:
 530.0/544.0 CPU
 2.0/2.0 GPU
 0.0/2.0 AcceleratorType:V100
 0.0 GiB/1583.19 GiB memory
 0.0 GiB/471.02 GiB object_store_memory
Demands:
 {"CPU": 1}: 150 pending tasks
 [{"CPU": 4} * 5]: 5 pending placement groups
 [{"CPU": 1} * 100]: from request_resources()
Implementation details:
- The autoscaler should periodically generate a JSON status message that includes the above information.
- We should log the above text summary of the JSON status every 10-30s.
- Other Ray components, such as the dashboard and the ray status command, can also access this information.
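
To make the split between the JSON status and the text summary concrete, here is a minimal sketch in Python. The schema (keys such as "healthy", "pending", "usage", "demands") and the function name format_status are assumptions made up for illustration; they are not the autoscaler's actual format or API. The idea is only that the autoscaler would publish one JSON document, and any consumer (the log, the dashboard, a CLI) could render it in roughly the layout shown above.

```python
import json

# Hypothetical status schema: every field name here is an assumption for
# illustration, not the real autoscaler status format.
EXAMPLE_STATUS = {
    "timestamp": "2020-11-20 23:14:36,653",
    "healthy": {
        "p3.2xlarge": {"active": 2, "idle": 0},
        "m4.4xlarge": {"active": 18, "idle": 2},
    },
    "pending": [
        {"ip": "34.5.234.51", "node_type": "m4.4xlarge", "state": "launching"},
        {"ip": "34.5.234.55", "node_type": "m4.4xlarge",
         "state": "starting ray", "log": "/tmp/ray/setup-10.log"},
    ],
    "failures": [
        {"ip": "172.24.25.33", "node_type": "m4.4xlarge",
         "log": "/tmp/ray/setup-8.log"},
    ],
    "usage": {"CPU": [530.0, 544.0], "GPU": [2.0, 2.0]},
    "demands": [
        {"shape": {"CPU": 1}, "count": 150, "source": "pending tasks"},
    ],
}

RULE = "-" * 60


def format_status(status):
    """Render a status dict (hypothetical schema above) as a text summary."""
    lines = [
        f"======== Autoscaler status {status['timestamp']} ========",
        "Node status",
        RULE,
        "Healthy:",
    ]
    for node_type, counts in status["healthy"].items():
        total = sum(counts.values())
        # Only mention states that have at least one node.
        detail = ", ".join(f"{n} {state}" for state, n in counts.items() if n)
        lines.append(f" {total} {node_type} ({detail})")

    lines.append("Pending:")
    for node in status["pending"]:
        parts = [node["node_type"], node["state"]]
        if node.get("log"):
            parts.append(node["log"])
        lines.append(f" {node['ip']}: " + ", ".join(parts))

    lines.append("Recent failures:")
    for node in status["failures"]:
        lines.append(f" {node['ip']}: {node['node_type']}, {node['log']}")

    lines += ["", "Resources", RULE, "Usage:"]
    for name, (used, total) in status["usage"].items():
        lines.append(f" {used}/{total} {name}")

    lines.append("Demands:")
    for demand in status["demands"]:
        shape = json.dumps(demand["shape"])
        lines.append(f" {shape}: {demand['count']} {demand['source']}")
    return "\n".join(lines)


# Round-trip through JSON to mimic a consumer reading the published status.
summary = format_status(json.loads(json.dumps(EXAMPLE_STATUS)))
print(summary)
```

Keeping the renderer separate from the status document means the periodic log line and other consumers would all read the same data, so they cannot drift apart.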
cc @mfitton We should definitely port this to our dashboard.
This would be a game-changing feature for autoscaler debugging/visibility - can’t wait until this is on the dashboard!