[RFC] Improved autoscaler log messages

The current autoscaler output is difficult to interpret because it is verbose and full of low-level details. This is a proposal to clean it up by periodically emitting the following summary table:

======== Autoscaler status 2020-11-20 23:14:36,653 ========
Node status
------------------------------------------------------------
Healthy:
 2 p3.2xlarge (2 active)
 20 m4.4xlarge (18 active, 2 idle)

Pending:
 34.5.234.51: m4.4xlarge, launching
 34.5.234.52: m4.4xlarge, launching
 34.5.234.53: m4.4xlarge, waiting for ssh
 34.5.234.54: m4.4xlarge, waiting for ssh
 34.5.234.55: m4.4xlarge, starting ray, /tmp/ray/setup-10.log
 34.5.234.56: m4.4xlarge, setting up, /tmp/ray/setup-11.log
 34.5.234.57: m4.4xlarge, setting up, /tmp/ray/setup-12.log

Recent failures:
 172.24.25.33: m4.4xlarge, /tmp/ray/setup-8.log
 35.4.235.11: p3.2xlarge, /tmp/ray/setup-9.log

Resources
------------------------------------------------------------
Usage:
 530.0/544.0 CPU
 2.0/2.0 GPU
 0.0/2.0 AcceleratorType:V100
 0.0 GiB/1583.19 GiB memory
 0.0 GiB/471.02 GiB object_store_memory

Demands:
 {"CPU": 1}: 150 pending tasks
 [{"CPU": 4} * 5]: 5 pending placement groups
 [{"CPU": 1} * 100]: from request_resources()

Implementation details:

The autoscaler should periodically generate a JSON status message that includes the above information.
We should log the above text summary of the JSON status every 10-30s.
Other ray components such as the dashboard and ray status can also access this information.
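
A minimal sketch of how the JSON status and its text summary could fit together, written in plain Python. This is not Ray's actual implementation: the function names `build_status` and `format_status` and every field name in the status dict are hypothetical, chosen only to mirror the table above. The status is built as a JSON-serializable dict, so the same payload could be logged as text by the autoscaler and read by other components.

```python
import json
from datetime import datetime


def build_status(healthy, pending, failures, usage, demands, now=None):
    """Collect the summary fields into a JSON-serializable dict.

    All field names here are hypothetical, not Ray's real schema.
    """
    now = now or datetime.now()
    return {
        "time": now.strftime("%Y-%m-%d %H:%M:%S"),
        "healthy": healthy,    # {node_type: {state: count}}
        "pending": pending,    # [{"ip", "type", "state"}]
        "failures": failures,  # [{"ip", "type", "log"}]
        "usage": usage,        # {resource: [used, total]}
        "demands": demands,    # [{"shape", "source"}]
    }


def format_status(status):
    """Render a status dict as a human-readable summary table."""
    rule = "-" * 60
    lines = [f"======== Autoscaler status {status['time']} ========"]
    lines += ["Node status", rule, "Healthy:"]
    for node_type, counts in status["healthy"].items():
        total = sum(counts.values())
        detail = ", ".join(f"{n} {state}" for state, n in counts.items() if n)
        lines.append(f" {total} {node_type} ({detail})")
    lines += ["", "Pending:"]
    for node in status["pending"]:
        lines.append(f" {node['ip']}: {node['type']}, {node['state']}")
    lines += ["", "Recent failures:"]
    for node in status["failures"]:
        lines.append(f" {node['ip']}: {node['type']}, {node['log']}")
    lines += ["", "Resources", rule, "Usage:"]
    for name, (used, total) in status["usage"].items():
        lines.append(f" {used}/{total} {name}")
    lines += ["", "Demands:"]
    for demand in status["demands"]:
        lines.append(f" {json.dumps(demand['shape'])}: {demand['source']}")
    return "\n".join(lines)


if __name__ == "__main__":
    status = build_status(
        healthy={"m4.4xlarge": {"active": 18, "idle": 2}},
        pending=[{"ip": "34.5.234.51", "type": "m4.4xlarge", "state": "launching"}],
        failures=[{"ip": "172.24.25.33", "type": "m4.4xlarge", "log": "/tmp/ray/setup-8.log"}],
        usage={"CPU": [530.0, 544.0]},
        demands=[{"shape": {"CPU": 1}, "source": "150 pending tasks"}],
    )
    # The JSON string is what other components would consume;
    # the formatted text is what would be logged periodically.
    print(format_status(json.loads(json.dumps(status))))
```

Keeping the formatter as a pure function of the decoded JSON means the periodic log line and any other consumer render from the same data, so they cannot drift apart.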

Issue Analytics

Created 3 years ago
Reactions: 5
Comments:25 (25 by maintainers)

Top GitHub Comments

rkooo567 commented, Nov 21, 2020 (6 reactions)

cc @mfitton We should definitely port this to our dashboard.

markgoodhead commented, Nov 21, 2020 (2 reactions)

This would be a game-changing feature for autoscaler debugging/visibility - can’t wait until this is on the dashboard!
