
[autoscaler] Remote execution gets slower, much space used on head node

See original GitHub issue

Ray 1.0.1, Python 3.7.6 (conda), Ubuntu 18.04

After a few hours of running remote functions, they take much longer to execute than at the beginning. If I run ray stop on the head node and ray up xx.yaml from the driver node, the issue disappears, but it comes back after a few more hours. Also, /tmp/ray takes up a lot of storage when the head node has been running for a long time. Is there any command to clean up unnecessary files? At the moment I just use rm -rf /tmp/ray.
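
Since no script is attached to the issue, here is a minimal, hypothetical timing sketch (not the reporter's actual workload) that repeatedly submits batches of trivial remote tasks and prints how long each batch takes. Something along these lines would make the gradual slowdown visible and could serve as a starting point for a reproduction:

    # Hypothetical sketch: submit batches of no-op tasks and log the batch latency over time.
    import time
    import ray

    ray.init(address="auto")  # assumes the driver attaches to the already-running cluster

    @ray.remote
    def noop(x):
        return x

    while True:
        start = time.time()
        ray.get([noop.remote(i) for i in range(1000)])  # 1,000 trivial tasks per batch
        print(f"1000 tasks took {time.time() - start:.2f} s")
        time.sleep(60)  # sample once a minute so the trend over hours is easy to plot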

Issue Analytics

  • State: open
  • Created: 3 years ago
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

1 reaction
rkooo567 commented, Dec 3, 2020

Do you think it is possible to create a reproducible script?

1 reaction
rkooo567 commented, Dec 3, 2020

About the log size, you will be able to configure log rotation very soon (within a couple of weeks).
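
For reference, later Ray releases expose log rotation through environment variables (RAY_ROTATION_MAX_BYTES and RAY_ROTATION_BACKUP_COUNT per the Ray logging docs); treat the exact names and values here as assumptions to verify against your Ray version. A minimal sketch for a Ray instance started by the driver process itself (on a multi-node cluster the variables would need to be set before ray start on each node):

    # Assumed rotation knobs per later Ray logging docs; they must be set before Ray's processes start.
    import os

    os.environ["RAY_ROTATION_MAX_BYTES"] = str(100 * 1024 * 1024)  # rotate each log file at ~100 MB
    os.environ["RAY_ROTATION_BACKUP_COUNT"] = "5"                   # keep at most 5 rotated backups

    import ray
    ray.init()  # only applies to the Ray processes launched by this driver, not an existing cluster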

Read more comments on GitHub >

Top Results From Across the Web

Autoscaling clusters with Ray - Anyscale
The head node is special because it will be managing the cluster through the Ray Autoscaler: it will be responsible for syncing files...
Read more >
Understanding Kubernetes Autoscaling - Scaleway's Blog
Horizontal Scaling means modifying the compute resources of an existing cluster, for example, by adding new nodes to it or by adding new...
Read more >
Cluster Autoscaler: How It Works and Solving Common ...
Pending Nodes Exist But Cluster Does Not Scale Up: All suitable node groups are at maximum size. Increase the maximum size of...
Read more >
Fix common cluster issues | Elasticsearch Guide [8.5] | Elastic
This error indicates a data node is critically low on disk space and has reached the flood-stage disk usage watermark. Circuit breaker errors:...
Read more >
Autoscaling - Amazon EKS - AWS Documentation
Insufficient Capacity Errors occur whenever your Amazon EC2 Auto Scaling group can't scale up due to a lack of available capacity. Selecting many...
Read more >
