
[autoscaler] Remote execution gets slower, much space used on head node

See original GitHub issue

Ray 1.0.1, Python 3.7.6 (conda), Ubuntu 18.04

After a few hours of running remote functions, they take much longer to execute than at the beginning. If I run ray stop on the head node and ray up xx.yaml from the driver node, the issue disappears, but it comes back after a few more hours. Also, /tmp/ray takes up a lot of storage when the head node has been running for a long time. Is there any command to clean up unnecessary files? At the moment I just use rm -rf /tmp/ray.
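
Since no script is attached to the issue, here is a minimal, hypothetical timing sketch (not the reporter's actual workload) that repeatedly submits batches of trivial remote tasks and prints how long each batch takes. Something along these lines would make the gradual slowdown visible and could serve as a starting point for a reproduction:

    # Hypothetical sketch: submit batches of no-op tasks and log the batch latency over time.
    import time
    import ray

    ray.init(address="auto")  # assumes the driver attaches to the already-running cluster

    @ray.remote
    def noop(x):
        return x

    while True:
        start = time.time()
        ray.get([noop.remote(i) for i in range(1000)])  # 1,000 trivial tasks per batch
        print(f"1000 tasks took {time.time() - start:.2f} s")
        time.sleep(60)  # sample once a minute so the trend over hours is easy to plot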

Issue Analytics

  • State: open
  • Created: 3 years ago
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

1 reaction
rkooo567 commented, Dec 3, 2020

Do you think it is possible to create a reproducible script?

1 reaction
rkooo567 commented, Dec 3, 2020

About the log size, you will be able to configure log rotation very soon (within a couple of weeks).
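
For reference, later Ray releases expose log rotation through environment variables (RAY_ROTATION_MAX_BYTES and RAY_ROTATION_BACKUP_COUNT per the Ray logging docs); treat the exact names and values here as assumptions to verify against your Ray version. A minimal sketch for a Ray instance started by the driver process itself (on a multi-node cluster the variables would need to be set before ray start on each node):

    # Assumed rotation knobs per later Ray logging docs; they must be set before Ray's processes start.
    import os

    os.environ["RAY_ROTATION_MAX_BYTES"] = str(100 * 1024 * 1024)  # rotate each log file at ~100 MB
    os.environ["RAY_ROTATION_BACKUP_COUNT"] = "5"                   # keep at most 5 rotated backups

    import ray
    ray.init()  # only applies to the Ray processes launched by this driver, not an existing cluster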

Read more comments on GitHub >

Top Results From Across the Web

Autoscaling clusters with Ray - Anyscale
The head node is special because it will be managing the cluster through the Ray Autoscaler: it will be responsible for syncing files...
Read more >
Understanding Kubernetes Autoscaling - Scaleway's Blog
Horizontal Scaling means modifying the compute resources of an existing cluster, for example, by adding new nodes to it or by adding new...
Read more >
Cluster Autoscaler: How It Works and Solving Common ...
Pending Nodes Exist But Cluster Does Not Scale Up: All suitable node groups are at maximum size. Increase the maximum size of...
Read more >
Fix common cluster issues | Elasticsearch Guide [8.5] | Elastic
This error indicates a data node is critically low on disk space and has reached the flood-stage disk usage watermark. Circuit breaker errors:...
Read more >
Autoscaling - Amazon EKS - AWS Documentation
Insufficient Capacity Errors occur whenever your Amazon EC2 Auto Scaling group can't scale up due to a lack of available capacity. Selecting many...
Read more >
