[Datasets] [Bug] workers and actors leaking in trivial dataset use
Search before asking
- I searched the issues and found no similar issues.
Ray Component
Ray Clusters
What happened + What you expected to happen
I have a stock cluster and I am testing Datasets. After running the code below, an actor named ray::_DesignatedBlockOwner persists, along with three ray::IDLE workers. Eventually, these cumulatively consume all of the memory in the cluster.
I also eventually receive warnings such as these:
The actor '_DesignatedBlockOwner' has been exported 100 times. It's possible that this warning is accidental, but this may indicate that the same remote function is being defined repeatedly from within many tasks and exported to all of the workers. This can be a performance issue and can be resolved by defining the remote function on the driver instead. See https://github.com/ray-project/ray/issues/6240 for more discussion.
The remote function 'ray.data.read_api.remote_read' has been exported 100 times. It's possible that this warning is accidental, but this may indicate that the same remote function is being defined repeatedly from within many tasks and exported to all of the workers. This can be a performance issue and can be resolved by defining the remote function on the driver instead. See https://github.com/ray-project/ray/issues/6240 for more discussion.
The remote function '__main__.foo' has been exported 100 times. It's possible that this warning is accidental, but this may indicate that the same remote function is being defined repeatedly from within many tasks and exported to all of the workers. This can be a performance issue and can be resolved by defining the remote function on the driver instead. See https://github.com/ray-project/ray/issues/6240 for more discussion.
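The warning's own suggestion is to define the remote functions on the driver. For comparison, here is a minimal sketch (assuming the same cluster setup) that builds the Dataset on the driver instead of inside a task, so the dataset read functions are exported only once; this is a point of comparison, not a fix for the underlying worker/actor leak:

import ray

ray.init(address="auto")

# Building the Dataset on the driver means ray.data.read_api.remote_read
# is defined and exported once, rather than once per task invocation.
ds = ray.data.range_arrow(10000, parallelism=1)
print(ds)

ray.shutdown()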
Versions / Dependencies
ray-1.9.2
python-3.7.7 (and 3.9.5)
pyarrow-6.0.1
ubuntu-21.04 (default ray containers)
ami-029536273cb04d4d9 (the so-called Deep Learning AMI (Ubuntu), Version 55)
Reproduction script
import ray

@ray.remote
def foo():
    # Build a small Arrow-backed Dataset inside a remote task.
    return ray.data.range_arrow(10000, parallelism=1)

ray.init(address="auto")
ref = foo.remote()
print(ray.get(ref))
ray.shutdown()
Anything else
The code leaks three workers and one actor every time it is invoked against the cluster.
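As a hedged illustration of how this compounds, the sketch below repeatedly invokes the same task; the loop count is arbitrary and chosen only to match the 100-export warning threshold quoted above:

import ray

ray.init(address="auto")

@ray.remote
def foo():
    return ray.data.range_arrow(10000, parallelism=1)

# Per the report, each invocation leaves behind idle workers and a
# _DesignatedBlockOwner actor; after roughly 100 invocations the
# "exported 100 times" warnings quoted above begin to appear.
for _ in range(100):
    ray.get(foo.remote())

ray.shutdown()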
Are you willing to submit a PR?
- Yes I am willing to submit a PR!
Issue Analytics
- Created 2 years ago
- Comments: 8 (6 by maintainers)
Top GitHub Comments
I can confirm the StatsActor is cleaned up after the job exits, so things are working as intended now.
Hmm, if the number of StatsActors increases, @clarkzinzow, I think we should make it a named actor / singleton, as Serve does for its coordinator actor.
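For reference, a hypothetical sketch of the named-actor / singleton pattern the comment refers to; the class and actor name below are illustrative stand-ins, not Ray internals:

import ray

@ray.remote
class StatsSingleton:
    # Illustrative stand-in for a stats/ownership actor; not the actual
    # StatsActor or _DesignatedBlockOwner implementation.
    def ping(self):
        return "ok"

def get_or_create_stats_actor(name="stats_singleton"):
    # Reuse the named, detached actor if it already exists instead of
    # creating (and re-exporting) a new one from every task. A real
    # implementation would also handle the create/get race between tasks.
    try:
        return ray.get_actor(name)
    except ValueError:
        return StatsSingleton.options(name=name, lifetime="detached").remote()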