
[Datasets] [Bug] workers and actors leaking in trivial dataset use


Search before asking

  • I searched the issues and found no similar issues.

Ray Component

Ray Clusters

What happened + What you expected to happen

I have a stock cluster and I’m testing Datasets. After running the code below, an actor named ray::_DesignatedBlockOwner persists, along with three ray::IDLE workers. These leftover processes accumulate across runs and eventually consume all the memory in the cluster.

Eventually, I also receive warnings such as these:

The actor '_DesignatedBlockOwner' has been exported 100 times. It's possible that this warning is accidental, but this may indicate that the same remote function is being defined repeatedly from within many tasks and exported to all of the workers. This can be a performance issue and can be resolved by defining the remote function on the driver instead. See https://github.com/ray-project/ray/issues/6240 for more discussion.
The remote function 'ray.data.read_api.remote_read' has been exported 100 times. It's possible that this warning is accidental, but this may indicate that the same remote function is being defined repeatedly from within many tasks and exported to all of the workers. This can be a performance issue and can be resolved by defining the remote function on the driver instead. See https://github.com/ray-project/ray/issues/6240 for more discussion.
The remote function '__main__.foo' has been exported 100 times. It's possible that this warning is accidental, but this may indicate that the same remote function is being defined repeatedly from within many tasks and exported to all of the workers. This can be a performance issue and can be resolved by defining the remote function on the driver instead. See https://github.com/ray-project/ray/issues/6240 for more discussion.

Versions / Dependencies

ray-1.9.2, python-3.7.7 (and 3.9.5), pyarrow-6.0.1, ubuntu-21.04 (default Ray containers), AMI ami-029536273cb04d4d9 (the so-called Deep Learning AMI (Ubuntu) Version 55)

Reproduction script

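import ray

# Build a small Arrow-backed dataset inside a remote task and return it.
@ray.remote
def foo():
    return ray.data.range_arrow(10000, parallelism=1)

ray.init(address="auto")  # connect to the already-running cluster
ref = foo.remote()
print(ray.get(ref))
ray.shutdown()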

Anything else

The code leaks three workers and one actor every time it is invoked against the cluster.
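To confirm that each run leaves the same processes behind, one option is to snapshot process titles on a node before and after running the script and diff them. This is a minimal sketch under stated assumptions: the command lines come from something like `ps -eo args`, and Ray workers show up under titles such as `ray::IDLE` and `ray::_DesignatedBlockOwner`, as reported above. The helper names (`ray_process_counts`, `leaked_between`, `ps_snapshot`) are hypothetical, not part of Ray.

```python
import subprocess
from collections import Counter
from typing import Iterable, List


def ray_process_counts(cmdlines: Iterable[str]) -> Counter:
    """Count processes whose title starts with 'ray::', keyed by that title.

    Assumption: each entry is one process command line (e.g. one line of
    `ps -eo args` output) and Ray workers appear as 'ray::IDLE',
    'ray::_DesignatedBlockOwner', and so on.
    """
    counts: Counter = Counter()
    for line in cmdlines:
        # Use the first whitespace-separated token as the process title.
        parts = line.split()
        if parts and parts[0].startswith("ray::"):
            counts[parts[0]] += 1
    return counts


def leaked_between(before: Iterable[str], after: Iterable[str]) -> Counter:
    """Return Ray processes that grew in number between two snapshots.

    Counter subtraction drops zero and negative counts, so only growth
    (i.e. processes that stayed behind) is reported.
    """
    return ray_process_counts(after) - ray_process_counts(before)


def ps_snapshot() -> List[str]:
    """Hypothetical helper: list every process command line on this node."""
    out = subprocess.run(["ps", "-eo", "args"], capture_output=True, text=True)
    return out.stdout.splitlines()
```

Taking `before = ps_snapshot()`, running the reproduction script once, and then calling `leaked_between(before, ps_snapshot())` should, if the report above holds, show three extra `ray::IDLE` entries and one extra `ray::_DesignatedBlockOwner`.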

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 8 (6 by maintainers)

Top GitHub Comments

1 reaction
ericl commented, Feb 6, 2022

I can confirm StatsActor is cleaned up after the job exited, so things are working as intended now.

1 reaction
ericl commented, Feb 2, 2022

Hmm, if the number of StatsActors increases, @clarkzinzow, I think we should make it a named actor/singleton, as Serve does for its coordinator actor.
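The suggestion is to keep a single StatsActor by registering it under a fixed name and reusing it when it already exists. In Ray this would be a named actor, typically looked up with `ray.get_actor(name)` and created with `.options(name=...)` only when that lookup fails. The sketch below models just the get-or-create logic in plain Python so it runs without a cluster; `NamedRegistry`, `get_stats_actor`, the name string, and this `StatsActor` class are all hypothetical and are not Ray's implementation.

```python
import threading
from typing import Callable, Dict, TypeVar

T = TypeVar("T")


class NamedRegistry:
    """Process-local stand-in for a named-actor namespace (illustrative only)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._instances: Dict[str, object] = {}

    def get_or_create(self, name: str, factory: Callable[[], T]) -> T:
        # Holding the lock makes lookup-then-create atomic, so concurrent
        # callers in this process cannot create two instances for one name.
        with self._lock:
            if name not in self._instances:
                self._instances[name] = factory()
            return self._instances[name]  # type: ignore[return-value]


class StatsActor:
    """Hypothetical placeholder for a per-job stats collector."""

    def __init__(self) -> None:
        self.records: Dict[str, int] = {}

    def record(self, key: str, value: int) -> None:
        self.records[key] = value


def get_stats_actor(registry: NamedRegistry) -> StatsActor:
    # Every caller receives the same instance instead of spawning a new one.
    return registry.get_or_create("datasets_stats_actor", StatsActor)
```

The lock only protects callers inside one process. With real named actors there is no shared lock across workers, so an implementation along these lines would also need to handle another caller registering the name first, for example by retrying the lookup.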


