[tune] Custom contents from worker results directory are not fully synced to cloud
What is the problem?
When saving files to the trial's results directory on each trainable step on a worker node, only the files from the first worker iterations get synced to cloud. When I inspect the trial directories on the worker machines, they are empty; I can't even see the first files that were synced to cloud.
All the files from the head node get synced as expected.
Ray version and other system information (Python version, TensorFlow version, OS):
$ python --version
Python 3.7.3
$ python -c "import ray; print(ray.__version__)"
0.8.0
Does the problem occur on the latest wheels? Not sure; I have only tried it on 0.8.0.
Reproduction
Run an experiment with Tune in which each trainable step writes a file (e.g. an image) to a subdirectory inside the trial directory, for example plt.savefig(os.path.join(os.getcwd(), 'subdir', timestep_image_name)). Run this on a cluster with at least one worker node. The files from the worker don't get synced correctly. A minimal sketch follows.
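For reference, a minimal sketch of the kind of script that exercises this, assuming the 0.8.x Trainable API (_setup/_train) and Tune's upload_dir option for cloud syncing; the class name, file names, sample count, and bucket path are placeholders, not taken from the original report:

import os
import random
from ray import tune


class FileWritingTrainable(tune.Trainable):
    # Hypothetical trainable: writes one small file into the trial directory on every step.

    def _setup(self, config):
        self.timestep = 0

    def _train(self):
        self.timestep += 1
        # Under Tune, os.getcwd() points at the trial logdir, which is what the
        # syncer is supposed to upload to cloud storage.
        subdir = os.path.join(os.getcwd(), "subdir")
        os.makedirs(subdir, exist_ok=True)
        filename = "step_{:05d}.txt".format(self.timestep)
        with open(os.path.join(subdir, filename), "w") as f:
            f.write("artifact written at step {}\n".format(self.timestep))
        return {"mean_loss": random.random()}


if __name__ == "__main__":
    tune.run(
        FileWritingTrainable,
        num_samples=4,  # enough trials that some land on the worker node
        stop={"training_iteration": 20},
        upload_dir="s3://my-bucket/ray-results",  # placeholder bucket for cloud syncing
    )

After a run like this on a multi-node cluster, comparing the bucket contents against the trial directories on each worker node is how the missing per-step files show up.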
Top GitHub Comments
I tested fairly recently and couldn't repro; I'd recommend just closing this for now and re-opening if anyone sees any issues.
Hmm, not 100% sure actually. I haven't seen the issue in a while, but it might be because some of my runs are still on the older version of Ray.