[tune] Custom contents from worker results directory are not fully synced to cloud

What is the problem?

When saving files to the results directory on each trainable step on a worker, only the files from the worker’s first iterations get stored to the cloud. When I inspect the trial directories on the worker machines, they are empty, and I can’t see even the first files that were synced to the cloud.

All the files from the head node get synced as expected.

Ray version and other system information (Python version, TensorFlow version, OS):

$ python --version
Python 3.7.3
$ python -c "import ray; print(ray.__version__)"
0.8.0

Does the problem occur on the latest wheels? Not sure, only tried on the latest 0.8.0.

Reproduction

Run an experiment with Tune in which each trainable step writes a file (e.g. an image) to a subdirectory inside the trial directory (e.g. plt.savefig(os.path.join(os.getcwd(), 'subdir', timestep_image_name))). Run it on a cluster with at least one worker node. The files written on the worker don’t get synced correctly.
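
The per-step write described in the reproduction can be sketched with the standard library alone. This is only an illustration under stated assumptions: `save_step_artifact`, `list_artifacts`, the `step_XXXXXX.bin` naming, and the placeholder payload are hypothetical and are not part of Ray or of the original report, and the trial directory is passed in explicitly instead of being read from os.getcwd(). In a real run, each trainable step would do something similar with plt.savefig in place of the raw write.

```python
import os
import tempfile


def save_step_artifact(trial_dir, timestep, payload=b"placeholder image bytes"):
    """Write one per-step file into <trial_dir>/subdir and return its path.

    Hypothetical helper: stands in for the plt.savefig(...) call in the report.
    """
    subdir = os.path.join(trial_dir, "subdir")
    # Create the subdirectory on first use; later steps reuse it.
    os.makedirs(subdir, exist_ok=True)
    path = os.path.join(subdir, "step_{:06d}.bin".format(timestep))
    with open(path, "wb") as f:
        f.write(payload)
    return path


def list_artifacts(trial_dir):
    """Return the sorted file names under <trial_dir>/subdir (empty if absent)."""
    subdir = os.path.join(trial_dir, "subdir")
    if not os.path.isdir(subdir):
        return []
    return sorted(os.listdir(subdir))


if __name__ == "__main__":
    # A temporary directory plays the role of the trial directory here.
    trial_dir = tempfile.mkdtemp()
    for timestep in range(3):
        save_step_artifact(trial_dir, timestep)
    print(list_artifacts(trial_dir))
```

Comparing the output of `list_artifacts` on the worker with the contents of the cloud location is one way to see which steps went missing.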

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 7 (7 by maintainers)

Top GitHub Comments

1 reaction
ujvl commented, Feb 7, 2020

I tested fairly recently and couldn’t repro. I’d recommend just closing this for now and re-opening if anyone sees any issues.

0 reactions
hartikainen commented, Feb 7, 2020

Hmm, not 100% sure actually. I haven’t seen the issue in a while, but it might be because some of my runs are still on the older version of ray.
