[tune] get_checkpoints_paths fails due to glob command for .tune_metadata file

TrainableUtil.get_checkpoints_paths does not find the .tune_metadata files, because wildcards in a glob.glob pattern do not match a leading dot in a file name (see the note at the end of the glob documentation).

https://github.com/ray-project/ray/blob/e0573df337d9d6ec4298e4d3dce056e071abe9e4/python/ray/tune/utils/trainable.py#L151-L156
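The glob behavior described above can be reproduced without Ray. The following is a minimal sketch; the wildcard pattern `*.tune_metadata` is an assumption about the kind of pattern involved (the linked lines are not reproduced here), and `demo_hidden_file_glob` is a hypothetical helper name.

```python
import glob
import os
import tempfile


def demo_hidden_file_glob():
    """Glob a directory that contains only a '.tune_metadata' file.

    Returns (wildcard_matches, explicit_matches) as lists of base names.
    """
    with tempfile.TemporaryDirectory() as chkpt_dir:
        # Create an empty hidden file named exactly ".tune_metadata".
        open(os.path.join(chkpt_dir, ".tune_metadata"), "w").close()

        # Assumed pattern: a wildcard followed by the suffix.
        # The "*" does not match the leading dot, so nothing is found.
        wildcard = glob.glob(os.path.join(chkpt_dir, "*.tune_metadata"))

        # A pattern that spells out the leading dot does find the file.
        explicit = glob.glob(os.path.join(chkpt_dir, ".tune_metadata"))

        return (
            [os.path.basename(p) for p in wildcard],
            [os.path.basename(p) for p in explicit],
        )


if __name__ == "__main__":
    print(demo_hidden_file_glob())
```

The first list comes back empty even though the file exists, which is the symptom reported in this issue.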

Possible solution: it seems that only files named .tune_metadata are used throughout the library, so glob.glob could be dropped in favor of a direct path check, e.g.

metadata_file = os.path.join(chkpt_dir, ".tune_metadata")
if not os.path.isfile(metadata_file):
    raise ValueError(
        "{} has no tune_metadata.".format(chkpt_dir))

Ray version: 1.0.1.post1
Python version: 3.7.3 (the issue should be independent of the Python version)
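The possible solution above is a fragment; a self-contained version might look like the sketch below. The function name `find_metadata_file` and the constant `METADATA_NAME` are hypothetical and not part of Ray; only the path check itself comes from the proposal.

```python
import os
import tempfile

# Hypothetical constant; the file name is the one discussed in this issue.
METADATA_NAME = ".tune_metadata"


def find_metadata_file(chkpt_dir):
    """Return the path of the metadata file in chkpt_dir without using glob.

    Raises ValueError if the file is missing, mirroring the proposed fix.
    """
    metadata_file = os.path.join(chkpt_dir, METADATA_NAME)
    if not os.path.isfile(metadata_file):
        raise ValueError("{} has no tune_metadata.".format(chkpt_dir))
    return metadata_file


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as chkpt_dir:
        open(os.path.join(chkpt_dir, METADATA_NAME), "w").close()
        print(find_metadata_file(chkpt_dir))
```

Because the path is built directly, the leading dot no longer matters. The trade-off is that the check only works if the metadata file always has this exact name, which is the assumption the proposal itself states.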

Issue Analytics

  • State: open
  • Created 3 years ago
  • Comments: 6 (4 by maintainers)

Top GitHub Comments

1 reaction
vuoristo commented, Apr 16, 2021

Hi @krfricke, it leads to problems when using the experiment analysis tools with a trial directory instead of a trial instance.

I have created a colab notebook to demonstrate the problem: https://colab.research.google.com/drive/12yc4xH4yYKlwbnVZEwkX44n7EUnj7PWp?usp=sharing

For anyone who runs into this issue: I don’t think this is explicitly documented anywhere, but you can work around it by returning the full path to your checkpoint from save_checkpoint, not just the checkpoint directory. E.g.

def save_checkpoint(self, tmp_checkpoint_dir):
    return os.path.join(tmp_checkpoint_dir, 'checkpoint')

Not

def save_checkpoint(self, tmp_checkpoint_dir):
    return tmp_checkpoint_dir

0 reactions
karstenddwx commented, Jan 6, 2021

Hi, I’ve run into a similar issue when trial_path contains special characters: no checkpoint paths are returned, because glob treats those characters as pattern syntax rather than literal text.

If hyperparameters are set through Tune’s grid_search, their values become part of the trial_path. For example, when the hyperparameter is a list:

"model": {
    "fcnet_hiddens": grid_search([[256, 256]]),
}

the resulting trial_path looks like this and does not work with TrainableUtil.get_checkpoints_paths(trial_path):

DQN_simple_env_0_fcnet_hiddens=[256, 256]_2021-01-06_11-52-03

glob interprets ‘[’ and ‘]’ as a character class, so the literal brackets in the path are not matched.
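The bracket problem can also be reproduced outside Ray. The sketch below assumes nothing about Ray's internals: it only shows that a directory name containing brackets is not matched when used directly in a glob pattern, and that Python's standard `glob.escape` makes the path literal again. The helper names and the `checkpoint_1` subdirectory are hypothetical.

```python
import glob
import os
import tempfile


def list_children(trial_path, escape):
    """List entries directly under trial_path via glob.

    If escape is True, the path is passed through glob.escape first so
    that '[', '*' and '?' are treated as literal characters.
    """
    base = glob.escape(trial_path) if escape else trial_path
    matches = glob.glob(os.path.join(base, "*"))
    return sorted(os.path.basename(p) for p in matches)


def demo_bracket_glob():
    """Return (unescaped_result, escaped_result) for a bracketed trial dir."""
    with tempfile.TemporaryDirectory() as root:
        trial_path = os.path.join(
            root,
            "DQN_simple_env_0_fcnet_hiddens=[256, 256]_2021-01-06_11-52-03",
        )
        # Hypothetical checkpoint directory inside the trial directory.
        os.makedirs(os.path.join(trial_path, "checkpoint_1"))
        return (
            list_children(trial_path, escape=False),
            list_children(trial_path, escape=True),
        )


if __name__ == "__main__":
    print(demo_bracket_glob())
```

Escaping only the directory part and leaving the trailing wildcard unescaped keeps the pattern useful while making the trial path itself literal.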
