Checkpoint usability issues

There are a number of usability issues that have surfaced while documenting checkpoints.

  • CheckpointLoader, CheckpointSaver and RunDirectoryUploader all have out-of-date docstrings. (Pain level: moderate)
  • A save_folder given as a relative path should be resolved relative to the CWD, not the run directory. (Pain level: high)
  • Saving to an object store requires the RunDirectoryUploader callback to be provided as well. This is not obvious and probably shouldn’t be required. (Pain level: mild)
  • Checkpoints should be saved as PyTorch checkpoint files, not tarballs. (Pain level: high)

Additional notes:

Save folder

Since the user is specifying the save_folder, it should be respected. If they used a relative path, assume they meant relative to the CWD, not relative to some run directory. If we are concerned about checkpoint files overwriting others, we can either raise an error or add a timestamp to the filename to keep it unique.
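The behavior proposed for save_folder could be sketched roughly as follows. This is a minimal illustration using only the standard library, not the library's actual implementation; the function names resolve_save_folder and checkpoint_path and the on_conflict parameter are hypothetical.

```python
import time
from pathlib import Path


def resolve_save_folder(save_folder: str) -> Path:
    """Interpret a relative save_folder against the CWD (hypothetical helper)."""
    path = Path(save_folder)
    if not path.is_absolute():
        # Relative paths are anchored at the current working directory,
        # not at any run directory.
        path = Path.cwd() / path
    return path


def checkpoint_path(save_folder: str, name: str, on_conflict: str = "error") -> Path:
    """Return the path to write a checkpoint to (hypothetical helper).

    on_conflict="error" raises if the file already exists;
    on_conflict="timestamp" appends a nanosecond timestamp to the file stem.
    """
    target = resolve_save_folder(save_folder) / name
    if not target.exists():
        return target
    if on_conflict == "error":
        raise FileExistsError(f"checkpoint already exists: {target}")
    if on_conflict == "timestamp":
        stamp = time.time_ns()
        return target.with_name(f"{target.stem}-{stamp}{target.suffix}")
    raise ValueError(f"unknown on_conflict value: {on_conflict!r}")
```

Raising by default keeps an accidental overwrite visible, while the timestamp option trades a predictable filename for safety.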

Saving to object stores

This is not particularly high priority, but it is a usability issue. To use the object store, the user should ideally only need to specify the URI as the save_folder, without providing a separate callback.
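One way to tell a local save_folder from an object store URI is to inspect the URI scheme. The sketch below assumes URIs of the form scheme://bucket/prefix; SaveTarget and parse_save_folder are hypothetical names, and the s3 bucket in the usage line is an example, not taken from the issue.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class SaveTarget:
    is_remote: bool  # True when save_folder looks like an object store URI
    scheme: str      # e.g. "s3"; empty for local paths
    bucket: str      # network location of the URI; empty for local paths
    path: str        # key prefix (remote) or the original path (local)


def parse_save_folder(save_folder: str) -> SaveTarget:
    """Classify save_folder as a local path or an object store URI."""
    parsed = urlparse(save_folder)
    # No scheme means a plain path. A one-letter "scheme" is treated as a
    # Windows drive letter (C:\...), which is also a local path.
    if len(parsed.scheme) <= 1:
        return SaveTarget(False, "", "", save_folder)
    return SaveTarget(True, parsed.scheme, parsed.netloc, parsed.path.lstrip("/"))


target = parse_save_folder("s3://my-bucket/checkpoints/run1")
```

With a check like this, the saver could pick an uploader internally instead of asking the user to register one.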

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 10 (10 by maintainers)

Top GitHub Comments

hanlint commented, Mar 29, 2022 (1 reaction)

I’ll add another one in (cc: @ajaysaini725 )

  • For testing, we want to programmatically retrieve the paths of the saved checkpoints after training, e.g.:

    trainer.fit()
    saved_checkpoints = trainer.checkpoint_saver.saved_files  # absolute paths
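A saver that exposes saved_files might look roughly like the sketch below. It is only an illustration under stated assumptions: RecordingCheckpointSaver and its save method are hypothetical, and it writes raw bytes rather than real checkpoints.

```python
import os
from typing import List


class RecordingCheckpointSaver:
    """Hypothetical saver that remembers every file it writes."""

    def __init__(self, save_folder: str) -> None:
        # Store the folder as an absolute path so recorded paths are absolute too.
        self.save_folder = os.path.abspath(save_folder)
        self.saved_files: List[str] = []

    def save(self, name: str, payload: bytes) -> str:
        """Write payload to save_folder/name and record the absolute path."""
        os.makedirs(self.save_folder, exist_ok=True)
        path = os.path.join(self.save_folder, name)
        with open(path, "wb") as f:
            f.write(payload)
        self.saved_files.append(path)
        return path
```

A test could then assert on saver.saved_files directly instead of globbing the output directory.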
ravi-mosaicml commented, Feb 24, 2022 (0 reactions)

#421 and the new checkpoint API (https://www.notion.so/Checkpoint-API-Redesign-f0ee88d4bd5149b6a091589ed3f465f5) will fix the other issues identified in this issue. However, these changes are scheduled for v0.5, not v0.4.1.
