Auto-enable TPU checkpoint saving and loading using the proper wrappers
I trained a model on a TPU in Google Colab. When I try to load the checkpoint on a GPU, I get the following error:
RuntimeError: Could not run 'aten::empty.memory_format' with arguments from the 'XLATensorId' backend. 'aten::empty.memory_format' is only available for these backends: [CUDATensorId, SparseCPUTensorId, VariableTensorId, CPUTensorId, MkldnnCPUTensorId, SparseCUDATensorId].
How can I load a checkpoint that was saved on a TPU on a CPU or GPU?
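A common way to avoid this error, shown here as a minimal sketch under stated assumptions: before saving on the TPU, move every tensor in the checkpoint to CPU, then load it with `map_location` on the target machine. This mirrors what `torch_xla`'s `xm.save()` does (it transfers data to CPU before calling `torch.save()`). The helper names `move_to_cpu`, `save_portable_checkpoint` and `load_checkpoint` are hypothetical, and the sketch assumes plain PyTorch with the checkpoint held in nested dicts, lists and tuples. It only helps at save time; it is not a way to convert a checkpoint that already contains XLA tensors.

```python
import io
from collections.abc import Mapping

import torch


def move_to_cpu(obj):
    """Recursively move every tensor in `obj` to CPU, rebuilding containers.

    Handles what a checkpoint usually holds: tensors, mappings (e.g. a
    state_dict), lists, tuples and namedtuples. Other values are returned
    unchanged. Mappings are rebuilt as plain dicts.
    """
    if isinstance(obj, torch.Tensor):
        # detach() drops autograd history; cpu() copies device tensors to host
        return obj.detach().cpu()
    if isinstance(obj, Mapping):
        return {key: move_to_cpu(value) for key, value in obj.items()}
    if isinstance(obj, tuple) and hasattr(obj, "_fields"):  # namedtuple
        return type(obj)(*(move_to_cpu(value) for value in obj))
    if isinstance(obj, (list, tuple)):
        return type(obj)(move_to_cpu(value) for value in obj)
    return obj


def save_portable_checkpoint(checkpoint, f):
    """Hypothetical helper: save `checkpoint` with all tensors on CPU.

    On a TPU this plays the same role as xm.save(), so the file can later be
    loaded on a machine without torch_xla.
    """
    torch.save(move_to_cpu(checkpoint), f)


def load_checkpoint(f):
    """Hypothetical helper: load onto a GPU if one is available, else the CPU."""
    device = "cuda" if torch.cuda.is_available() else "cpu"
    return torch.load(f, map_location=device)


if __name__ == "__main__":
    model = torch.nn.Linear(4, 2)
    checkpoint = {"epoch": 3, "state_dict": model.state_dict()}

    buffer = io.BytesIO()  # stands in for a checkpoint file on disk
    save_portable_checkpoint(checkpoint, buffer)
    buffer.seek(0)

    restored = load_checkpoint(buffer)
    model.load_state_dict(restored["state_dict"])
    print(restored["epoch"])
```

Saving CPU tensors keeps the file independent of the training device, so the same checkpoint loads on CPU-only or GPU machines without `torch_xla` installed.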

This will be fixed in #2726.
This issue has been automatically marked as stale because it hasn’t had any recent activity. This issue will be closed in 7 days if no further activity occurs. Thank you for your contributions, Pytorch Lightning Team!