TFTrainer: Checkpoints not getting saved in `output_dir` but in {cwd}/checkpoint
I am using TFTrainer for the SQuAD task. Checkpoints are being created in `{cwd}/checkpoint` instead of `output_dir`, even though `output_dir` is set explicitly (see the sketch below).
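A hedged repro sketch of the setup described above, assuming the `TFTrainer`/`TFTrainingArguments` API from the transformers releases that shipped the TF trainer; the model and SQuAD dataset construction are omitted, and the directory name `./squad_ckpts` is only illustrative:

```python
from transformers import TFTrainingArguments

training_args = TFTrainingArguments(
    output_dir="./squad_ckpts",  # where checkpoints are expected to be written
    save_steps=500,
)

# trainer = TFTrainer(model=model, args=training_args, train_dataset=train_dataset)
# trainer.train()
# Observed: checkpoints are written to {cwd}/checkpoint, not to ./squad_ckpts.
```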
Potential cause: https://github.com/huggingface/transformers/blob/9ca485734aea269961d63a040ff194365d151fd1/src/transformers/trainer_tf.py#L156
Instead of passing the bare `PREFIX_CHECKPOINT_DIR`, the checkpoint path should be anchored under the configured output directory:
`os.path.join(self.args.output_dir, PREFIX_CHECKPOINT_DIR)`
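A minimal sketch of what the suggested change amounts to, assuming `PREFIX_CHECKPOINT_DIR` is the plain directory name `"checkpoint"` and that the TF trainer hands such a prefix to `tf.train.CheckpointManager`; `make_ckpt_manager` is a hypothetical helper for illustration, not the actual method in `trainer_tf.py`:

```python
import os
import tensorflow as tf

# In transformers' trainer utilities this constant is the bare string "checkpoint",
# which is why a relative prefix ends up under the current working directory.
PREFIX_CHECKPOINT_DIR = "checkpoint"

def make_ckpt_manager(ckpt: tf.train.Checkpoint, output_dir: str, save_total_limit: int = 5):
    # Proposed fix: join the prefix onto output_dir so the checkpoint manager
    # writes there instead of {cwd}/checkpoint.
    folder = os.path.join(output_dir, PREFIX_CHECKPOINT_DIR)
    return tf.train.CheckpointManager(ckpt, folder, max_to_keep=save_total_limit)

# Usage with a trivial checkpoint object:
ckpt = tf.train.Checkpoint(step=tf.Variable(0))
manager = make_ckpt_manager(ckpt, output_dir="./squad_ckpts")
manager.save()  # checkpoint files now land in ./squad_ckpts/checkpoint/
```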
Issue Analytics
- Created: 3 years ago
- Reactions: 1
- Comments: 6 (4 by maintainers)
Top Results From Across the Web
No skipping steps after loading from checkpoint - Transformers
Hey! I am trying to continue training by loading a checkpoint. But for some reason, it always starts from scratch.
What does 'output_dir' mean in transformers ... - Stack Overflow
Model checkpoints: trainable parameters of the model saved during training. Further it can save the values of metrics used during training and ...
ML Design Pattern #2: Checkpoints | by Lak Lakshmanan
The key steps of a machine learning pipeline are to train the model (using model.fit() in Keras), evaluate the model (using model.evaluate()), ...
Checkpointing (basic) - PyTorch Lightning - Read the Docs
Save a checkpoint. Lightning automatically saves a checkpoint for you in your current working directory, with the state of your last training epoch....
Save and reuse Checkpoints in Ray 2.0 version
Save the checkpoint with model weights in a pickle file ... passing that within my .fit() function but have not been able to...
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
It is just a matter of time 😃
Ohh, I see! Thanks for the clarification. Just a quick question before I close the issue: is there any specific reason for this? Or is it just a matter of time before it starts to behave similarly to the PyTorch trainer?