Model loaded from checkpoint has bad accuracy
What is your question?
I have a model that I train with EarlyStopping and ModelCheckpoint on a custom metric (MAP). Training works fine: after 2 epochs the model reaches 96% MAP. However, when I load it from the checkpoint and test it with the exact same function, the MAP is 16% (the same as an untrained model). I must be doing something wrong, but what?
Code
import torch
import pytorch_lightning as pl

# TAPKG, Graph3D and MAP are project-specific and defined elsewhere in my code.


def default_model(dataset: str):
    if torch.cuda.is_available():
        print("Using the GPU")
        device = torch.device("cuda")
    else:
        print("Using the CPU")
        device = torch.device("cpu")
    kwargs = {
        "dataset": dataset, "embed_size": 50, "depth": 3,
        "vmap": Graph3D.from_dataset(dataset).vocabulary,
        "neg_per_pos": 5, "max_paths": 255, "device": device
    }
    try:
        # Reuse a saved checkpoint if there is one.
        model = TAPKG.load_from_checkpoint("Checkpoints/epoch=2-step=612260.ckpt", **kwargs).to(device)
        return model
    except OSError as e:
        print(f"Couldn't load the save for the model, training instead. ({e.__class__.__name__})")
        model = TAPKG(**kwargs).to(device)
        cpt = pl.callbacks.ModelCheckpoint(monitor="MAP", mode="max", dirpath="Checkpoints", save_top_k=1)
        trainer = pl.Trainer(
            gpus=1,
            check_val_every_n_epoch=1,
            callbacks=[
                cpt,
                pl.callbacks.EarlyStopping(monitor="MAP", mode="max", min_delta=.002, patience=2)
            ],
            auto_lr_find=True
        )
        # noinspection PyTypeChecker
        trainer.fit(model)
        print(cpt.best_model_path, cpt.best_model_score)
        # Note: this is the model as of the last epoch, not necessarily the best checkpoint.
        return model


def eval_link_completion(dataset):
    model = default_model(dataset)
    ranks = model.link_completion_rank()
    MAP(ranks, plot=True)
Right after training, eval_link_completion shows a MAP of 96%. When I load the model from the checkpoint, however, it is back to 16%.
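One way to narrow a gap like this down is to check whether the weights in memory after loading really match the weights stored in the checkpoint. The sketch below is not from the original post: diff_weights is a hypothetical helper, and it assumes each parameter has already been flattened to a plain list of floats. With PyTorch that could be done with something like {k: v.flatten().tolist() for k, v in state_dict.items()}, where the checkpoint side comes from torch.load(path, map_location="cpu")["state_dict"] (Lightning checkpoints keep the weights under the "state_dict" key).

```python
import math
from typing import Dict, List, Mapping, Sequence


def diff_weights(
    checkpoint: Mapping[str, Sequence[float]],
    model: Mapping[str, Sequence[float]],
    tol: float = 1e-6,
) -> Dict[str, List[str]]:
    """Compare two name -> flat-values mappings and report where they differ."""
    # Parameters the model has but the checkpoint does not provide.
    missing = sorted(k for k in model if k not in checkpoint)
    # Parameters stored in the checkpoint that the model does not use.
    unexpected = sorted(k for k in checkpoint if k not in model)
    # Parameters present on both sides whose size or values disagree.
    changed = []
    for name in sorted(set(checkpoint) & set(model)):
        a, b = checkpoint[name], model[name]
        if len(a) != len(b) or any(
            not math.isclose(x, y, rel_tol=0.0, abs_tol=tol) for x, y in zip(a, b)
        ):
            changed.append(name)
    return {"missing": missing, "unexpected": unexpected, "changed": changed}


# Toy values standing in for real weights (hypothetical parameter names).
saved = {"embed.weight": [0.1, 0.2, 0.3], "out.bias": [0.0]}
loaded = {"embed.weight": [0.1, 0.2, 0.3], "out.bias": [0.5]}
report = diff_weights(saved, loaded)
print(report)
```

If all three lists come back empty, the weights were restored as saved and the drop in MAP most likely comes from something outside the checkpoint, such as the data or the evaluation path.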
- OS: Kubuntu 20.04
- Packaging: pip
- Version: 1.2.0
Issue Analytics
- Created 3 years ago
- Comments: 7
Yep, I'm sorry, my loading/saving code was fine. I just had another issue somewhere else. Thanks for your time.
I'm afraid I can't help you. It's been more than a year and I'd be completely unable to remember what the problem was.
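The thread never records what the other issue was, so the following is only one class of problem worth ruling out, not the actual cause. In the code above, the vocabulary passed as vmap is rebuilt from the dataset on every run and is not restored from the checkpoint. If it is a token-to-index mapping (an assumption about Graph3D) and its ordering changes between runs, correctly restored embeddings would be looked up under the wrong indices. A minimal sketch of a check, using a hypothetical vocab_fingerprint helper and toy vocabularies:

```python
import hashlib
import json
from typing import Mapping


def vocab_fingerprint(vocab: Mapping[str, int]) -> str:
    """Hash a token -> index mapping so two runs can be compared cheaply."""
    # Sorting by token makes the digest depend only on which index each
    # token received, not on the insertion order of the dict.
    payload = json.dumps(sorted(vocab.items()), ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Hypothetical vocabularies from a training run and a later evaluation run.
train_vocab = {"paris": 0, "france": 1, "capital_of": 2}
eval_vocab = {"france": 0, "paris": 1, "capital_of": 2}  # same tokens, different indices

same = vocab_fingerprint(train_vocab) == vocab_fingerprint(dict(train_vocab))
drifted = vocab_fingerprint(train_vocab) != vocab_fingerprint(eval_vocab)
print(same, drifted)
```

Storing the fingerprint next to the checkpoint at training time and comparing it at load time would make this kind of mismatch visible immediately.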