Model loaded from checkpoint has bad accuracy


What is your question?

I have a model that I train with EarlyStopping and ModelCheckpoint on a custom metric (MAP). Training works fine: after 2 epochs the model reaches 96% MAP. However, when I load it from the checkpoint and test it with the exact same function, the MAP is 16% (the same as an untrained model). I must be doing something wrong, but what?

Code

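import torch
import pytorch_lightning as pl

# TAPKG, Graph3D and MAP come from the author's own project (not shown in the issue).

def default_model(dataset: str):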
	if torch.cuda.is_available():
		print("Using the GPU")
		device = torch.device("cuda")
	else:
		print("Using the CPU")
		device = torch.device("cpu")
	kwargs = {
		"dataset": dataset, "embed_size": 50, "depth": 3,
		"vmap": Graph3D.from_dataset(dataset).vocabulary,
		"neg_per_pos": 5, "max_paths": 255, "device": device
	}
	try:
		model = TAPKG.load_from_checkpoint("Checkpoints/epoch=2-step=612260.ckpt", **kwargs).to(device)
		return model
	except OSError as e:
		print(f"Couldn't load the save for the model, training instead. ({e.__class__.__name__})")
		model = TAPKG(**kwargs).to(device)
	cpt = pl.callbacks.ModelCheckpoint(monitor="MAP", mode="max", dirpath="Checkpoints", save_top_k=1)
	trainer = pl.Trainer(
		gpus=1,
		check_val_every_n_epoch=1,
		callbacks=[
			cpt,
			pl.callbacks.EarlyStopping(monitor="MAP", mode="max", min_delta=.002, patience=2)
		],
		auto_lr_find=True
	)
	# noinspection PyTypeChecker
	trainer.fit(model)
	print(cpt.best_model_path, cpt.best_model_score)
	return model

def eval_link_completion(dataset):
	model = default_model(dataset)
	ranks = model.link_completion_rank()
	MAP(ranks, plot=True)

Right after training, eval_link_completion shows a MAP of 96%. When I load the model from the checkpoint, however, it is back to 16%.
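One way to narrow down a symptom like this is to check whether the weights survive the save/load round trip at all. The following is a minimal sketch, not code from the issue: `diff_state_dicts` and `_as_plain` are hypothetical helper names, and the demo uses plain lists as stand-ins for weight tensors so that it runs without any framework installed. It assumes each value in the mapping is either a plain Python value or a tensor-like object with a `.tolist()` method (as PyTorch tensors have).

```python
from typing import Any, Dict, List, Mapping


def _as_plain(value: Any) -> Any:
    """Convert a tensor-like value to nested Python lists so == compares contents."""
    return value.tolist() if hasattr(value, "tolist") else value


def diff_state_dicts(trained: Mapping[str, Any], loaded: Mapping[str, Any]) -> Dict[str, List[str]]:
    """Report keys that are missing, unexpected, or hold different values after loading."""
    report: Dict[str, List[str]] = {
        # Present in the trained model but absent after loading.
        "missing": sorted(set(trained) - set(loaded)),
        # Present after loading but not in the trained model.
        "unexpected": sorted(set(loaded) - set(trained)),
        "changed": [],
    }
    # For keys present on both sides, compare the actual values.
    for name in sorted(set(trained) & set(loaded)):
        if _as_plain(trained[name]) != _as_plain(loaded[name]):
            report["changed"].append(name)
    return report


# Stand-in data (hypothetical parameter names): lists play the role of tensors.
trained_weights = {"embed.weight": [[0.1, 0.2], [0.3, 0.4]], "out.bias": [0.5]}
reloaded_ok = {"embed.weight": [[0.1, 0.2], [0.3, 0.4]], "out.bias": [0.5]}
reloaded_bad = {"embed.weight": [[0.0, 0.0], [0.0, 0.0]], "out.bias": [0.5]}

print(diff_state_dicts(trained_weights, reloaded_ok))
print(diff_state_dicts(trained_weights, reloaded_bad))
```

With the code from the question, the same helper could be called as `diff_state_dicts(model.state_dict(), reloaded.state_dict())`, where `model` is the freshly trained instance and `reloaded` is the result of `TAPKG.load_from_checkpoint(...)`. The comparison is only meaningful if the checkpoint was written at the same step as the in-memory model. An empty report would suggest that saving and loading are fine and that the difference in MAP comes from somewhere else.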

  • OS: Kubuntu 20.04
  • Packaging: pip
  • Version: 1.2.0

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 7

Top GitHub Comments

1 reaction
Inspirateur commented, Feb 24, 2021

Yep, I'm sorry: my loading/saving code was fine, I just had another issue somewhere else. Thanks for your time.

0 reactions
Inspirateur commented, Nov 13, 2022

I'm afraid I can't help you. It's been more than a year and I'd be completely unable to remember what the problem was.


