
Older implementation of learning without forgetting performed better

See original GitHub issue

🐛 Describe the bug

The old implementation of LwF performed much better than the current one. Is this working as intended, or has a bug been introduced? As a further note, what sort of testing would prevent this kind of regression? (A sketch of one such test follows the logs below.)

🐜 To Reproduce

from torch.optim import SGD
from torch.nn import CrossEntropyLoss
from avalanche.benchmarks.classic import SplitMNIST
from avalanche.evaluation.metrics import accuracy_metrics
from avalanche.models import SimpleMLP
from avalanche.logging import InteractiveLogger
from avalanche.training.plugins import EvaluationPlugin
from avalanche.training.strategies import LwF

# Split MNIST into 5 experiences of 2 classes each.
scenario = SplitMNIST(n_experiences=5)
model = SimpleMLP(num_classes=scenario.n_classes)

# Track accuracy at minibatch, epoch, experience and stream granularity.
eval_plugin = EvaluationPlugin(
    accuracy_metrics(minibatch=True, epoch=True, experience=True, stream=True),
    loggers=[InteractiveLogger()]
)

cl_strategy = LwF(
    model,
    SGD(model.parameters(), lr=0.001, momentum=0.9),
    CrossEntropyLoss(),
    train_mb_size=500, train_epochs=1, eval_mb_size=100,
    alpha=10, temperature=2.0,
    evaluator=eval_plugin
)

print('Starting experiment...')
for i, experience in enumerate(scenario.train_stream):
    print("Start of experience: ", experience.current_experience)
    print("Current Classes: ", experience.classes_in_this_experience)

    res = cl_strategy.train(experience)
    print('Training completed')
    print('Computing accuracy on the whole test set')
    # Evaluate on all experiences seen so far.
    cl_strategy.eval(scenario.test_stream[:i + 1])

New implementation

100%|██████████| 19/19 [00:00<00:00, 52.78it/s]
> Eval on experience 4 (Task 0) from test stream ended.
	Top1_Acc_Exp/eval_phase/test_stream/Task000/Exp004 = 0.0021
-- >> End of eval phase << --
	Top1_Acc_Stream/eval_phase/test_stream/Task000 = 0.2298

Old implementation as of 5356591e2355fbf2aa3d5c0dd5f7bc8613991cff

100%|██████████| 21/21 [00:00<00:00, 47.12it/s]
> Eval on experience 4 (Task 0) from test stream ended.
	Top1_Acc_Exp/eval_phase/test_stream/Task000/Exp004 = 0.9178
-- >> End of eval phase << --
	Top1_Acc_Stream/eval_phase/test_stream = 0.4413
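On the testing question raised above: one option is a pinned-baseline regression test. Below is a minimal sketch assuming pytest, and assuming the reproduction script is wrapped in a hypothetical run_lwf() helper that fixes the random seed and returns the final Top1_Acc_Stream value; the module name, baseline, and tolerance are all illustrative.

# Hypothetical regression test; `run_lwf` is an assumed wrapper around
# the reproduction script above, returning the final stream accuracy.
from repro import run_lwf  # hypothetical module

# Baseline recorded from a known-good commit (e.g. 5356591e...).
BASELINE_STREAM_ACC = 0.44
TOLERANCE = 0.05  # slack for run-to-run nondeterminism

def test_lwf_no_accuracy_regression():
    acc = run_lwf(seed=0)  # fixed seed keeps the run reproducible
    # Fail on large drops like the 0.44 -> 0.23 one reported here.
    assert acc >= BASELINE_STREAM_ACC - TOLERANCE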

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

1 reaction
tachyonicClock commented, Nov 11, 2021

I see. Thanks for the clarification; I greatly appreciate it. I’m glad it’s working as intended and that we have the reproducible continual learning repository as a way to detect this sort of issue.

0 reactions
tachyonicClock commented, Nov 16, 2021

@AntonioCarta The point of the softened softmax is that it brings out the slight differences in the output layer. Essentially the teacher network is saying “that potato looks like a potato, but it also looks just a bit like a dog”. The issue seems to be that this “dark knowledge” is lost when only a few activation units are used. This problem is very significant in some of my research, because I am using a pretrained ResNet.
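For reference, here is a minimal sketch of the temperature-scaled distillation loss in the style of Hinton et al.’s knowledge distillation, which is the mechanism being discussed; it is an illustration, not Avalanche’s actual LwF code:

import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # Dividing by the temperature softens both distributions, exposing
    # the small non-target probabilities ("dark knowledge").
    log_p = F.log_softmax(student_logits / temperature, dim=1)
    q = F.softmax(teacher_logits / temperature, dim=1)
    # KL divergence between teacher and student; the T^2 factor keeps
    # gradient magnitudes comparable across temperatures.
    return F.kl_div(log_p, q, reduction="batchmean") * temperature ** 2

If only a couple of output units are ever active, q concentrates its mass on them and the remaining entries carry almost no signal, which is the loss of dark knowledge described above.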

The point that training (the traditional part) drives all the probabilities down is notable. It’s true that, before training on an experience, that experience’s activations are meaningless, at least for a randomly initialised model. However, I think it’s clear empirically that the additional knowledge (or sort of regularisation) is significant, especially if a pretrained model is used. I concede that my original idea of adding useless outputs is a little silly, but a revised version would be to perform distillation with a different non-final layer close to the end. I’m not suggesting we do that for LwF, but it might be advantageous.
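A rough sketch of that revised idea, distilling on a non-final layer captured via a forward hook; the toy model, layer choice, MSE criterion, and loss weighting are all illustrative assumptions, not a concrete proposal for Avalanche:

import copy
import torch
import torch.nn.functional as F
from torch import nn

def attach_feature_hook(module: nn.Module) -> dict:
    # Store the layer's output on every forward pass.
    store = {}
    module.register_forward_hook(lambda m, i, o: store.update(feat=o))
    return store

# Toy model; `teacher` is a frozen snapshot taken before the new experience.
model = nn.Sequential(nn.Flatten(), nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))
teacher = copy.deepcopy(model).eval()
for p in teacher.parameters():
    p.requires_grad_(False)

# Distil on a non-final layer close to the end (index 2 = the ReLU output here).
student_feats = attach_feature_hook(model[2])
teacher_feats = attach_feature_hook(teacher[2])

x = torch.randn(32, 1, 28, 28)
ce = F.cross_entropy(model(x), torch.randint(0, 10, (32,)))
with torch.no_grad():
    teacher(x)
# MSE between intermediate representations; no output unit goes unused.
distill = F.mse_loss(student_feats["feat"], teacher_feats["feat"])
loss = ce + 1.0 * distill  # the weighting is an arbitrary illustration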

Read more comments on GitHub >
