
Older implementation of learning without forgetting performed better

See original GitHub issue

🐛 Describe the bug

The old implementation of LwF performed much better than the current one. Is this working as intended, or has a bug been introduced? As a further note, what sort of testing would prevent this kind of regression? (A sketch of one such test follows the logs below.)

🐜 To Reproduce

from torch.optim import SGD
from torch.nn import CrossEntropyLoss
from avalanche.benchmarks.classic import SplitMNIST
from avalanche.evaluation.metrics import accuracy_metrics
from avalanche.models import SimpleMLP
from avalanche.logging import InteractiveLogger
from avalanche.training.plugins import EvaluationPlugin
from avalanche.training.strategies import LwF

# Split MNIST into 5 experiences of 2 classes each.
scenario = SplitMNIST(n_experiences=5)
model = SimpleMLP(num_classes=scenario.n_classes)

# Track accuracy at minibatch, epoch, experience and stream granularity.
eval_plugin = EvaluationPlugin(
    accuracy_metrics(minibatch=True, epoch=True, experience=True, stream=True),
    loggers=[InteractiveLogger()]
)

cl_strategy = LwF(
    model,
    SGD(model.parameters(), lr=0.001, momentum=0.9),
    CrossEntropyLoss(),
    train_mb_size=500, train_epochs=1, eval_mb_size=100,
    alpha=10, temperature=2.0,
    evaluator=eval_plugin
)

print('Starting experiment...')
for i, experience in enumerate(scenario.train_stream):
    print("Start of experience: ", experience.current_experience)
    print("Current Classes: ", experience.classes_in_this_experience)

    res = cl_strategy.train(experience)
    print('Training completed')
    print('Computing accuracy on the whole test set')
    # Evaluate on all experiences seen so far.
    cl_strategy.eval(scenario.test_stream[:i + 1])

New implementation

100%|██████████| 19/19 [00:00<00:00, 52.78it/s]
> Eval on experience 4 (Task 0) from test stream ended.
	Top1_Acc_Exp/eval_phase/test_stream/Task000/Exp004 = 0.0021
-- >> End of eval phase << --
	Top1_Acc_Stream/eval_phase/test_stream/Task000 = 0.2298

Old implementation as of 5356591e2355fbf2aa3d5c0dd5f7bc8613991cff

100%|██████████| 21/21 [00:00<00:00, 47.12it/s]
> Eval on experience 4 (Task 0) from test stream ended.
	Top1_Acc_Exp/eval_phase/test_stream/Task000/Exp004 = 0.9178
-- >> End of eval phase << --
	Top1_Acc_Stream/eval_phase/test_stream = 0.4413
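On the testing question raised above: one option is a pinned-baseline regression test. Below is a minimal sketch assuming pytest, and assuming the reproduction script is wrapped in a hypothetical run_lwf() helper that fixes the random seed and returns the final Top1_Acc_Stream value; the module name, baseline, and tolerance are all illustrative.

# Hypothetical regression test; `run_lwf` is an assumed wrapper around
# the reproduction script above, returning the final stream accuracy.
from repro import run_lwf  # hypothetical module

# Baseline recorded from a known-good commit (e.g. 5356591e...).
BASELINE_STREAM_ACC = 0.44
TOLERANCE = 0.05  # slack for run-to-run nondeterminism

def test_lwf_no_accuracy_regression():
    acc = run_lwf(seed=0)  # fixed seed keeps the run reproducible
    # Fail on large drops like the 0.44 -> 0.23 one reported here.
    assert acc >= BASELINE_STREAM_ACC - TOLERANCE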

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

1 reaction
tachyonicClock commented, Nov 11, 2021

I see. Thanks for the clarification; I greatly appreciate it. I’m glad it’s working as intended and that we have the reproducible continual learning repository as a way to detect this sort of issue.

0 reactions
tachyonicClock commented, Nov 16, 2021

@AntonioCarta The point of the softened softmax is that it brings out the slight differences in the output layer. Essentially the teacher network is saying “that potato looks like a potato, but it also looks just a bit like a dog”. The issue seems to be that this “dark knowledge” is lost when only a few activation units are used. This problem is very significant in some of my research, because I am using a pretrained ResNet.
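For reference, here is a minimal sketch of the temperature-scaled distillation loss in the style of Hinton et al.’s knowledge distillation, which is the mechanism being discussed; it is an illustration, not Avalanche’s actual LwF code:

import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # Dividing by the temperature softens both distributions, exposing
    # the small non-target probabilities ("dark knowledge").
    log_p = F.log_softmax(student_logits / temperature, dim=1)
    q = F.softmax(teacher_logits / temperature, dim=1)
    # KL divergence between teacher and student; the T^2 factor keeps
    # gradient magnitudes comparable across temperatures.
    return F.kl_div(log_p, q, reduction="batchmean") * temperature ** 2

If only a couple of output units are ever active, q concentrates its mass on them and the remaining entries carry almost no signal, which is the loss of dark knowledge described above.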

The point that training (the traditional part) drives all the probabilities down is notable. It’s true that, before training on an experience, that experience’s activations are meaningless, at least for a randomly initialised model. However, I think it’s clear empirically that the additional knowledge (or sort of regularisation) is significant, especially if a pretrained model is used. I concede that my original idea of adding useless outputs is a little silly, but a revised version would be to perform distillation with a different non-final layer close to the end. I’m not suggesting we do that for LwF, but it might be advantageous.
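A rough sketch of that revised idea, distilling on a non-final layer captured via a forward hook; the toy model, layer choice, MSE criterion, and loss weighting are all illustrative assumptions, not a concrete proposal for Avalanche:

import copy
import torch
import torch.nn.functional as F
from torch import nn

def attach_feature_hook(module: nn.Module) -> dict:
    # Store the layer's output on every forward pass.
    store = {}
    module.register_forward_hook(lambda m, i, o: store.update(feat=o))
    return store

# Toy model; `teacher` is a frozen snapshot taken before the new experience.
model = nn.Sequential(nn.Flatten(), nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))
teacher = copy.deepcopy(model).eval()
for p in teacher.parameters():
    p.requires_grad_(False)

# Distil on a non-final layer close to the end (index 2 = the ReLU output here).
student_feats = attach_feature_hook(model[2])
teacher_feats = attach_feature_hook(teacher[2])

x = torch.randn(32, 1, 28, 28)
ce = F.cross_entropy(model(x), torch.randint(0, 10, (32,)))
with torch.no_grad():
    teacher(x)
# MSE between intermediate representations; no output unit goes unused.
distill = F.mse_loss(student_feats["feat"], teacher_feats["feat"])
loss = ce + 1.0 * distill  # the weighting is an arbitrary illustration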

Read more comments on GitHub >
