
Model doesn't converge

See original GitHub issue

We are trying to apply this method to a medical dataset of about 70K images (224 res) across 5 classes. However, our training doesn't converge: we tried a range of learning rates (e.g. 3e-3, 3e-4), but none of them seems to work. Currently our model reaches 45% accuracy, whereas the typical accuracy for this dataset is around 85-90% (we trained for 100 epochs). Is there anything else we should tune?

Also, here is our configuration:

batch_size = 64
epochs = 400
lr = 3e-4
gamma = 0.7
seed = 42

efficient_transformer = Linformer(
    dim=128,
    seq_len=49 + 1,  # 7x7 patches + 1 cls-token
    depth=4,
    heads=8,
    k=64
)

# Visual Transformer

model = ViT(
    dim=128,
    image_size=224,
    patch_size=32,
    num_classes=5,
    transformer=efficient_transformer,  # nn.Transformer(d_model=128, nhead=8),
    channels=1,
).to(device)

Thank you very much!

Issue Analytics

  • State: open
  • Created: 3 years ago
  • Reactions: 1
  • Comments: 20 (7 by maintainers)

Top GitHub Comments

2 reactions
lucidrains commented, Dec 21, 2020

@liberbey Hey Ahmet! One of the pitfalls of transformers is having settings that result in the dimension per head being too small. The dimension per head should be at least 32, and ideally 64. It can be calculated as dim // heads, so in your case the dimension of each head is 16. Try increasing the dimension to 256 and increasing the sequence length (decrease the patch size to 16). I would be very surprised if it does not work.
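The arithmetic in this comment can be checked with a short sketch. The helper names `head_dim` and `vit_seq_len` are hypothetical and are not part of vit-pytorch or linformer. The sketch assumes square images, non-overlapping square patches, and a single cls token, matching the `seq_len=49 + 1` convention in the posted configuration.

```python
def head_dim(dim: int, heads: int) -> int:
    """Dimension per attention head, computed as dim // heads."""
    return dim // heads


def vit_seq_len(image_size: int, patch_size: int, cls_tokens: int = 1) -> int:
    """Number of tokens for a square image split into square patches.

    Hypothetical helper: assumes the image size is an exact multiple of
    the patch size and that one cls token is prepended by default.
    """
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    patches_per_side = image_size // patch_size
    return patches_per_side ** 2 + cls_tokens


# Settings from the posted configuration.
posted_head_dim = head_dim(dim=128, heads=8)                    # 16
posted_seq_len = vit_seq_len(image_size=224, patch_size=32)     # 7 * 7 + 1 = 50

# Settings suggested in the comment (dim 256, patch size 16).
suggested_head_dim = head_dim(dim=256, heads=8)                 # 32
suggested_seq_len = vit_seq_len(image_size=224, patch_size=16)  # 14 * 14 + 1 = 197

print(f"posted:    head_dim={posted_head_dim}, seq_len={posted_seq_len}")
print(f"suggested: head_dim={suggested_head_dim}, seq_len={suggested_seq_len}")
```

Under these assumptions, the suggested settings bring the per-head dimension up to the recommended minimum of 32, and the `seq_len` passed to `Linformer` would need to change from 50 to 197 to match the smaller patches.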

1 reaction
SuX97 commented, Dec 24, 2020

@lucidrains Thanks again! We will try to find a larger dataset. By the way, these are validation results, not test results, so we wondered whether there could be another problem with our approach. We expected the test results to be bad because we are not using a pretrained model, but not the validation results… Also, do you have any suggestions about the dramatic drop around the 80th epoch?

Did you use a special learning rate scheduler? The loss on my own dataset also follows an unusual curve, check here. It seems that ViT is hard to train.


