Model doesn't converge
We are trying to apply this method to a medical dataset with about 70K images (224×224 resolution) across 5 classes. However, our training doesn't converge; we tried a range of learning rates (e.g., 3e-3, 3e-4), but none of them seem to work. Currently our model reaches 45% accuracy, while the average accuracy reported for this dataset is around 85-90% (we trained for 100 epochs). Is there anything else we should tune?
Also, here is our configuration:
import torch
from linformer import Linformer
from vit_pytorch.efficient import ViT

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

batch_size = 64
epochs = 400
lr = 3e-4
gamma = 0.7
seed = 42

efficient_transformer = Linformer(
    dim=128,
    seq_len=49 + 1,  # 7x7 patches + 1 cls-token
    depth=4,
    heads=8,
    k=64
)

# Visual Transformer
model = ViT(
    dim=128,
    image_size=224,
    patch_size=32,
    num_classes=5,
    transformer=efficient_transformer,  # nn.Transformer(d_model=128, nhead=8),
    channels=1,
).to(device)
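For reference, the surrounding training loop is roughly the following (a minimal sketch: the Adam optimizer and per-epoch StepLR decay are assumptions, since only lr and gamma are listed above, and train_loader is a placeholder for the dataset loader):

import torch.nn as nn
import torch.optim as optim
from torch.optim.lr_scheduler import StepLR

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=lr)
scheduler = StepLR(optimizer, step_size=1, gamma=gamma)  # decay lr by gamma each epoch

for epoch in range(epochs):
    for data, label in train_loader:  # train_loader: placeholder DataLoader (not shown)
        data, label = data.to(device), label.to(device)
        output = model(data)
        loss = criterion(output, label)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    scheduler.step()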
Thank you very much!
Top GitHub Comments
@liberbey Hey Ahmet! One of the pitfalls of transformers is having settings that make the dimension per head too small. The dimension per head should be at least 32, and ideally 64. It can be calculated as dim // heads, so in your case the dimension of each head is 128 // 8 = 16. Try increasing the dimension to 256 and increasing the sequence length (decrease the patch size to 16); I would be very surprised if it does not work. (A revised configuration along these lines is sketched after the comments.)

Did you use a special learning rate scheduler? My loss curve on my own dataset also shows an uncommon shape, check here. It seems that ViT is hard to train.
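For concreteness, a revised configuration following that advice might look like this (a sketch, untested; dim=256 with heads=8 gives 32 dims per head, the suggested minimum, and patch_size=16 on a 224-pixel image gives 14×14 = 196 patches):

efficient_transformer = Linformer(
    dim=256,           # 256 // 8 heads = 32 dims per head
    seq_len=196 + 1,   # 14x14 patches + 1 cls-token (patch_size=16)
    depth=4,
    heads=8,
    k=64
)

model = ViT(
    dim=256,
    image_size=224,
    patch_size=16,
    num_classes=5,
    transformer=efficient_transformer,
    channels=1,
).to(device)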
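On the scheduler question: ViT models are often trained with linear warmup followed by cosine decay rather than a fixed or step-decayed learning rate. A minimal sketch using PyTorch's LambdaLR, reusing the optimizer and epochs defined above (the warmup_epochs value is a hypothetical choice for illustration):

import math
from torch.optim.lr_scheduler import LambdaLR

warmup_epochs = 10  # hypothetical warmup length

def lr_lambda(epoch):
    # linear warmup, then cosine decay to zero over the remaining epochs
    if epoch < warmup_epochs:
        return (epoch + 1) / warmup_epochs
    progress = (epoch - warmup_epochs) / max(1, epochs - warmup_epochs)
    return 0.5 * (1 + math.cos(math.pi * progress))

scheduler = LambdaLR(optimizer, lr_lambda)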