Model doesn't converge
We are trying to apply this method to a medical dataset with about 70K images (224×224 resolution) across 5 classes. However, our training doesn't converge; we tried a range of learning rates (e.g., 3e-3, 3e-4), but none of them seem to work. Currently our model reaches 45% accuracy, while the average accuracy reported for this dataset is around 85-90% (we trained for 100 epochs). Is there anything else we should tune?
Also, here is our configuration:
import torch
from linformer import Linformer
from vit_pytorch.efficient import ViT

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

batch_size = 64
epochs = 400
lr = 3e-4
gamma = 0.7
seed = 42

efficient_transformer = Linformer(
    dim=128,
    seq_len=49 + 1,  # 7x7 patches + 1 cls-token
    depth=4,
    heads=8,
    k=64
)

# Visual Transformer
model = ViT(
    dim=128,
    image_size=224,
    patch_size=32,
    num_classes=5,
    transformer=efficient_transformer,  # nn.Transformer(d_model=128, nhead=8),
    channels=1,
).to(device)
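For reference, the surrounding training loop is roughly the following (a minimal sketch: the Adam optimizer and per-epoch StepLR decay are assumptions, since only lr and gamma are listed above, and train_loader is a placeholder for the dataset loader):

import torch.nn as nn
import torch.optim as optim
from torch.optim.lr_scheduler import StepLR

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=lr)
scheduler = StepLR(optimizer, step_size=1, gamma=gamma)  # decay lr by gamma each epoch

for epoch in range(epochs):
    for data, label in train_loader:  # train_loader: placeholder DataLoader (not shown)
        data, label = data.to(device), label.to(device)
        output = model(data)
        loss = criterion(output, label)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    scheduler.step()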
Thank you very much!
Top GitHub Comments
@liberbey Hey Ahmet! One of the pitfalls of transformers is having settings that make the dimension per head too small. The dimension per head should be at least 32, and ideally 64. It can be calculated as dim // heads, so in your case the dimension of each head is 128 // 8 = 16. Try increasing the dimension to 256 and increasing the sequence length (decrease the patch size to 16); I would be very surprised if it does not work. (A revised configuration along these lines is sketched after the comments.)

Did you use a special learning rate scheduler? My loss curve on my own dataset also shows an uncommon shape, check here. It seems that ViT is hard to train.
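For concreteness, a revised configuration following that advice might look like this (a sketch, untested; dim=256 with heads=8 gives 32 dims per head, the suggested minimum, and patch_size=16 on a 224-pixel image gives 14×14 = 196 patches):

efficient_transformer = Linformer(
    dim=256,           # 256 // 8 heads = 32 dims per head
    seq_len=196 + 1,   # 14x14 patches + 1 cls-token (patch_size=16)
    depth=4,
    heads=8,
    k=64
)

model = ViT(
    dim=256,
    image_size=224,
    patch_size=16,
    num_classes=5,
    transformer=efficient_transformer,
    channels=1,
).to(device)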
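On the scheduler question: ViT models are often trained with linear warmup followed by cosine decay rather than a fixed or step-decayed learning rate. A minimal sketch using PyTorch's LambdaLR, reusing the optimizer and epochs defined above (the warmup_epochs value is a hypothetical choice for illustration):

import math
from torch.optim.lr_scheduler import LambdaLR

warmup_epochs = 10  # hypothetical warmup length

def lr_lambda(epoch):
    # linear warmup, then cosine decay to zero over the remaining epochs
    if epoch < warmup_epochs:
        return (epoch + 1) / warmup_epochs
    progress = (epoch - warmup_epochs) / max(1, epochs - warmup_epochs)
    return 0.5 * (1 + math.cos(math.pi * progress))

scheduler = LambdaLR(optimizer, lr_lambda)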