Loss not dropping on custom dataset :(
Hi, thanks for the wonderful work, @mathildecaron31!
The reported video is inspiring 😄
__
I am experimenting with a custom dataset.
The thing is, training a vision transformer (deit_small) on it in a supervised manner works fine and the loss drops as expected.
I even managed to apply visualize_attention.py to see attention heatmaps for the separately trained ViT.
But when I switch to the self-supervised DINO setup, there is almost no change in the loss during training.
Do you have any idea why this could happen, or any possible solutions?
__
Thanks!
I am attaching a screenshot from training and the arguments I used for the training script.
arch = 'deit_small'
patch_size = 16
out_dim = 10000                  # default: 65536
norm_last_layer = False
momentum_teacher = 0.996         # adjust according to batch size
bsize = 256
use_bn_in_head = False
warmup_teacher_temp = 0.0005     # lower this if the loss does not decrease, default: 0.04
teacher_temp = 0.3               # increase if needed, default: 0.04
warmup_teacher_temp_epochs = 0   # default: 30 warmup epochs
use_fp16 = False                 # disable if the loss is unstable, default: True
weight_decay = 0.04              # a smaller value works well
weight_decay_end = 0.4           # final value of the weight decay
clip_grad = 3.0                  # max parameter gradient norm, 0 to disable, default: 3.0
batch_size_per_gpu = 256         # reduce if it does not fit in memory, default: 64
epochs = 100
freeze_last_layer = 5            # default: 1, try increasing this if the loss does not decrease
lr = 0.005                       # scaled linearly with batch size (reference 256), default: 0.0005
warmup_epochs = 0                # linear warmup, default: 10
min_lr = 1e-6                    # target lr at the end of optimization
optimizer = 'sgd'                # default: adamw
global_crops_scale = (0.4, 1.)
local_crops_number = 8           # number of small local views
local_crops_scale = (0.05, 0.4)  # default: (0.05, 0.4)
data_path = train_dataset_dir
output_dir = "./dirlog"
saveckp_freq = 20
seed = 0                         # random seed
num_workers = 40                 # default: 10
dist_url = "env://"
local_rank = 0
device_ids = [0, 1, 2, 3, 4, 5]  # use 6 GPUs
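As a quick sanity check, the comment on lr above implies linear scaling with the total batch size relative to a reference of 256; a minimal sketch of what that works out to under this config (the scaling formula base_lr * total_batch / 256 is assumed from that comment, not stated elsewhere in the issue):

# Sanity check of the effective batch size and learning rate implied above,
# assuming lr is scaled linearly with total batch size relative to 256
# (base_lr * total_batch / 256), as the comment on lr suggests.
batch_size_per_gpu = 256
n_gpus = 6                                                # len(device_ids) above
base_lr = 0.005
effective_batch_size = batch_size_per_gpu * n_gpus        # 1536
scaled_lr = base_lr * effective_batch_size / 256          # 0.03
print(effective_batch_size, scaled_lr)                    # 1536 0.03

With the default base lr of 0.0005 the scaled value would be 0.003, so this configuration effectively runs at roughly ten times the default learning rate, on top of switching to sgd.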
Issue Analytics
- Created: 2 years ago
- Comments: 11 (6 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Hi @tuttelikz, thanks for your kind words. Can you try the following to improve stability:
- --norm_last_layer true (this will l2-normalize the last-layer weights)
- --use_fp16 false (but you're already doing that 😃)
- --optimizer adamw (is there a motivation for using sgd instead of adamw?) and adapt the learning rate accordingly. If you do choose sgd, it is possible that you also need to re-adapt the weight decay (maybe use a much lower value). I'd recommend starting from the default optim params with adamw.

If I understand correctly, the effective batch size is 1536 (256 * 6). Can you try reducing that a bit? I've observed that large-batch training can be unstable.
Hope that helps.
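For reference, a minimal sketch of how these suggestions could map back onto the argument list from the issue; the reduced batch size and the restored default learning rate are illustrative assumptions, not values prescribed in the comment above:

# Config changes following the suggestions above (values marked as assumptions
# are illustrative, not prescribed by the maintainer).
norm_last_layer = True        # l2-normalize the last-layer weights
use_fp16 = False              # already disabled in the original config
optimizer = 'adamw'           # back to the default optimizer
lr = 0.0005                   # assumption: default base lr, still scaled linearly with batch size
weight_decay = 0.04           # default optim params for adamw
weight_decay_end = 0.4
batch_size_per_gpu = 128      # assumption: 128 * 6 GPUs = 768 effective, down from 1536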
Best way to thank me is to star the repo haha 😄