Training on a small dataset with pre-trained weights doesn't give good results.
import timm
import torch.nn as nn

# Load a ViT pre-trained on ImageNet, then replace its classification
# head with a fresh two-class linear layer.
pretrained_v = timm.create_model('vit_base_patch16_224', pretrained=True)
pretrained_v.head = nn.Linear(768, 2)
I tried the Kaggle Cats vs. Dogs dataset for binary classification. It didn't work: the output is all cat or all dog.
Any idea how to make it work on a small dataset (fewer than 10,000 images, or even fewer than 1,000)?
PS: I used Adam with lr = 1e-2.
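One common fix for a dataset this small is stronger data augmentation. A minimal sketch of an input pipeline, assuming torchvision transforms; the specific augmentations and the 0.5 mean/std normalization (this timm ViT's default config, worth verifying with timm.data.resolve_data_config) are assumptions, not details from the issue:

from torchvision import transforms

# Assumed augmentation pipeline for a small training set; the crop and
# flip choices are illustrative. The 0.5 normalization matches this
# model's default config in timm (an assumption; verify with
# timm.data.resolve_data_config).
train_tfms = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5)),
])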
Issue Analytics
- Created: 3 years ago
- Comments: 12 (3 by maintainers)
Top Results From Across the Web

Train A Strong Classifier with Small Dataset, From Scratch ...
Then here's the results when train a model from pre-trained weights, but without any augmentation techniques again. (yes this is not normal…).

How do pre-trained models work? - Towards Data Science
We start by loading a pretrained model. Initially, we only train the added layers. We do so because the weights of these layers...

Fine-Tuning A Pre-Trained Model Affects Performance
by leveraging pre-learnt weights from a pretrained model to another (5). ... only a small dataset is available, transfer learning.

Why I need pre-trained weight in transfer learning
You don't have to use a pretrained network in order to train a model for your task. However, in practice using a pretrained...

How to use a pre-trained deep learning model - Educative.io
The model would give good results as the new dataset is very similar to the ... model is well-trained and that it...
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
A lower learning rate and SGD are better for fine-tuning; don't use Adam.
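A minimal sketch of that advice, assuming a standard PyTorch setup; the exact learning rate, momentum, and weight decay are illustrative values, not numbers given in the comment:

import timm
import torch
import torch.nn as nn

model = timm.create_model('vit_base_patch16_224', pretrained=True)
model.head = nn.Linear(768, 2)

# SGD with a much smaller learning rate than the Adam/1e-2 setup from
# the issue; lr=1e-3, momentum=0.9, weight_decay=1e-4 are assumptions.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3,
                            momentum=0.9, weight_decay=1e-4)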
I also tried the experiment with lr = 3e-5 and batch_size = 8.
Hmm, not bad. I think it will be better if I can tune the parameters.