Speed up training?
Hi, I’m trying to retrain the coref model starting from another BERT model trained on different data. The loss values do not seem to be going down, and training is also slow with the GPU underutilized (screenshot below). Any tips on how to speed up training or fit more data on the GPU?
TITAN V | 40'C, 0 % | 316 / 12036 MB
I0905 12:20:54.217084 140321920943872 train.py:59] [100] loss=2071.04, steps/s=0.20
I0905 12:21:45.037879 140321920943872 train.py:59] [110] loss=837.87, steps/s=0.20
I0905 12:22:37.386365 140321920943872 train.py:59] [120] loss=1475.69, steps/s=0.20
I0905 12:23:34.424523 140321920943872 train.py:59] [130] loss=1111.34, steps/s=0.20
I0905 12:24:26.693988 140321920943872 train.py:59] [140] loss=1088.69, steps/s=0.20
I0905 12:25:14.780310 140321920943872 train.py:59] [150] loss=792.43, steps/s=0.20
I0905 12:26:08.272615 140321920943872 train.py:59] [160] loss=1597.89, steps/s=0.20
I0905 12:26:55.389269 140321920943872 train.py:59] [170] loss=1087.88, steps/s=0.20
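The per-step loss values above are noisy, so eight lines alone make it hard to tell whether the loss is trending down. One way to check is to smooth them with a moving average. This is a minimal sketch that assumes the log format shown above; `parse_losses` and `moving_average` are hypothetical helpers written for illustration, not part of the repository's `train.py`.

```python
import re

# Log lines copied verbatim from the issue above.
LOG = """\
I0905 12:20:54.217084 140321920943872 train.py:59] [100] loss=2071.04, steps/s=0.20
I0905 12:21:45.037879 140321920943872 train.py:59] [110] loss=837.87, steps/s=0.20
I0905 12:22:37.386365 140321920943872 train.py:59] [120] loss=1475.69, steps/s=0.20
I0905 12:23:34.424523 140321920943872 train.py:59] [130] loss=1111.34, steps/s=0.20
I0905 12:24:26.693988 140321920943872 train.py:59] [140] loss=1088.69, steps/s=0.20
I0905 12:25:14.780310 140321920943872 train.py:59] [150] loss=792.43, steps/s=0.20
I0905 12:26:08.272615 140321920943872 train.py:59] [160] loss=1597.89, steps/s=0.20
I0905 12:26:55.389269 140321920943872 train.py:59] [170] loss=1087.88, steps/s=0.20
"""

# Matches "[<step>] loss=<value>" as it appears in the lines above.
PATTERN = re.compile(r"\[(\d+)\] loss=([0-9.]+)")


def parse_losses(log_text):
    """Return a list of (step, loss) pairs found in the log text."""
    return [(int(step), float(loss)) for step, loss in PATTERN.findall(log_text)]


def moving_average(values, window):
    """Trailing moving average; yields one value per full window."""
    return [
        sum(values[i - window:i]) / window
        for i in range(window, len(values) + 1)
    ]


points = parse_losses(LOG)
losses = [loss for _, loss in points]
smoothed = moving_average(losses, window=4)

# Print the step at the end of each window next to its smoothed loss.
for (step, _), avg in zip(points[3:], smoothed):
    print(f"step {step}: mean loss over last 4 logs = {avg:.2f}")
```

With only eight log lines the smoothed curve is still short; running the same helpers over a longer stretch of the log would give a clearer picture of the trend.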
Issue Analytics
- State:
- Created 4 years ago
- Comments:7 (4 by maintainers)
Top GitHub Comments
Around 55K steps for SpanBERT base. Here’s the final part of the log.
Yes, I think the problem could be domain mismatch. This is helpful. Thanks!