Add optimal model size and stopping time feature
🚀 Feature request
The calculator blog post presented an automated way to find scaling laws relating model size and compute budget on language modeling tasks. Adding it to the library would help reduce training costs by picking an optimal model size and training time.
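To make the idea concrete, here is a minimal sketch of fitting a power law between compute budget and best model size, then using it to suggest a size for a new budget. It is not the calculator's actual code: the function names `fit_power_law` and `suggest_model_size`, the power-law form itself, and all the numbers are assumptions made up for illustration.

```python
import math

def fit_power_law(xs, ys):
    """Fit y ~ coef * x**exponent by least squares in log-log space."""
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    # Slope of the regression line in log-log space is the exponent.
    sxx = sum((a - mean_x) ** 2 for a in log_x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(log_x, log_y))
    exponent = sxy / sxx
    coef = math.exp(mean_y - exponent * mean_x)
    return coef, exponent

# Synthetic, made-up data (NOT measurements from the blog post):
# compute budgets in FLOPs and the model size that did best at each budget.
compute_budgets = [1e15, 1e16, 1e17, 1e18]
best_model_sizes = [2.0 * c ** 0.5 for c in compute_budgets]

coef, exponent = fit_power_law(compute_budgets, best_model_sizes)

def suggest_model_size(budget):
    """Extrapolate the fitted law to a new compute budget (hypothetical helper)."""
    return coef * budget ** exponent

print(f"fitted exponent: {exponent:.3f}")
print(f"suggested size for 1e19 FLOPs: {suggest_model_size(1e19):.3e} parameters")
```

In practice the points would come from real training runs, and the fit would only be trustworthy near the range of budgets that were actually measured.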
Motivation
Estimating how big a model to use and how long to train it is more of an art than a science. An automated tool for that task would let researchers and practitioners concentrate on the high-level parts of their projects rather than on parameter tweaking.
Your contribution
I can submit a PR with my existing work, probably integrating it within Trainer and/or knocknock.
Issue Analytics
- State:
- Created 3 years ago
- Reactions:42
- Comments:14 (7 by maintainers)
Top GitHub Comments
Ah yes - I remembered having a doubt about that. I checked the library we used to estimate those again, and there might have been a unit conversion error. I’ll fix that ASAP tomorrow!
Edit: it’s fixed, thank you @lopuhin !
Great stuff, thank you! The energy estimates look 1000 times worse than reality though. A V100 running for 12 h should not consume 5432 kWh I think, else we’d all be dead. 5.4 kWh looks more reasonable.
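A quick back-of-the-envelope check of that order of magnitude, as a sketch only: it assumes a single V100 drawing its roughly 300 W rated board power (the SXM2 figure; the PCIe card is rated lower) for the whole 12 hours. The `overhead` factor for cooling and other datacenter costs is a hypothetical parameter, and none of this is how the calculator itself computes energy.

```python
def energy_kwh(power_watts, hours, overhead=1.0):
    """Energy in kWh for a device drawing `power_watts` for `hours`.

    `overhead` is a hypothetical multiplier for cooling and other
    datacenter costs; 1.0 means the GPU alone.
    """
    return power_watts / 1000.0 * hours * overhead

V100_POWER_WATTS = 300  # assumption: rated board power, used as an upper bound
HOURS = 12

gpu_only = energy_kwh(V100_POWER_WATTS, HOURS)
with_overhead = energy_kwh(V100_POWER_WATTS, HOURS, overhead=1.5)

print(f"GPU alone:     {gpu_only:.1f} kWh")
print(f"With overhead: {with_overhead:.1f} kWh")
print(f"Reported 5432 kWh is about {5432 / with_overhead:.0f}x larger")
```

Either way the result is a few kWh, which is consistent with the reported figure being off by a factor of about 1000, as a unit conversion error would produce.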