train_bert_ds.py does not converge.
I ran the HelloDeepSpeed example. The following experiment converges normally, and I can see the loss drop:
python train_bert.py --checkpoint_dir ./experiments --local_rank 0
However, train_bert_ds.py does not converge. The loss always stays at 10.9**:
deepspeed train_bert_ds.py --checkpoint_dir ./ds_exp
Why?
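One way to read the stuck value: a cross-entropy loss that never moves from about 10.9 is close to what a model produces when it predicts a uniform distribution over the vocabulary, which suggests the weights are not being updated in any useful way. The sketch below only illustrates that check. The vocabulary size of 50265 is an assumption (the size of the RoBERTa tokenizer vocabulary), and the helper names and the tolerance are hypothetical, not part of the example scripts.

```python
import math


def uniform_cross_entropy(vocab_size: int) -> float:
    """Cross-entropy (in nats) of a uniform prediction over vocab_size tokens."""
    return math.log(vocab_size)


def looks_stuck_at_chance(losses, vocab_size, tolerance=0.2):
    """Return True if every logged loss is within `tolerance` of the chance level.

    `tolerance` is an arbitrary illustrative threshold, not a tuned value.
    """
    chance = uniform_cross_entropy(vocab_size)
    return bool(losses) and all(abs(loss - chance) <= tolerance for loss in losses)


# Assumed vocabulary size; replace with the tokenizer's actual size.
ASSUMED_VOCAB_SIZE = 50265

print(f"chance-level loss: {uniform_cross_entropy(ASSUMED_VOCAB_SIZE):.2f}")
print(looks_stuck_at_chance([10.90, 10.91, 10.93], ASSUMED_VOCAB_SIZE))
```

If the logged losses pass this check for many steps, the problem is more likely in the optimization path (learning rate, gradient scaling, optimizer configuration) than in the data or model definition.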
Issue Analytics
- State:
- Created 2 years ago
- Comments: 6 (2 by maintainers)
Top GitHub Comments
@tjruwase With the PR applied, the BERT example converges as expected.
@MihaiBalint, awesome! Thanks for the quick confirmation.