TypeError: FP16_DeepSpeedZeroOptimizer is not an Optimizer
See original GitHub issue
I'm trying to use the 1-Cycle scheduler, but I get the following error:
TypeError: FP16_DeepSpeedZeroOptimizer is not an Optimizer
Here is my configuration file:
{
  "train_batch_size": 64,
  "train_micro_batch_size_per_gpu": 1,
  "gradient_accumulation_steps": 16,
  "optimizer": {
    "type": "Adam",
    "params": {
      "lr": 3e-05,
      "betas": [0.9, 0.999],
      "eps": 1e-8,
      "weight_decay": 0.01
    }
  },
  "gradient_clipping": 0.1,
  "scheduler": {
    "type": "OneCycle",
    "params": {
      "cycle_first_step_size": 16000,
      "cycle_first_stair_count": 8000,
      "decay_step_size": 16000,
      "cycle_min_lr": 1e-06,
      "cycle_max_lr": 3e-05,
      "decay_lr_rate": 1e-07,
      "cycle_min_mom": 0.85,
      "cycle_max_mom": 0.99,
      "decay_mom_rate": 0.0
    }
  },
  "zero_optimization": true,
  "disable_allgather": true,
  "fp16": {
    "enabled": true,
    "loss_scale": 0,
    "min_loss_scale": 1
  }
}
When I use another scheduler (with FP16), there is no problem.
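For reference, here is a minimal sketch of how a config like this is typically consumed. It assumes a recent DeepSpeed release where the config path can be passed directly to deepspeed.initialize (the issue itself predates that API and used the command-line config flag); the tiny Linear model, the config filename, and the random input are placeholders:

import torch
import deepspeed

# Placeholder model; any torch.nn.Module works here.
model = torch.nn.Linear(10, 1)

# deepspeed.initialize builds the Adam optimizer, the OneCycle scheduler,
# and the ZeRO/FP16 wrappers from the JSON config shown above.
engine, optimizer, _, scheduler = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config="ds_config.json",  # assumed filename for the JSON above
)

# One dummy training step; engine.step() also advances the LR scheduler.
x = torch.randn(4, 10, device=engine.device, dtype=torch.half)
loss = engine(x).sum()
engine.backward(loss)
engine.step()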
Issue Analytics
- Created: 4 years ago
- Comments: 5 (5 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Thanks for reporting this bug. We will take a look at this as soon as possible. I just created two test cases that reproduce the error (one with ZeRO and one with FP16 but no ZeRO).
https://github.com/microsoft/DeepSpeed/blob/jeffra/onecycle_bug/tests/unit/test_fp16.py#L147-L246
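For context on where the message itself comes from: its wording matches the type check in torch.optim.lr_scheduler, which (in the PyTorch versions of that era) rejects any object that is not a torch.optim.Optimizer subclass, and ZeRO's FP16_DeepSpeedZeroOptimizer wraps the base optimizer rather than subclassing it. Below is a minimal, hypothetical reproduction of that check; the wrapper class is a stand-in for illustration, not DeepSpeed code:

import torch
from torch.optim.lr_scheduler import LambdaLR

class FakeZeroWrapper:
    # Stand-in for an optimizer wrapper that exposes param_groups but does
    # NOT subclass torch.optim.Optimizer (illustration only, not DeepSpeed code).
    def __init__(self, inner):
        self.optimizer = inner
        self.param_groups = inner.param_groups

params = [torch.nn.Parameter(torch.zeros(1))]
wrapped = FakeZeroWrapper(torch.optim.Adam(params, lr=3e-5))

try:
    # Any scheduler built on torch's base LR scheduler performs the same check.
    LambdaLR(wrapped, lr_lambda=lambda step: 1.0)
except TypeError as err:
    print(err)  # -> "FakeZeroWrapper is not an Optimizer"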
Hi @Colanim, it should be up to date. Can you tell us this info from inside your docker container?
python -c 'import deepspeed; print("deepspeed info:", deepspeed.__version__, deepspeed.__git_branch__, deepspeed.__git_hash__)'
Also, I just looked at the latest docker build; it prints this same version info and appears to be aligned with the latest March 12th commit (3d3f8d36a4e8c0b7e6358bccd90254fc7424ffcb): https://dev.azure.com/DeepSpeedMSFT/DeepSpeed/_build/results?buildId=416&view=logs&j=3dc8fd7e-4368-5a92-293e-d53cefc8c4b3&t=a1aa9649-a94b-5ac4-3f5e-9bb6223edb04&l=1717
deepspeed info: 0.1.0 master 3d3f8d3