accelerate test fails with deepspeed and fp16 enabled in config
Hi there,
First, thanks for the great work.
I wanted to give accelerate a spin and followed the docs to set up a configuration file with both deepspeed and fp16 enabled. Here’s the resulting yaml:
compute_environment: LOCAL_MACHINE
deepspeed_config:
  gradient_accumulation_steps: 2
  offload_optimizer_device: cpu
  zero_stage: 3
distributed_type: DEEPSPEED
fp16: true
machine_rank: 0
main_process_ip: null
main_process_port: null
main_training_function: main
num_machines: 1
num_processes: 2
I then tried to test the setup using:
accelerate test --config_file ./my_config.yaml
This then throws an error saying:
AttributeError: 'DeepSpeedPlugin' object has no attribute 'fp16'
which seems to stem from accelerate/state.py, line 232:
use_fp16 = self.deepspeed_plugin.fp16 if self.distributed_type == DistributedType.DEEPSPEED else self.use_fp16
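To illustrate why that expression fails, here is a minimal sketch using hypothetical stand-in classes. These are not accelerate's real classes; the only assumption taken from the traceback is that the plugin object has no `fp16` attribute while the state object has `use_fp16`. The `fp16_with_fallback` method is one possible defensive pattern for illustration, not the library's actual fix.

```python
from dataclasses import dataclass
from enum import Enum


class DistributedType(Enum):
    # Hypothetical stand-in for the enum referenced in the traceback.
    NO = "NO"
    DEEPSPEED = "DEEPSPEED"


@dataclass
class DeepSpeedPlugin:
    # Hypothetical stand-in: mirrors the deepspeed_config keys from the yaml
    # and, like the object in the error message, has no `fp16` field.
    gradient_accumulation_steps: int = 2
    offload_optimizer_device: str = "cpu"
    zero_stage: int = 3


@dataclass
class State:
    # Hypothetical stand-in for the state object; `use_fp16` comes from `fp16: true`.
    distributed_type: DistributedType
    deepspeed_plugin: DeepSpeedPlugin
    use_fp16: bool = True

    def fp16_as_reported(self) -> bool:
        # Same shape as the expression quoted from state.py: the attribute
        # lookup on the plugin happens whenever DeepSpeed is selected.
        return (
            self.deepspeed_plugin.fp16
            if self.distributed_type == DistributedType.DEEPSPEED
            else self.use_fp16
        )

    def fp16_with_fallback(self) -> bool:
        # Assumed workaround: fall back to the top-level flag when the
        # plugin does not carry its own `fp16` attribute.
        if self.distributed_type == DistributedType.DEEPSPEED:
            return getattr(self.deepspeed_plugin, "fp16", self.use_fp16)
        return self.use_fp16


state = State(DistributedType.DEEPSPEED, DeepSpeedPlugin())
try:
    state.fp16_as_reported()
except AttributeError as err:
    print(err)  # the stand-in plugin has no `fp16` attribute
print(state.fp16_with_fallback())
```

In this sketch the first call raises the same kind of AttributeError as in the report, while the fallback version reads the top-level `fp16` setting instead.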
Let me know if you need any more information 😃
Issue Analytics
- Created 2 years ago
- Comments:8 (4 by maintainers)
Top GitHub Comments
Sure. I have version 0.5.5, installed today via pip install deepspeed. Thank you for your update.