
accelerate test fails with deepspeed and fp16 enabled in config

See original GitHub issue

Hi there,

First, thanks for the great work.

I wanted to give accelerate a spin and followed the docs to set up a configuration file with both deepspeed and fp16 enabled. Here's the resulting yaml:

compute_environment: LOCAL_MACHINE
deepspeed_config:
  gradient_accumulation_steps: 2
  offload_optimizer_device: cpu
  zero_stage: 3
distributed_type: DEEPSPEED
fp16: true
machine_rank: 0
main_process_ip: null
main_process_port: null
main_training_function: main
num_machines: 1
num_processes: 2

I then tried to test the setup using: accelerate test --config_file ./my_config.yaml

This throws AttributeError: 'DeepSpeedPlugin' object has no attribute 'fp16', which seems to stem from line 232 of accelerate/state.py: use_fp16 = self.deepspeed_plugin.fp16 if self.distributed_type == DistributedType.DEEPSPEED else self.use_fp16

Let me know if you need any more information 😃
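
For context on where this blows up, the snippet below is a minimal sketch (not from the issue) that mirrors the failing attribute access outside of accelerate test. The DeepSpeedPlugin arguments are assumed to match the yaml above and may be named differently on other accelerate releases.

from accelerate import DeepSpeedPlugin

# Plugin options assumed to mirror the reporter's yaml config; exact field
# names can differ between accelerate releases.
plugin = DeepSpeedPlugin(
    gradient_accumulation_steps=2,
    zero_stage=3,
    offload_optimizer_device="cpu",
)

# On the affected release the DeepSpeedPlugin dataclass defines no `fp16`
# field, so the same lookup performed in accelerate/state.py raises
# AttributeError: 'DeepSpeedPlugin' object has no attribute 'fp16'.
print(plugin.fp16)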

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 8 (4 by maintainers)

Top GitHub Comments

2 reactions
nsmdgr commented, Nov 11, 2021

Sure. I have version 0.5.5 installed today via pip install deepspeed.

0 reactions
chris-opendata commented, May 25, 2022

This is fixed in the latest release.

Thank you for your update.
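
Since the fix shipped in a later accelerate release, here is a rough sketch (not from the thread) of the equivalent programmatic setup on a current version, where fp16 is requested through the Accelerator rather than read from the plugin; treat the exact keyword arguments as assumptions, since they have shifted between releases (older versions used fp16=True instead of mixed_precision).

from accelerate import Accelerator, DeepSpeedPlugin

# DeepSpeed options mirroring the yaml config from the issue.
plugin = DeepSpeedPlugin(
    gradient_accumulation_steps=2,
    zero_stage=3,
    offload_optimizer_device="cpu",
)

# On recent releases, mixed precision is passed to the Accelerator itself.
accelerator = Accelerator(mixed_precision="fp16", deepspeed_plugin=plugin)

# Typical usage: wrap model, optimizer and dataloader before the training loop.
# model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)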

Read more comments on GitHub >

Top Results From Across the Web

  • DeepSpeed Integration - Hugging Face: Integration of the core DeepSpeed features via Trainer. This is an everything-done-for-you type of integration - just supply your custom config file or...
  • DeepSpeed Configuration JSON: Contents: Batch Size Related Parameters; Optimizer Parameters; Scheduler Parameters; Communication options; FP16 training ...
  • Deploy BLOOM-176B and OPT-30B on Amazon SageMaker ...: Throughput reflects the number of tokens produced per second for each test. For Hugging Face Accelerate, we used the library's default loading ...
  • Train 1 trillion+ parameter models - PyTorch Lightning: Check out this amazing video explaining model parallelism and how it works behind the scenes: ... Below is a summary of all the...
  • Accelerate Stable Diffusion inference with DeepSpeed ...: Note: You need a machine with a GPU and a compatible CUDA installed. You can check this by running nvidia-smi in your terminal....
