
DeepSpeed assumes model returns just one variable: loss


https://github.com/microsoft/DeepSpeed/blob/4d735946b8f256bc80ba13e3530f85c91d041ff4/deepspeed/pt/deepspeed_light.py#L582-L606
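For readers who don't want to chase the link, the referenced lines boil down to roughly the following (a paraphrase reconstructed from the discussion in this issue, not the verbatim source):

```python
# Paraphrase of the referenced DeepSpeedLight code (not verbatim):
def forward(self, *inputs, **kwargs):
    # Line 596: the client's model is assumed to return exactly one value, the loss
    loss = self.module(*inputs, **kwargs)

    # Lines 599-600: the loss is scaled by the gradient accumulation steps
    # before being returned to the client
    if self.gradient_accumulation_steps > 1:
        loss = loss / self.gradient_accumulation_steps

    return loss
```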

As you can see at line 596 above, in the forward() call of the DeepSpeedLight engine, self.module (initialized earlier with the model the client passes to deepspeed.initialize()) is assumed to return a single output: the loss. However, as a model developer, I may have my model return many outputs, including several different losses. One example of such a model is GPT2DoubleHeadsModel in Hugging Face’s transformers repo, which returns two different losses, one for each head/task.
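To make the multi-loss case concrete, here is a minimal sketch using GPT2DoubleHeadsModel, adapted from the transformers documentation (argument names such as labels vs. lm_labels vary across transformers versions):

```python
import torch
from transformers import GPT2Tokenizer, GPT2DoubleHeadsModel

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2DoubleHeadsModel.from_pretrained("gpt2")

# Add a [CLS] token for the multiple-choice head
tokenizer.add_special_tokens({"cls_token": "[CLS]"})
model.resize_token_embeddings(len(tokenizer))

choices = ["Hello, my dog is cute [CLS]", "Hello, my cat is cute [CLS]"]
encoded = [tokenizer.encode(s) for s in choices]
cls_positions = [tokens.index(tokenizer.cls_token_id) for tokens in encoded]

input_ids = torch.tensor(encoded).unsqueeze(0)   # (batch=1, num_choices=2, seq_len)
mc_token_ids = torch.tensor([cls_positions])     # (batch=1, num_choices=2)

outputs = model(input_ids, mc_token_ids=mc_token_ids,
                labels=input_ids, mc_labels=torch.tensor([0]))

# Two losses come back, one per head/task -- not the single loss DeepSpeed expects
lm_loss, mc_loss = outputs[0], outputs[1]
```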

The consequence is that I can’t integrate DeepSpeed as-is with such models. Could you please make the necessary changes to support this use case?

I suspect what you need to do is:

  1. Move lines 599-600 (which perform loss scaling based on gradient accumulation steps) into your implementation of backward().
  2. Update line 596 to reflect the fact that self.module could return a generic tuple of outputs instead of a loss.

This, in turn, lets the client define a customized loss from all the outputs and then call your implementation of backward() with that loss as input. Things should still be fine because the loss scaling still happens, only in backward() instead of forward(). Does this make sense?
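Under that change, a client training loop might look roughly like this (a sketch of the proposed behavior, not the then-current DeepSpeed API; the loss weighting is arbitrary):

```python
for batch in data_loader:
    # forward() now returns the model's raw outputs, e.g. a tuple of losses
    lm_loss, mc_loss = model_engine(batch)[:2]

    # The client composes a customized loss from all the outputs...
    loss = lm_loss + 0.5 * mc_loss

    # ...and passes the *unscaled* loss to backward(), which now performs the
    # gradient-accumulation scaling that previously lived in forward()
    model_engine.backward(loss)
    model_engine.step()
```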

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Comments: 17 (9 by maintainers)

Top GitHub Comments

2 reactions
g-karthik commented, Mar 23, 2020

@tjruwase I think instead of 2 (item 2 in the list below), you could just change the signature of backward() to also return the loss. The returned loss would be scaled, and users typically log their loss after forward+backward anyway, not just after forward.
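Concretely, that variant might look like this (hypothetical signature, sketch only):

```python
# backward() scales the loss internally and returns the scaled value,
# so the client can log exactly the value DeepSpeed used
scaled_loss = model_engine.backward(loss)
print(f"step loss: {scaled_loss.item():.4f}")
```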

1 reaction
tjruwase commented, Mar 23, 2020

So it sounds like the consensus is:

  1. Remove _scale_loss() from forward() so deepspeed forward is semantically equal to client forward
  2. Expose model_engine.scale_loss(loss) so client can obtain scaled loss
  3. Call _scale_loss() in backward(), assuming client is passing unscaled loss values

Is this correct? I can start preparing the PR.
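For concreteness, an engine-side sketch of those three changes might look like the following (hypothetical and greatly simplified; only the loss-handling paths are shown):

```python
import torch

class DeepSpeedEngine(torch.nn.Module):
    """Hypothetical, simplified sketch of the three consensus changes."""

    def __init__(self, module, gradient_accumulation_steps=1):
        super().__init__()
        self.module = module
        self.gradient_accumulation_steps = gradient_accumulation_steps

    def forward(self, *inputs, **kwargs):
        # (1) No _scale_loss() here: deepspeed forward is now semantically
        # equal to the client's forward and returns the raw model outputs
        return self.module(*inputs, **kwargs)

    def _scale_loss(self, loss):
        # Scale by gradient accumulation steps, as forward() used to do
        return loss / self.gradient_accumulation_steps

    def scale_loss(self, loss):
        # (2) Public helper so the client can obtain the scaled loss
        return self._scale_loss(loss)

    def backward(self, loss):
        # (3) Assume the client passes an unscaled loss and scale it here
        scaled_loss = self._scale_loss(loss)
        scaled_loss.backward()
        return scaled_loss  # returning it also covers the logging suggestion above
```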
