
Why inplace operation in `AdaptiveSoftmax`

See original GitHub issue

I used the log-probability output for training, but PyTorch complains about an in-place operation during the backward pass.

For lines 204 and 205: https://github.com/pytorch/fairseq/blob/e6422528dae0b899848469efe2dc404c1e639ce9/fairseq/modules/adaptive_softmax.py#L200-L208

  1. Why do a `copy_` on the tail output?
  2. Why the in-place `add_`?

I changed it to

tail_out = ...
... = self.lsm(tail_out) + tail_priors[idxs, i, None]

and training works.
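
A minimal, self-contained way to see the difference (toy shapes; `torch.log_softmax` stands in for `self.lsm`, a random matrix for `self.tail[i]`, and `prior` for `tail_priors[idxs, i, None]` — all hypothetical stand-ins, not fairseq code):

```python
import torch

x = torch.randn(8, 16, requires_grad=True)   # stand-in for input[idxs]
tail_proj = torch.randn(16, 32)              # stand-in for self.tail[i]
prior = torch.randn(8, 1)                    # stand-in for tail_priors[idxs, i, None]

# In-place version, analogous to self.lsm(tail_out).add_(tail_priors[idxs, i, None]):
out = torch.log_softmax(x @ tail_proj, dim=-1).add_(prior)
try:
    out.sum().backward()
except RuntimeError as err:
    print("in-place add_ breaks the backward pass:", err)

# Out-of-place version, analogous to self.lsm(tail_out) + tail_priors[idxs, i, None]:
out = torch.log_softmax(x @ tail_proj, dim=-1) + prior
out.sum().backward()                         # no error
```

The in-place `add_` mutates the tensor that `log_softmax` saved for its own backward pass, which is exactly what the error message complains about.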

Issue Analytics

  • State: open
  • Created: 4 years ago
  • Reactions: 1
  • Comments: 6 (3 by maintainers)

Top GitHub Comments

1 reaction
Stonesjtu commented, Jun 1, 2019
  1. The current implementation does not work in training mode.
  2. In evaluation mode, under `torch.no_grad()`, the memory overhead should equal the size of `tail_priors[idxs, i, None]` (held temporarily for the out-of-place add) (edit: plus the size of `log_probs[idxs, start:end]`, which the in-place version re-uses as storage for `self.tail[i](input[idxs])`).

The add operation itself does not need its operands saved for backpropagation, but PyTorch still raises an error: in-place operators are dangerous for chain-rule backpropagation, so autograd refuses to run backward once a tensor it saved for the backward pass (here, the output of `self.lsm`) has been modified in place.
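
A small sketch of both points above, using only stock PyTorch (arbitrary shapes, not fairseq code):

```python
import torch

x = torch.randn(4, 10, requires_grad=True)
prior = torch.randn(4, 1)

# Evaluation mode: with autograd disabled nothing is saved for backward,
# so the in-place add_ is safe and the only transient cost is the prior itself.
with torch.no_grad():
    log_probs = torch.log_softmax(x, dim=-1).add_(prior)   # no error

# Training mode: log_softmax saves its output for backward; the in-place add_
# bumps that tensor's internal version counter, so a later backward() would raise
# the "modified by an inplace operation" RuntimeError shown above.
y = torch.log_softmax(x, dim=-1)
print(y._version)   # 0  (internal attribute, shown only for illustration)
y.add_(prior)
print(y._version)   # 1  -> no longer matches the version saved for backward
```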

I recommend changing to an out-of-place version; I can open a PR and give a simple comparison in terms of memory.

If the memory saving is really critical, then I would suggest implementing a custom `torch.autograd.Function` to bypass the in-place check mechanism.
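
For illustration, here is one shape such a `Function` could take — a sketch under assumptions, not fairseq code: the name `FusedLogSoftmaxAdd` is hypothetical, and it assumes the prior broadcasts over the last (softmax) dimension. Because autograd does not track the operations inside `forward`, the in-place `add_` never trips the version check, and the hand-written `backward` supplies the gradients:

```python
import torch

class FusedLogSoftmaxAdd(torch.autograd.Function):
    """Hypothetical fused op: log_softmax(x, dim=-1) + prior, with the add done in place."""

    @staticmethod
    def forward(ctx, x, prior):
        # Operations inside forward() are not tracked by autograd, so the
        # in-place add_ is invisible to the version-counter check.
        out = torch.log_softmax(x, dim=-1)
        out.add_(prior)
        ctx.save_for_backward(out, prior)
        return out

    @staticmethod
    def backward(ctx, grad_out):
        out, prior = ctx.saved_tensors
        softmax = (out - prior).exp()            # recover softmax(x) without storing it
        # Gradient of log_softmax: g - softmax * sum(g) along the softmax dim.
        grad_x = grad_out - softmax * grad_out.sum(dim=-1, keepdim=True)
        # prior is assumed to broadcast over the last dim, so its grad sums over it.
        grad_prior = grad_out.sum(dim=-1, keepdim=True)
        return grad_x, grad_prior


# Sanity check in double precision (should pass if the hand-written gradients are right):
x = torch.randn(8, 32, dtype=torch.double, requires_grad=True)
prior = torch.randn(8, 1, dtype=torch.double, requires_grad=True)
torch.autograd.gradcheck(FusedLogSoftmaxAdd.apply, (x, prior))
```

Note that the backward pass recomputes `softmax(x)` from the saved output instead of keeping a separate copy, which is where the memory saving would come from.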

0 reactions
stale[bot] commented, Apr 27, 2022

Closing this issue after a prolonged period of inactivity. If this issue is still present in the latest release, please create a new issue with up-to-date information. Thank you!

