
The sequence of operations in the Linformer Attention module is probably wrong.

See original GitHub issue

🐛 Bug

Hi guys! In Linformer’s example source code, I found that the order of operations may not match the math in the official paper.

Here, in the code, the linear attention is computed by the following sequence of two operations:

  1. a linear projection from the n token representations down to k, followed by
  2. a linear projection over the embedding dimension (d_m to d_k), at lines #208 and #213 respectively.

On the contrary, the figure in the Linformer paper states that it should be performed in the order of:

  1. a linear projection over the embedding dimension (d_m to d_k), followed by
  2. a linear projection from the n token representations down to k, as shown in the figure from the paper.
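One observation worth checking before filing a fix (a minimal NumPy sketch, not fairseq code; all shapes here are illustrative): if both steps are pure linear maps with no bias or nonlinearity in between, they act on opposite sides of the key matrix, so the two orderings are equivalent by associativity of matrix multiplication, E(KW) = (EK)W.

```python
import numpy as np

# Hypothetical sizes: sequence length n, projected length k,
# model dim d_m, head dim d_k.
n, k, d_m, d_k = 16, 4, 8, 8

rng = np.random.default_rng(0)
K = rng.normal(size=(n, d_m))    # token representations (keys)
E = rng.normal(size=(k, n))      # Linformer's n -> k sequence projection
W = rng.normal(size=(d_m, d_k))  # embedding projection d_m -> d_k

code_order = (E @ K) @ W   # sequence projection first (as in the code)
paper_order = E @ (K @ W)  # embedding projection first (as in the paper's figure)

# Both orderings agree because matrix multiplication is associative.
assert np.allclose(code_order, paper_order)
```

Note the caveat: if the actual implementation inserts a bias term, dropout, or any nonlinearity between the two projections, the orderings are no longer interchangeable and the concern above would stand.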

Am I missing something important here? If this is confirmed, I am happy to fix it.

Environment

Current fairseq version.

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Reactions: 2
  • Comments: 5 (1 by maintainers)

Top GitHub Comments

2 reactions
patil-suraj commented, Dec 26, 2020

Any update on this?

2 reactions
myleott commented, Nov 20, 2020

Read more comments on GitHub >

Top Results From Across the Web

The sequence of operations in the Linformer Attention module ...
Bug Hi guys! In Linformer's example source code, I found that operation order may not match the official paper mathematics.

Linformer: Self-Attention with Linear Complexity - arXiv
In this paper, we demonstrate that the self-attention mechanism can be approximated by a low-rank matrix. We further exploit this finding to ...

Rethinking Attention with Performers (Paper Explained)
ai #research #attention Transformers have huge memory and compute requirements because they construct an Attention matrix, ...

Self-Attention with Linear Complexity (Paper Explained)
In this paper, we demonstrate that the self-attention mechanism can be ... The resulting linear transformer, the Linformer, ...

Sketching as a Tool for Understanding and Accelerating Self ...
Transformer-based models are not efficient in processing long sequences due to the quadratic space and time complexity of the self-attention.
