
The sequence of operations in the Linformer Attention module is probably wrong.

See original GitHub issue

🐛 Bug

Hi guys! In Linformer’s example source code, I found that the order of operations may not match the math in the official paper.

Here, in the code, the linear attention is computed by the following sequence of two operations:

  1. a linear projection from the n token representations down to k, followed by
  2. a linear projection over the embedding dimension (d_m to d_k), at lines #208 and #213 respectively.

On the contrary, the figure in the Linformer paper states that it should be performed in the order of:

  1. a linear projection over the embedding dimension (d_m to d_k), followed by
  2. a linear projection from the n token representations down to k, as shown in the figure from the paper.
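One observation worth checking before filing a fix (a minimal NumPy sketch, not fairseq code; all shapes here are illustrative): if both steps are pure linear maps with no bias or nonlinearity in between, they act on opposite sides of the key matrix, so the two orderings are equivalent by associativity of matrix multiplication, E(KW) = (EK)W.

```python
import numpy as np

# Hypothetical sizes: sequence length n, projected length k,
# model dim d_m, head dim d_k.
n, k, d_m, d_k = 16, 4, 8, 8

rng = np.random.default_rng(0)
K = rng.normal(size=(n, d_m))    # token representations (keys)
E = rng.normal(size=(k, n))      # Linformer's n -> k sequence projection
W = rng.normal(size=(d_m, d_k))  # embedding projection d_m -> d_k

code_order = (E @ K) @ W   # sequence projection first (as in the code)
paper_order = E @ (K @ W)  # embedding projection first (as in the paper's figure)

# Both orderings agree because matrix multiplication is associative.
assert np.allclose(code_order, paper_order)
```

Note the caveat: if the actual implementation inserts a bias term, dropout, or any nonlinearity between the two projections, the orderings are no longer interchangeable and the concern above would stand.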

Am I missing something important here? If this is confirmed, I am happy to fix it.

Environment

Current fairseq version.

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Reactions: 2
  • Comments: 5 (1 by maintainers)

Top GitHub Comments

2 reactions
patil-suraj commented, Dec 26, 2020

Any update on this?

2 reactions
myleott commented, Nov 20, 2020

Read more comments on GitHub >

Top Results From Across the Web

The sequence of operations in the Linformer Attention module ...
Bug Hi guys! In Linformer's example source code, I found that operation order may not match the official paper mathematics.

Linformer: Self-Attention with Linear Complexity - arXiv
In this paper, we demonstrate that the self-attention mechanism can be approximated by a low-rank matrix. We further exploit this finding to ...

Rethinking Attention with Performers (Paper Explained)
ai #research #attention Transformers have huge memory and compute requirements because they construct an Attention matrix, ...

Self-Attention with Linear Complexity (Paper Explained)
In this paper, we demonstrate that the self-attention mechanism can be ... The resulting linear transformer, the Linformer, ...

Sketching as a Tool for Understanding and Accelerating Self ...
Transformer-based models are not efficient in processing long sequences due to the quadratic space and time complexity of the self-attention.
