
[Longformer] Output both local attentions and global attentions when `output_attentions=True` -> Good Second Issue


🚀 Feature request

Good Second Issue - A more advanced issue for contributors who want to dive deeper into Longformer’s attention mechanism.

Longformer currently only outputs global attentions, which is suboptimal because users might be interested in the local attentions as well. I propose to change the `output_attentions` logic in Longformer as follows:

`attentions` should correspond to the “local” attentions, and we’ll add a new output field `global_attentions` that contains the global attentions. This is consistent with the naming of `attention_mask` and `global_attention_mask` IMO and the cleanest way to implement the feature.
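As a rough sketch of how the proposed split might look from the user’s side (the checkpoint name and the exact field name `global_attentions` are assumptions for illustration, not the final API):

```python
import torch
from transformers import LongformerModel, LongformerTokenizer

# Any Longformer checkpoint works; this one is just an example.
tokenizer = LongformerTokenizer.from_pretrained("allenai/longformer-base-4096")
model = LongformerModel.from_pretrained("allenai/longformer-base-4096")

inputs = tokenizer("Hello world!", return_tensors="pt")

# Give the first token global attention, mirroring how global_attention_mask is used.
global_attention_mask = torch.zeros_like(inputs["input_ids"])
global_attention_mask[:, 0] = 1

outputs = model(
    **inputs,
    global_attention_mask=global_attention_mask,
    output_attentions=True,
    return_dict=True,
)

# Proposed behaviour: `attentions` holds the local (sliding-window) attention
# probabilities, and a new field holds the global attention probabilities.
local_attentions = outputs.attentions
global_attentions = outputs.global_attentions
```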

Implementing this feature would mean that Longformer will require its own ModelOutput classes:

  • `BaseModelOutput` => `LongformerBaseModelOutput` or `BaseModelOutputWithGlobalAttention` (prefer the first name though)
  • `BaseModelOutputWithPooling` => …
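A minimal sketch of what such a dedicated output class could look like, assuming `ModelOutput` is imported from `transformers.file_utils` (its home in the transformers version of that era) and using the field name proposed above:

```python
from dataclasses import dataclass
from typing import Optional, Tuple

import torch

from transformers.file_utils import ModelOutput


@dataclass
class LongformerBaseModelOutput(ModelOutput):
    """Base Longformer output that keeps local and global attentions separate."""

    last_hidden_state: torch.FloatTensor = None
    hidden_states: Optional[Tuple[torch.FloatTensor]] = None
    # Local (sliding-window) attention probabilities, one tensor per layer.
    attentions: Optional[Tuple[torch.FloatTensor]] = None
    # Global attention probabilities for the tokens flagged in global_attention_mask.
    global_attentions: Optional[Tuple[torch.FloatTensor]] = None
```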

Also some tests will have to be adapted.
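As a rough idea of the kind of test adaptation this implies (the helper below is hypothetical, and the exact last-dimension layout of the tensors is deliberately left open):

```python
def check_longformer_attention_outputs(outputs, batch_size, num_heads, num_layers):
    """Hypothetical helper: checks that local and global attentions are both returned."""
    assert len(outputs.attentions) == num_layers
    assert len(outputs.global_attentions) == num_layers

    for local_att, global_att in zip(outputs.attentions, outputs.global_attentions):
        # Both are per-layer probability tensors over the batch and attention heads.
        assert local_att.shape[:2] == (batch_size, num_heads)
        assert global_att.shape[:2] == (batch_size, num_heads)
        # Unlike Bert-style attentions of shape (batch, heads, seq_len, seq_len),
        # the local attentions only span each token's sliding window, and the
        # global attentions only involve tokens flagged in global_attention_mask.
```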

This is a slightly more difficult issue, so I’m happy to help with it. One should understand the difference between local and global attention and how Longformer’s attention differs from, e.g., Bert’s attention in general.

For more detail, check out the discussion here: https://github.com/huggingface/transformers/issues/5646

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 8 (8 by maintainers)

Top GitHub Comments

2 reactions
gui11aume commented, Oct 4, 2020

I have made the pull request.

I checked that the Longformer tests passed with my changes, and I added one more test to check the output of attention probabilities.

Quite stupidly, I made the pull request to the master branch; I am sorry about this. I left it as is to avoid duplicating pull requests for now. You can reject it and I will make a cleaner pull request to a separate branch.

2 reactions
gui11aume commented, Oct 3, 2020

I am working on a pull request to address this. I don’t see any major challenge so far, but this made me realize how different the attentions in Bert-like models and in Longformers are. Why not replace `attentions` in the Longformer with `local_attentions`?

This means that the interface of Longformers would become incompatible with every other Transformer, but maybe it should be? I don’t think that there is a way to plug Longformer attentions into code that expects Bert-like attentions and get meaningful results, so users always have to write a special case for Longformers if they use them. As is, the risk is that they get bogus output and won’t realize it until they carefully read the doc (which is not yet written).

What are your thoughts on this @patrickvonplaten?
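To make the incompatibility concrete, here is a hedged illustration of why code written against Bert-style attentions silently misbehaves on Longformer attentions (the Longformer shape shown is approximate, not an exact specification):

```python
import torch

batch, heads, seq_len, window = 1, 12, 1024, 256

# Bert-style self-attention: a full seq_len x seq_len matrix per layer and head,
# so reducing over dim -2 gives "attention received" per token.
bert_attn = torch.rand(batch, heads, seq_len, seq_len).softmax(dim=-1)

# Longformer local attention: the last dimension covers only the sliding window
# (roughly window + 1 entries), not the full sequence.
longformer_local_attn = torch.rand(batch, heads, seq_len, window + 1).softmax(dim=-1)

def attention_received(attn_probs):
    # Meaningful for Bert-style attentions; for Longformer's windowed layout the
    # last axis indexes window positions, not token indices, so the result is bogus.
    return attn_probs.sum(dim=-2)

print(attention_received(bert_attn).shape)              # (1, 12, 1024) - per token
print(attention_received(longformer_local_attn).shape)  # (1, 12, 257) - not per token
```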


