Pruning function in T5Attention doesn't affect _relative_position_bucket
Who can help?
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, …)
- My own task or dataset (give details below)
Reproduction
Run the head-pruning function on a T5 model, then run inference, as in the sketch below.
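A minimal reproduction sketch, assuming the module layout of modeling_t5.py; the checkpoint name and the head indices are arbitrary choices for illustration:

```python
import torch
from transformers import T5Model, T5Tokenizer

model = T5Model.from_pretrained("t5-small")
tokenizer = T5Tokenizer.from_pretrained("t5-small")

# Prune two heads from the first encoder self-attention layer. This layer
# owns relative_attention_bias, which keeps one column per original head.
model.encoder.block[0].layer[0].SelfAttention.prune_heads([0, 1])

inputs = tokenizer("pruning T5 attention heads", return_tensors="pt")

# The forward pass now breaks: compute_bias still produces a position bias
# for the original number of heads, which no longer matches the shape of
# the pruned attention scores.
with torch.no_grad():
    model(input_ids=inputs.input_ids, decoder_input_ids=inputs.input_ids)
```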
Expected behavior
The head dimension of the relative position bias should be pruned too. T5Attention.prune_heads shrinks the q/k/v/o projections and updates n_heads, but relative_attention_bias still has one output column per original head, so the position bias no longer lines up with the pruned attention scores.
The relevant code: https://github.com/huggingface/transformers/blob/main/src/transformers/models/t5/modeling_t5.py#L355
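A sketch of the kind of fix intended, using a hypothetical helper prune_relative_position_bias (not part of transformers). Inside T5Attention.prune_heads it would run alongside the existing prune_linear_layer calls, indexed by the surviving head positions rather than by the flattened per-dimension index used for the linear layers:

```python
import torch
from torch import nn

def prune_relative_position_bias(
    embedding: nn.Embedding, kept_heads: torch.LongTensor
) -> nn.Embedding:
    # Hypothetical helper. relative_attention_bias maps each relative-position
    # bucket to one logit per head, so its weight has shape (num_buckets, n_heads).
    # Keeping only the columns of the surviving heads mirrors what
    # prune_linear_layer already does for the q/k/v/o projections.
    pruned = nn.Embedding(embedding.num_embeddings, kept_heads.numel())
    pruned = pruned.to(embedding.weight.device)
    pruned.weight = nn.Parameter(
        embedding.weight.index_select(1, kept_heads).clone().detach()
    )
    return pruned

# Example: with 8 heads and heads {0, 1} pruned, the surviving columns are 2..7:
# attn.relative_attention_bias = prune_relative_position_bias(
#     attn.relative_attention_bias, torch.tensor([2, 3, 4, 5, 6, 7])
# )
```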
Issue Analytics
- Created a year ago
- Comments: 9 (6 by maintainers)
Top GitHub Comments
@patrickvonplaten Okay, if you think it's fine, I will open a PR tomorrow.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.