Pruning function in T5Attention doesn't affect _relative_position_bucket
Who can help?
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, …)
- My own task or dataset (give details below)
Reproduction
Run the head-pruning function on a T5 model, then run inference, as in the sketch below.
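A minimal reproduction sketch, assuming the module layout of modeling_t5.py; the checkpoint name and the head indices are arbitrary choices for illustration:

```python
import torch
from transformers import T5Model, T5Tokenizer

model = T5Model.from_pretrained("t5-small")
tokenizer = T5Tokenizer.from_pretrained("t5-small")

# Prune two heads from the first encoder self-attention layer. This layer
# owns relative_attention_bias, which keeps one column per original head.
model.encoder.block[0].layer[0].SelfAttention.prune_heads([0, 1])

inputs = tokenizer("pruning T5 attention heads", return_tensors="pt")

# The forward pass now breaks: compute_bias still produces a position bias
# for the original number of heads, which no longer matches the shape of
# the pruned attention scores.
with torch.no_grad():
    model(input_ids=inputs.input_ids, decoder_input_ids=inputs.input_ids)
```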
Expected behavior
The head dimension of the relative position bias should be pruned too. T5Attention.prune_heads shrinks the q/k/v/o projections and updates n_heads, but relative_attention_bias still has one output column per original head, so the position bias no longer lines up with the pruned attention scores.
The relevant code: https://github.com/huggingface/transformers/blob/main/src/transformers/models/t5/modeling_t5.py#L355
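A sketch of the kind of fix intended, using a hypothetical helper prune_relative_position_bias (not part of transformers). Inside T5Attention.prune_heads it would run alongside the existing prune_linear_layer calls, indexed by the surviving head positions rather than by the flattened per-dimension index used for the linear layers:

```python
import torch
from torch import nn

def prune_relative_position_bias(
    embedding: nn.Embedding, kept_heads: torch.LongTensor
) -> nn.Embedding:
    # Hypothetical helper. relative_attention_bias maps each relative-position
    # bucket to one logit per head, so its weight has shape (num_buckets, n_heads).
    # Keeping only the columns of the surviving heads mirrors what
    # prune_linear_layer already does for the q/k/v/o projections.
    pruned = nn.Embedding(embedding.num_embeddings, kept_heads.numel())
    pruned = pruned.to(embedding.weight.device)
    pruned.weight = nn.Parameter(
        embedding.weight.index_select(1, kept_heads).clone().detach()
    )
    return pruned

# Example: with 8 heads and heads {0, 1} pruned, the surviving columns are 2..7:
# attn.relative_attention_bias = prune_relative_position_bias(
#     attn.relative_attention_bias, torch.tensor([2, 3, 4, 5, 6, 7])
# )
```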
Issue Analytics
- Created a year ago
- Comments: 9 (6 by maintainers)
Top GitHub Comments
@patrickvonplaten Okay, if you think it's fine, I will open a PR tomorrow.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.