`LongT5`: Efficient Text-To-Text Transformer for Long Sequences
🌟 New model addition – LongT5: Efficient Text-To-Text Transformer for Long Sequences
Model description
LongT5 is an extension of the T5 model that handles long sequence inputs more efficiently. We integrated attention ideas from long-input transformers (ETC), and adopted pre-training strategies from summarization pre-training (PEGASUS) into the scalable T5 architecture. The result is a new attention mechanism we call Transient Global (TGlobal), which mimics ETC’s local/global attention mechanism, but without requiring additional side-inputs. We are able to achieve state-of-the-art results on several summarization and question answering tasks, as well as outperform the original T5 models on these tasks.
Description copied from https://github.com/google-research/longt5/blob/master/README.md.
The full paper is currently available on arXiv – LongT5: Efficient Text-To-Text Transformer for Long Sequences.
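To make the mechanism concrete, here is a small NumPy sketch of one plausible reading of TGlobal attention (single head, no learned projections): block-level summary vectors act as the transient global tokens, and each position attends only to its local window plus those summaries. The block size, window radius, and mean-pooling summary are illustrative assumptions, not the authors' exact implementation.

```python
# Conceptual sketch of Transient Global (TGlobal) attention, based only on the
# paper's high-level description. Block size, window radius, and mean-pooling
# are illustrative choices; the real model uses learned multi-head projections.
import numpy as np

def tglobal_attention_sketch(x, window=2, block=4):
    """x: (seq_len, d_model) token embeddings. Returns (seq_len, d_model)."""
    seq_len, d = x.shape

    # 1) Transient global tokens: one summary vector per block of `block` tokens,
    #    recomputed from the input itself (no extra side-inputs, unlike ETC).
    n_blocks = int(np.ceil(seq_len / block))
    pad = n_blocks * block - seq_len
    x_pad = np.pad(x, ((0, pad), (0, 0)))
    globals_ = x_pad.reshape(n_blocks, block, d).mean(axis=1)   # (n_blocks, d)

    # 2) Each token attends to its local window plus all global summaries,
    #    so cost grows with (window + n_blocks) instead of seq_len.
    out = np.zeros_like(x)
    for i in range(seq_len):
        lo, hi = max(0, i - window), min(seq_len, i + window + 1)
        kv = np.concatenate([x[lo:hi], globals_], axis=0)       # (w + n_blocks, d)
        scores = kv @ x[i] / np.sqrt(d)                         # dot-product attention
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()
        out[i] = weights @ kv
    return out

print(tglobal_attention_sketch(np.random.randn(10, 8)).shape)   # (10, 8)
```

The point of the sketch is the scaling argument: per token, the key/value set has a fixed-size local window plus one summary per block, so attention cost stays roughly linear in sequence length rather than quadratic.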
Open source status
The model has its own repository available here.
- the model implementation is available: yes, in the Google FlaxFormer repo.
- the model weights are available: currently, Google has released five checkpoints, listed in the LongT5 repo:
- LongT5-Local-Base (250 million parameters)
- LongT5-TGlobal-Base (250 million parameters)
- LongT5-Local-Large (780 million parameters)
- LongT5-TGlobal-Large (780 million parameters)
- LongT5-TGlobal-XL (3 billion parameters)
- who are the authors: @mandyguo-xyguo, Joshua Ainslie, @duthus, @santiontanon, @nijianmo, @yhsung, @yinfeiy (not sure about some of the GitHub names, so I'll be happy if anyone can complete the list :] )
Additional context
If none of the original authors is interested in porting the model into transformers, I’ll be more than happy to work on this :].
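For reference, usage after a port would presumably mirror the existing T5 interface in transformers. The class name and checkpoint id in this sketch are assumed placeholders chosen by analogy with T5, not a released API at the time of this issue:

```python
# Hypothetical usage sketch: `LongT5ForConditionalGeneration` and the
# "google/long-t5-tglobal-base" checkpoint id are assumed names mirroring the
# T5 interface, not an API confirmed by this issue.
from transformers import AutoTokenizer, LongT5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/long-t5-tglobal-base")
model = LongT5ForConditionalGeneration.from_pretrained("google/long-t5-tglobal-base")

# Long-input summarization: TGlobal attention is meant to scale to inputs of
# several thousand tokens, far beyond vanilla T5's usual 512.
inputs = tokenizer("summarize: " + "a very long document " * 500,
                   return_tensors="pt", truncation=True, max_length=4096)
summary_ids = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```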
Comments
@patrickvonplaten @patil-suraj I’m gonna give it a try and will try to open a draft PR as soon as I have some progress! :]
Also @patrickvonplaten, thanks a lot for all the useful links you have posted here! :]
This is super cool! Happy to help if anyone wants to give it a try 😃