`LongT5`: Efficient Text-To-Text Transformer for Long Sequences
🌟 New model addition – LongT5: Efficient Text-To-Text Transformer for Long Sequences
Model description
LongT5 is an extension of the T5 model that handles long sequence inputs more efficiently. We integrated attention ideas from long-input transformers (ETC), and adopted pre-training strategies from summarization pre-training (PEGASUS) into the scalable T5 architecture. The result is a new attention mechanism we call Transient Global (TGlobal), which mimics ETC’s local/global attention mechanism, but without requiring additional side-inputs. We are able to achieve state-of-the-art results on several summarization and question answering tasks, as well as outperform the original T5 models on these tasks.
Description copied from https://github.com/google-research/longt5/blob/master/README.md.
The full paper is currently available on arXiv – LongT5: Efficient Text-To-Text Transformer for Long Sequences.
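To make the mechanism concrete, here is a small NumPy sketch of one plausible reading of TGlobal attention (single head, no learned projections): block-level summary vectors act as the transient global tokens, and each position attends only to its local window plus those summaries. The block size, window radius, and mean-pooling summary are illustrative assumptions, not the authors' exact implementation.

```python
# Conceptual sketch of Transient Global (TGlobal) attention, based only on the
# paper's high-level description. Block size, window radius, and mean-pooling
# are illustrative choices; the real model uses learned multi-head projections.
import numpy as np

def tglobal_attention_sketch(x, window=2, block=4):
    """x: (seq_len, d_model) token embeddings. Returns (seq_len, d_model)."""
    seq_len, d = x.shape

    # 1) Transient global tokens: one summary vector per block of `block` tokens,
    #    recomputed from the input itself (no extra side-inputs, unlike ETC).
    n_blocks = int(np.ceil(seq_len / block))
    pad = n_blocks * block - seq_len
    x_pad = np.pad(x, ((0, pad), (0, 0)))
    globals_ = x_pad.reshape(n_blocks, block, d).mean(axis=1)   # (n_blocks, d)

    # 2) Each token attends to its local window plus all global summaries,
    #    so cost grows with (window + n_blocks) instead of seq_len.
    out = np.zeros_like(x)
    for i in range(seq_len):
        lo, hi = max(0, i - window), min(seq_len, i + window + 1)
        kv = np.concatenate([x[lo:hi], globals_], axis=0)       # (w + n_blocks, d)
        scores = kv @ x[i] / np.sqrt(d)                         # dot-product attention
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()
        out[i] = weights @ kv
    return out

print(tglobal_attention_sketch(np.random.randn(10, 8)).shape)   # (10, 8)
```

The point of the sketch is the scaling argument: per token, the key/value set has a fixed-size local window plus one summary per block, so attention cost stays roughly linear in sequence length rather than quadratic.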
Open source status
The model has its own repository available here.
- the model implementation is available: yes, in the Google FlaxFormer repo.
- the model weights are available: currently, Google has released five checkpoints, listed in the LongT5 repo:
- LongT5-Local-Base (250 million parameters)
- LongT5-TGlobal-Base (250 million parameters)
- LongT5-Local-Large (780 million parameters)
- LongT5-TGlobal-Large (780 million parameters)
- LongT5-TGlobal-XL (3 billion parameters)
- who are the authors: @mandyguo-xyguo, Joshua Ainslie, @duthus, @santiontanon, @nijianmo, @yhsung, @yinfeiy (not sure about some of the GitHub names, so I'll be happy if anyone can complete the list :] )
Additional context
If none of the original authors is interested in porting the model into transformers, I’ll be more than happy to work on this :].
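For reference, usage after a port would presumably mirror the existing T5 interface in transformers. The class name and checkpoint id in this sketch are assumed placeholders chosen by analogy with T5, not a released API at the time of this issue:

```python
# Hypothetical usage sketch: `LongT5ForConditionalGeneration` and the
# "google/long-t5-tglobal-base" checkpoint id are assumed names mirroring the
# T5 interface, not an API confirmed by this issue.
from transformers import AutoTokenizer, LongT5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/long-t5-tglobal-base")
model = LongT5ForConditionalGeneration.from_pretrained("google/long-t5-tglobal-base")

# Long-input summarization: TGlobal attention is meant to scale to inputs of
# several thousand tokens, far beyond vanilla T5's usual 512.
inputs = tokenizer("summarize: " + "a very long document " * 500,
                   return_tensors="pt", truncation=True, max_length=4096)
summary_ids = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```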
Comments
@patrickvonplaten @patil-suraj I’m gonna give it a try and will try to open a draft PR as soon as I have some progress! :]
Also @patrickvonplaten, thanks a lot for all the useful links you have posted here! :]
This is super cool! Happy to help if anyone wants to give it a try 😃