
Causal linear attention benchmark

See original GitHub issue

First, thanks for this awesome repo!!

Based on the T5 model classes from Hugging Face's transformers, I was trying to use Performer attention in place of the original T5 attention. We fine-tuned t5-large on a summarization task and profiled both time and memory usage to compare Performer attention with the original attention. I have only benchmarked with an input size of 1024.
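
For context, a swap like this usually means replacing the SelfAttention submodule inside each T5LayerSelfAttention with a Performer attention module. The sketch below is only illustrative (the adapter class and patching helper are my own names, not the issue author's code) and deliberately ignores padding masks, T5's relative position bias, and key/value caching, all of which a real drop-in replacement would have to handle:

    import torch.nn as nn
    from transformers import T5ForConditionalGeneration
    from performer_pytorch import SelfAttention  # lucidrains/performer-pytorch


    class PerformerAttentionAdapter(nn.Module):
        """Hypothetical adapter: makes performer_pytorch.SelfAttention mimic the
        tuple-returning interface that T5LayerSelfAttention expects."""

        def __init__(self, d_model, n_heads, causal):
            super().__init__()
            self.attn = SelfAttention(dim=d_model, heads=n_heads, causal=causal)

        def forward(self, hidden_states, mask=None, **kwargs):
            # NOTE: T5 passes an additive "extended" attention mask here; converting
            # it to the boolean mask performer-pytorch expects is omitted for brevity.
            out = self.attn(hidden_states)
            # T5Stack reads (hidden_states, present_key_value_state, position_bias)
            # from this tuple, so pad with None placeholders (no caching supported).
            return (out, None, None)


    def patch_t5_self_attention(model: T5ForConditionalGeneration):
        """Replace encoder/decoder self-attention with Performer attention.
        Decoder cross-attention is left untouched in this sketch."""
        cfg = model.config
        for block in model.encoder.block:
            block.layer[0].SelfAttention = PerformerAttentionAdapter(
                cfg.d_model, cfg.num_heads, causal=False)
        for block in model.decoder.block:
            block.layer[0].SelfAttention = PerformerAttentionAdapter(
                cfg.d_model, cfg.num_heads, causal=True)
        return model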

The results clearly showed that Performer attention uses far less memory than the original transformer. I know from the paper that the Performer outperforms the original transformer when the input size is larger than 1024. However, fine-tuning and generation with the Performer actually took longer, so I profiled the forward call of both the original T5 attention and the Performer attention. The forward pass of the T5 Performer took twice as long, and the main bottleneck was causal_dot_product_kernel from fast-transformers.
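
As a reference point for what that kernel computes: the causal numerator is a running prefix sum of key-value outer products, queried position by position. A naive pure-PyTorch equivalent (my sketch, not the fast-transformers kernel itself) makes the sequential dependence, and why a fused CUDA kernel is worthwhile, explicit:

    import torch


    def causal_linear_attention_naive(q, k, v, eps=1e-6):
        """Reference causal linear attention.
        q, k: (batch, heads, seq, d_k), already passed through the feature map
        v:    (batch, heads, seq, d_v)
        Computes out_i = q_i^T (sum_{j<=i} k_j v_j^T) / (q_i^T sum_{j<=i} k_j).
        The cumsum materialises a (batch, heads, seq, d_k, d_v) tensor, which is
        why fast-transformers fuses the whole thing into a CUDA kernel instead.
        """
        context = torch.einsum('bhnd,bhne->bhnde', k, v).cumsum(dim=2)
        num = torch.einsum('bhnde,bhnd->bhne', context, q)
        den = torch.einsum('bhnd,bhnd->bhn', k.cumsum(dim=2), q)
        return num / (den.unsqueeze(-1) + eps)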

Is this normal performance for the Performer's causal attention calculation, or will Performer attention become faster at larger input sizes?
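
One way to answer the second question empirically is to time the attention modules in isolation across sequence lengths and look for the crossover point: linear attention pays a roughly constant per-token cost, while softmax attention grows quadratically. A rough timing harness (illustrative only, assumes a CUDA device; the dimensions are just t5-large-like values) might look like:

    import time
    import torch
    from performer_pytorch import SelfAttention as PerformerSelfAttention


    def avg_forward_ms(fn, x, n_iters=20):
        """Average forward time in milliseconds, with CUDA synchronisation."""
        with torch.no_grad():
            for _ in range(3):                      # warm-up iterations
                fn(x)
            torch.cuda.synchronize()
            start = time.perf_counter()
            for _ in range(n_iters):
                fn(x)
            torch.cuda.synchronize()
        return (time.perf_counter() - start) / n_iters * 1e3


    dim, heads = 1024, 16                           # roughly t5-large sized
    performer = PerformerSelfAttention(dim=dim, heads=heads, causal=True).cuda()
    softmax = torch.nn.MultiheadAttention(dim, heads, batch_first=True).cuda()

    for seq_len in (512, 1024, 2048, 4096):
        x = torch.randn(1, seq_len, dim, device='cuda')
        causal_mask = torch.triu(
            torch.ones(seq_len, seq_len, dtype=torch.bool, device='cuda'),
            diagonal=1)
        t_perf = avg_forward_ms(performer, x)
        t_soft = avg_forward_ms(
            lambda inp: softmax(inp, inp, inp, attn_mask=causal_mask), x)
        print(f'L={seq_len}: performer {t_perf:.1f} ms, softmax {t_soft:.1f} ms')

On short sequences the fixed overhead of the random-feature projection and the causal prefix-sum kernel can easily dominate, so a crossover point well above 1024 tokens would not be surprising.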

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 13 (8 by maintainers)

Top GitHub Comments

1 reaction
lucidrains commented, Apr 20, 2021

ok! i’ll work on the other issue (fast generation) - glad to hear the original issue is resolved!

1 reaction
ice-americano commented, Apr 19, 2021

Actually, installing from pip or building from source took a while, and that was probably due to compiling the EPFL fast-transformers CUDA kernels (I only have shallow knowledge of CUDA kernels and libraries 😅).

We have changed our code to use SelfAttention instead of FastAttention; we may have been setting the wrong parameters before, because the performance and speed of the Performer now look similar to what the paper reports. So I think you can close this issue for now, and thanks for the responsive feedback!
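
For anyone hitting the same problem: FastAttention is the low-level module that expects per-head queries, keys and values you have already projected yourself, while SelfAttention bundles the q/k/v projections, head reshaping and the causal flag, so it is the easier one to drop into an existing layer. A small illustration (parameter values are arbitrary examples):

    import torch
    from performer_pytorch import FastAttention, SelfAttention

    # Low-level kernel wrapper: you supply already-projected per-head tensors.
    fast_attn = FastAttention(dim_heads=64, nb_features=256, causal=True)
    q = torch.randn(1, 8, 1024, 64)    # (batch, heads, seq, dim_head)
    k = torch.randn(1, 8, 1024, 64)
    v = torch.randn(1, 8, 1024, 64)
    out = fast_attn(q, k, v)           # -> (1, 8, 1024, 64)

    # Higher-level module: owns the q/k/v projections and head handling itself,
    # so hidden states go straight in and come straight out.
    self_attn = SelfAttention(dim=512, heads=8, causal=True)
    x = torch.randn(1, 1024, 512)      # (batch, seq, model_dim)
    out = self_attn(x)                 # -> (1, 1024, 512)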

Read more comments on GitHub >
