
Question on Prefix tuning code


Hi, I am looking at the prefix tuning code, and I have a few questions about the implementation.

  1. What exactly are the variables in these lines? I understand that prefix tuning provides an input to every layer of the encoder-decoder model, but my understanding is that there should be a single wte and a single control_trans; I am not sure what the variables in the highlighted lines do.
  2. Why the *2 in this line of code? I don't understand what it is for.
  3. What does the control_trans variable mean in the code? What is its function?
  4. Also, I see another variable, mid_dim. What is it conceptually?

Thank you

Issue Analytics

  • State: closed
  • Created: a year ago
  • Comments: 6 (3 by maintainers)

Top GitHub Comments

1 reaction
ChenWu98 commented on May 24, 2022

Yes, wte is for the cross-attention. We did not use “word initialization” for the prefix because they showed the benefits of such initialization under a low-data setting with only 100 samples. Actually, I am not sure, in Lisa’s paper, whether this “word initialization” was used jointly with the re-parameterization trick or only used for the embedding-only ablation. If you have any ideas, please let me know!

Yes, we assume num_encoder_layers == num_decoder_layers.

The permutation operation is used to make the tensor shape compatible with that of the key-value pairs.
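For intuition, here is a rough sketch of that kind of reshape-and-permute. The sizes, variable names, and the final (layers*2, batch, heads, prefix_len, head_dim) layout are assumptions for illustration, not taken from the repository, but they mirror the cached key/value layout used by BART-style attention.

```python
import torch

# Hypothetical sizes, for illustration only.
batch, prefix_len, num_layers, num_heads, head_dim = 4, 10, 12, 16, 64

# A flat prefix tensor, e.g. the output of a re-parameterization MLP.
flat = torch.randn(batch, prefix_len, num_layers * 2 * num_heads * head_dim)

# Split the last dimension into (layers * 2, heads, head_dim) ...
past = flat.view(batch, prefix_len, num_layers * 2, num_heads, head_dim)

# ... then permute so each slice matches a cached key or value tensor:
# (layers * 2, batch, heads, prefix_len, head_dim).
past = past.permute(2, 0, 3, 1, 4)

# One (key, value) pair per layer, each of shape (2, batch, heads, prefix_len, head_dim).
keys_values = past.split(2)
print(keys_values[0].shape)  # torch.Size([2, 4, 16, 10, 64])
```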

1 reaction
ChenWu98 commented on May 23, 2022

Hi, thanks for your interest!

Our prefix-tuning code is a cleaned-up version of Lisa’s original implementation for BART. Answers to your questions are provided below, but we also recommend you look for more details in Lisa’s implementation and paper.

  1. A separate prefix is learned for each type of attention: encoder self-attention, decoder self-attention, and cross-attention.
  2. The *2 accounts for the dimension of the attention key plus the dimension of the attention value.
  3. control_trans is part of the re-parameterization trick introduced in Lisa’s paper.
  4. mid_dim is also part of the re-parameterization trick introduced in Lisa’s paper (see the sketch after this list).
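
To make these answers concrete, here is a minimal sketch of the re-parameterization trick, assuming hypothetical names and sizes. The class, hyperparameters, and shapes below are illustrative and are not copied from this repository (which additionally keeps a separate prefix for the encoder, decoder, and cross-attention):

```python
import torch
import torch.nn as nn

class PrefixEncoder(nn.Module):
    """Illustrative re-parameterization of a prefix; names and sizes are hypothetical."""

    def __init__(self, prefix_len=10, hidden_dim=1024, mid_dim=512, num_layers=12):
        super().__init__()
        # wte: one learned embedding per prefix position ("virtual token").
        self.wte = nn.Embedding(prefix_len, hidden_dim)
        # control_trans: the re-parameterization MLP. mid_dim is its hidden width,
        # and the output size is num_layers * 2 * hidden_dim because every layer
        # needs both a key and a value vector for the prefix (hence the *2).
        self.control_trans = nn.Sequential(
            nn.Linear(hidden_dim, mid_dim),
            nn.Tanh(),
            nn.Linear(mid_dim, num_layers * 2 * hidden_dim),
        )

    def forward(self, batch_size):
        prefix_ids = torch.arange(self.wte.num_embeddings)  # (prefix_len,)
        prefix = self.wte(prefix_ids)                        # (prefix_len, hidden_dim)
        past = self.control_trans(prefix)                    # (prefix_len, num_layers*2*hidden_dim)
        # Broadcast over the batch; the flat tensor is then reshaped and permuted
        # into per-layer (key, value) pairs as in the snippet above.
        return past.unsqueeze(0).expand(batch_size, -1, -1)
```

During training, only these prefix parameters are updated while the pretrained encoder-decoder weights stay frozen; as described in Lisa’s paper, the re-parameterization MLP can be dropped after training and only the resulting prefix key/value activations saved.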

