
[Bug] Adapter and LoRA for Roberta


Running the setting below on SST2 and MNLI:

```bash
attn_mode="adapter"
attn_option="sequential"
attn_composition="add"
attn_bn=200    # attn bottleneck dim

ffn_mode="adapter"
ffn_option="sequential"
ffn_adapter_layernorm_option="none"
ffn_adapter_init_option="bert"
ffn_adapter_scalar="1"
ffn_bn=200     # ffn bottleneck dim
```

Several errors were raised. It seems some parameters, such as d_model and dropout, were set incorrectly in modeling_roberta.py.

I just fixed them, but the resulting log confuses me (see the attached screenshot).

Houlsby et al. add adapters in two places: after self-attention and after the FFN. So why is an adapter added inside the self-attention, and what is adapter_layer_norm_before.weight used for?

Thanks
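
For reference, a Houlsby-style bottleneck adapter of the kind discussed here typically looks like the minimal sketch below. This is illustrative PyTorch only, not this repository's exact implementation; the class and argument names are assumptions. The bottleneck width would correspond to `attn_bn`/`ffn_bn` in the config above, and `adapter_layer_norm_before` is sketched here as a LayerNorm applied to the adapter input before the down-projection.

```python
import torch
import torch.nn as nn


class BottleneckAdapter(nn.Module):
    """Illustrative Houlsby-style bottleneck adapter (down-project, nonlinearity, up-project)."""

    def __init__(self, d_model: int, bottleneck_dim: int, use_pre_layernorm: bool = True):
        super().__init__()
        # Optional LayerNorm on the adapter input (cf. adapter_layer_norm_before.weight).
        self.adapter_layer_norm_before = nn.LayerNorm(d_model) if use_pre_layernorm else None
        self.down_proj = nn.Linear(d_model, bottleneck_dim)  # d_model -> bottleneck (e.g. 200)
        self.non_linearity = nn.ReLU()
        self.up_proj = nn.Linear(bottleneck_dim, d_model)    # bottleneck -> d_model

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        residual = hidden_states
        if self.adapter_layer_norm_before is not None:
            hidden_states = self.adapter_layer_norm_before(hidden_states)
        hidden_states = self.up_proj(self.non_linearity(self.down_proj(hidden_states)))
        # Sequential composition with "add": the adapter output is added back to its input.
        return residual + hidden_states
```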

Issue Analytics

  • State: closed
  • Created: a year ago
  • Comments: 6 (2 by maintainers)

Top GitHub Comments

1 reaction
jxhe commented, Mar 25, 2022
  1. We never tried tuning the classifier head, but training it sounds like a reasonably cheap trick that should give higher performance. I am not sure about the reason; my guess is that it is easy to learn features that can be separated by a random projection, given that both MNLI and SST2 are just two- or three-way classification tasks.

  2. The main advantage, or the most intriguing part, of parameter-efficient tuning is not reducing the training cost: it doesn't really reduce training cost much, since it often takes longer to converge. In my opinion, the advantages are:
    (1) storage savings, as you mentioned;
    (2) more interestingly, its potential in multi-task settings: one small module is responsible for one task or domain while most of the parameters are shared. This separates model capacity in a modular way and may enable many applications, for example merging multiple adapters efficiently to create a model that performs well on multiple domains, or continuously adding new capabilities to an existing system, without breaking its original ones, simply by plugging in newly trained adapters (see the sketch after this list). These may not be achieved easily by traditional fine-tuning;
    (3) parameter-efficient tuning mitigates catastrophic forgetting by design, since the old parameters are frozen;
    (4) tuning a few parameters has also been shown in some papers to be more robust than full fine-tuning and to be superior for few-shot learning.
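
As a toy illustration of point (2), one frozen shared backbone plus one small adapter per task, selected at inference time, could look like the sketch below. All names here are illustrative assumptions, not this repository's or any paper's API.

```python
import torch
import torch.nn as nn


class MultiTaskAdapterModel(nn.Module):
    """Shared frozen backbone + one tiny residual adapter per task (toy sketch)."""

    def __init__(self, backbone: nn.Module, d_model: int, bottleneck_dim: int, tasks):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():  # shared parameters stay frozen
            p.requires_grad = False
        # One small bottleneck adapter per task; only these are trained per task.
        self.adapters = nn.ModuleDict({
            task: nn.Sequential(
                nn.Linear(d_model, bottleneck_dim),
                nn.ReLU(),
                nn.Linear(bottleneck_dim, d_model),
            )
            for task in tasks
        })

    def forward(self, x: torch.Tensor, task: str) -> torch.Tensor:
        h = self.backbone(x)
        return h + self.adapters[task](h)  # residual adapter for the chosen task
```

Adding a new task then only requires training and storing one new entry in `adapters`, while the backbone and all previously trained adapters stay untouched.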

1 reaction
Albert-Ma commented, Mar 25, 2022

Thanks for your reply!

  1. I found that the classifier layer on top of the pre-trained model is only initialized and never trained, yet it still reaches high performance, so I was just curious about the reason. When I unfroze this layer during training, performance improved slightly (a minimal freezing setup is sketched after this list).
  2. On my server, training speed only got about a 1x speedup and the GPU memory footprint decreased by roughly 1-3x. Is this expected? Besides only having to store the parameters of one large model plus several small modules for multiple tasks, are there other potential advantages of parameter-efficient tuning?
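
A minimal sketch of the freezing setup discussed in point 1, assuming adapter parameters contain "adapter" in their names and the classification head module is named `classifier` (as in Hugging Face's RobertaForSequenceClassification); these naming patterns are assumptions, not this repository's confirmed conventions:

```python
import torch.nn as nn


def freeze_for_adapter_tuning(model: nn.Module, train_classifier_head: bool = True) -> None:
    """Freeze everything except adapter parameters, optionally also training the classifier head."""
    for name, param in model.named_parameters():
        is_adapter = "adapter" in name
        is_head = "classifier" in name
        param.requires_grad = is_adapter or (train_classifier_head and is_head)


def count_trainable(model: nn.Module) -> int:
    """Count trainable parameters, to make the parameter savings explicit."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)
```

Calling `freeze_for_adapter_tuning(model, train_classifier_head=True)` before building the optimizer would correspond to the "unfreeze the classifier" variant discussed above.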
