[Bug] Adapter and LoRA for Roberta
See original GitHub issue.

Running the setting below on SST2 and MNLI:

```
attn_mode="adapter"
attn_option="sequential"
attn_composition="add"
attn_bn=200  # attn bottleneck dim
ffn_mode="adapter"
ffn_option="sequential"
ffn_adapter_layernorm_option="none"
ffn_adapter_init_option="bert"
ffn_adapter_scalar="1"
ffn_bn=200  # ffn bottleneck dim
```
Several errors were raised. It seems some parameters in modeling_roberta.py were set incorrectly, e.g. d_model and dropout. I fixed them, but the log confuses me: Houlsby et al. added adapters in two places, after the self-attention and after the FFN. So why is an adapter added inside the self-attention, and what is adapter_layer_norm_before.weight used for?

Thanks
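For reference, a Houlsby-style bottleneck adapter is a small residual MLP attached to a sublayer output. The sketch below is only illustrative and not the repo's actual code; the class name, argument names, and defaults are assumptions. It shows how d_model, the bottleneck dimension (the attn_bn / ffn_bn = 200 above), dropout, the scaling factor (ffn_adapter_scalar), and an optional LayerNorm applied before the adapter (presumably what adapter_layer_norm_before refers to) could fit together:

```python
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Illustrative Houlsby-style bottleneck adapter (hypothetical names)."""

    def __init__(self, d_model: int, bottleneck_dim: int = 200,
                 dropout: float = 0.1, use_layernorm_before: bool = True,
                 scalar: float = 1.0):
        super().__init__()
        # Optional LayerNorm applied to the input before the bottleneck,
        # i.e. a "layernorm before the adapter" option.
        self.layernorm_before = nn.LayerNorm(d_model) if use_layernorm_before else nn.Identity()
        self.down_proj = nn.Linear(d_model, bottleneck_dim)  # d_model -> bottleneck
        self.up_proj = nn.Linear(bottleneck_dim, d_model)    # bottleneck -> d_model
        self.dropout = nn.Dropout(dropout)
        self.scalar = scalar  # fixed scaling of the adapter output

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        residual = hidden_states
        x = self.layernorm_before(hidden_states)
        x = self.up_proj(self.dropout(torch.relu(self.down_proj(x))))
        # Residual connection: the adapter only adds a small correction
        # on top of the frozen sublayer output.
        return residual + self.scalar * x
```

With attn_option / ffn_option set to "sequential", a module like this would be applied to each sublayer's output, i.e. the Houlsby placement described in the question.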
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
We never tried tuning the classifier head, but training it sounds like a reasonably cheap trick that should give higher performance. I am not sure about the reason; my guess is that it is easy to learn features that can be separated even by a random projection, given that both MNLI and SST2 are only two- or three-way classification tasks.
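As a hedged illustration of how cheap that is, one could unfreeze only the adapter modules and the classification head while everything else stays frozen. The name filters ("adapter", "classifier") are assumptions about how the modules are named in the patched RoBERTa, not the repo's exact API:

```python
from transformers import RobertaForSequenceClassification

# Assumed setup: a RoBERTa classifier whose encoder has been patched with adapters.
model = RobertaForSequenceClassification.from_pretrained("roberta-base", num_labels=3)

# Train only adapter parameters and the classification head; freeze the rest.
for name, param in model.named_parameters():
    param.requires_grad = ("adapter" in name) or ("classifier" in name)

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable parameters: {trainable} / {total}")
```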
The main advantage, or the most intriguing part, of parameter-efficient tuning is not reducing the training cost; it doesn't really reduce the training cost much, since it often takes longer to converge. In my opinion, the advantages are:

(1) Storage savings, as you mentioned.
(2) More interestingly, it has potential in multi-task settings: one small module is responsible for one task/domain while most of the parameters are shared. This separates model capacity in a modular way and may enable many applications, for example merging multiple adapters efficiently to create models that perform well on multiple domains, or continuously adding new capabilities to an existing system, without breaking its original capabilities, by just adding trained adapters (see the sketch after this list). These may not be achieved easily by traditional fine-tuning.
(3) Parameter-efficient tuning mitigates catastrophic forgetting by design, since the old parameters are frozen.
(4) Tuning a few parameters has also been shown in some papers to be more robust than full fine-tuning and superior for few-shot learning.
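A conceptual sketch of point (2), assuming adapter parameters can be identified by name: the frozen backbone is shared, and each task only contributes a small adapter state dict that can be saved and swapped at inference time. The helper names and the "adapter" filter are hypothetical, not part of the repo:

```python
import torch

def extract_adapter_state(model: torch.nn.Module) -> dict:
    """Collect only adapter weights (a few MB) instead of the full checkpoint."""
    return {k: v.cpu() for k, v in model.state_dict().items() if "adapter" in k}

def load_adapter_state(model: torch.nn.Module, adapter_state: dict) -> None:
    """Overwrite just the adapter weights; the shared frozen backbone is untouched."""
    model.load_state_dict(adapter_state, strict=False)

# Example usage (hypothetical file names):
#   torch.save(extract_adapter_state(model), "sst2_adapter.pt")
#   load_adapter_state(model, torch.load("mnli_adapter.pt"))
```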
Thanks for your reply!