Support multitask learning
🚀 Feature request
There should be an easy way to support multitask learning.
Motivation
It seems like many of the best performing models on the GLUE benchmark make some use of multitask learning (simultaneous training on multiple tasks).
The T5 paper highlights multiple ways of mixing the tasks together during finetuning:
- Examples-proportional mixing - sample from tasks proportionally to their dataset size
- Equal mixing - sample uniformly from each task
- Temperature-scaled mixing - the generalized approach used by multilingual BERT, which uses a temperature T: the mixing rate of each task is raised to the power 1/T and renormalized. When T=1 this is equivalent to examples-proportional mixing, and it approaches equal mixing as T increases (see the sketch below).
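As a rough illustration (not from the issue itself), the temperature-scaled rates can be computed directly from dataset sizes. The sizes below are approximate GLUE training-set sizes and `T` is an arbitrary choice:

```python
# Temperature-scaled mixing rates (T5-style sketch).
# T = 1 reproduces examples-proportional mixing; larger T approaches equal mixing.
dataset_sizes = {"mnli": 392_702, "rte": 2_490, "cola": 8_551}  # approximate sizes
T = 2.0  # mixing temperature (arbitrary for illustration)

scaled = {task: size ** (1.0 / T) for task, size in dataset_sizes.items()}
total = sum(scaled.values())
mixing_rates = {task: s / total for task, s in scaled.items()}
# With these numbers: mnli ~0.81, rte ~0.06, cola ~0.12
```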
This definitely seems like a reusable component that would allow replication of the headline results used by many models.
The `run_glue.py` example only trains on a single GLUE task at a time, so I'm assuming people have made their own modifications to allow multitask learning. It seems especially sensible that there should be a way of training the T5 model in the multitask setting, as the authors originally intended.
Maybe this could be an extension of the `Trainer` or a wrapper around multiple `Dataset`s? Or even just an example.
Your contribution
I can certainly help with implementation, though would need guidance on the best place to add this functionality.
Issue Analytics
- Created 3 years ago
- Reactions: 12
- Comments: 6 (2 by maintainers)
Top GitHub Comments
@avhirupc I’ve been focusing on 1. as it seems to be the simplest (and currently the best performing on GLUE). I think 2. would still rely on this and extend it further (choosing which head to train based on the task, and a bit of modelling work).
Multitask learning seems like such an obvious part of current NLP approaches that I'm surprised more people aren't requesting it (maybe it's more of a research-y aim than a production one?).
My current approach is simply using `ConcatDataset` with weights decided using temperature-scaled mixing. Something like this:
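The snippet from the original comment wasn't captured here, so the following is only a minimal sketch of that idea. It assumes `task_datasets` is a dict mapping task name to an already-tokenized PyTorch `Dataset`, and `T` is the mixing temperature:

```python
from torch.utils.data import ConcatDataset, DataLoader, WeightedRandomSampler

T = 2.0  # mixing temperature (hypothetical choice)
sizes = {name: len(ds) for name, ds in task_datasets.items()}

# Temperature-scaled mixing rate per task, renormalized to sum to 1.
rates = {name: size ** (1.0 / T) for name, size in sizes.items()}
total = sum(rates.values())
rates = {name: r / total for name, r in rates.items()}

# Per-example weight: spread each task's mixing rate evenly over its examples,
# in the same order the datasets are concatenated below.
weights = []
for name in task_datasets:
    weights.extend([rates[name] / sizes[name]] * sizes[name])

combined = ConcatDataset(list(task_datasets.values()))
# num_samples is an open question (see below); one combined "epoch" is used here.
sampler = WeightedRandomSampler(weights, num_samples=len(combined), replacement=True)
loader = DataLoader(combined, sampler=sampler, batch_size=32)
```

With `replacement=True`, high-resource tasks are effectively down-sampled and low-resource tasks up-sampled within each epoch, which is the point of the mixing weights.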
There are still a few things I'm unclear about (e.g. what should `num_samples` be? Clearly if we sample everything it's just the same as not doing any balancing at all). Would be nice to have a `MultitaskSampler` if I can work it out.

@enzoampil I'm open to ideas about the best way to do this in terms of interfacing with the library. Should probably open an issue over at `nlp`.
Hi everybody,

I think the `nlp` library is the best place for this, as @enzoampil said.