Support multitask learning
🚀 Feature request
There should be an easy way to support multitask learning.
Motivation
It seems like many of the best performing models on the GLUE benchmark make some use of multitask learning (simultaneous training on multiple tasks).
The T5 paper highlights multiple ways of mixing the tasks together during finetuning:
- Examples-proportional mixing - sample from tasks proportionally to their dataset size
- Equal mixing - sample uniformly from each task
- Temperature-scaled mixing - the generalized approach used by multilingual BERT, which uses a temperature T: the mixing rate of each task is raised to the power 1/T and renormalized. When T=1 this is equivalent to examples-proportional mixing, and it approaches equal mixing as T increases (see the sketch below).
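As a rough illustration (not from the issue itself), the temperature-scaled rates can be computed directly from dataset sizes. The sizes below are approximate GLUE training-set sizes and `T` is an arbitrary choice:

```python
# Temperature-scaled mixing rates (T5-style sketch).
# T = 1 reproduces examples-proportional mixing; larger T approaches equal mixing.
dataset_sizes = {"mnli": 392_702, "rte": 2_490, "cola": 8_551}  # approximate sizes
T = 2.0  # mixing temperature (arbitrary for illustration)

scaled = {task: size ** (1.0 / T) for task, size in dataset_sizes.items()}
total = sum(scaled.values())
mixing_rates = {task: s / total for task, s in scaled.items()}
# With these numbers: mnli ~0.81, rte ~0.06, cola ~0.12
```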
This definitely seems like a reusable component that would allow replication of the headline results used by many models.
The `run_glue.py` example only trains on a single GLUE task at a time, so I'm assuming people have made their own modifications to allow multitask learning. It seems especially sensible that there should be a way of training the T5 model in the multitask setting, as the authors originally intended.
Maybe this could be an extension of the `Trainer` or a wrapper around multiple `Dataset`s? Or even just an example.
Your contribution
I can certainly help with implementation, though would need guidance on the best place to add this functionality.
Issue Analytics
- Created 3 years ago
- Reactions: 12
- Comments: 6 (2 by maintainers)
Top GitHub Comments
@avhirupc I’ve been focusing on 1. as it seems to be the simplest (and currently the best performing on GLUE). I think 2. would still rely on this and extend it further (choosing which head to train based on the task, and a bit of modelling work).
Multitask learning seems like such an obvious part of current NLP approaches that I'm surprised more people aren't requesting it (maybe it's more of a research-y aim than a production one?).
My current approach is simply using `ConcatDataset` with weights decided using temperature-scaled mixing. Something like this:
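The snippet from the original comment wasn't captured here, so the following is only a minimal sketch of that idea. It assumes `task_datasets` is a dict mapping task name to an already-tokenized PyTorch `Dataset`, and `T` is the mixing temperature:

```python
from torch.utils.data import ConcatDataset, DataLoader, WeightedRandomSampler

T = 2.0  # mixing temperature (hypothetical choice)
sizes = {name: len(ds) for name, ds in task_datasets.items()}

# Temperature-scaled mixing rate per task, renormalized to sum to 1.
rates = {name: size ** (1.0 / T) for name, size in sizes.items()}
total = sum(rates.values())
rates = {name: r / total for name, r in rates.items()}

# Per-example weight: spread each task's mixing rate evenly over its examples,
# in the same order the datasets are concatenated below.
weights = []
for name in task_datasets:
    weights.extend([rates[name] / sizes[name]] * sizes[name])

combined = ConcatDataset(list(task_datasets.values()))
# num_samples is an open question (see below); one combined "epoch" is used here.
sampler = WeightedRandomSampler(weights, num_samples=len(combined), replacement=True)
loader = DataLoader(combined, sampler=sampler, batch_size=32)
```

With `replacement=True`, high-resource tasks are effectively down-sampled and low-resource tasks up-sampled within each epoch, which is the point of the mixing weights.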
There are still a few things I'm unclear about (e.g. what should `num_samples` be? Clearly if we sample everything it's just the same as not doing any balancing at all). Would be nice to have a `MultitaskSampler` if I can work it out.

@enzoampil I'm open to ideas about the best way to do this in terms of interfacing with the library. Should probably open an issue over at `nlp`.
Hi everybody,

I think the `nlp` library is the best place for this, as @enzoampil said.