Support multitask learning


🚀 Feature request

There should be an easy way to support multitask learning.

Motivation

It seems like many of the best performing models on the GLUE benchmark make some use of multitask learning (simultaneous training on multiple tasks).

The T5 paper highlights multiple ways of mixing the tasks together during finetuning:

  • Examples-proportional mixing - sample from tasks proportionally to their dataset size
  • Equal mixing - sample uniformly from each task
  • Temperature-scaled mixing - the generalized approach used by multilingual BERT, which uses a temperature T: each task's mixing rate is raised to the power 1/T and renormalized. When T=1 this is equivalent to examples-proportional mixing, and it approaches equal mixing as T increases (see the small example below).
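
For concreteness, here is a tiny illustration of the temperature-scaled computation (my own numbers, not taken from the T5 paper):

dataset_sizes = [10_000, 100]
T = 2.0
rates = [size ** (1.0 / T) for size in dataset_sizes]   # [100.0, 10.0]
probs = [r / sum(rates) for r in rates]                 # ~[0.91, 0.09]
# Examples-proportional mixing would give ~[0.99, 0.01] here,
# so increasing T softens the imbalance towards equal mixing.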

This definitely seems like a reusable component that would allow replicating the headline results reported for many models.

The run_glue.py example only trains on a single GLUE task at a time, so I’m assuming people have made their own modifications to allow multitask learning. It seems especially sensible to have a way of training the T5 model in the multitask setting, as the authors originally intended.

Maybe this could be an extension of the Trainer or a wrapper around multiple Datasets? Or even just an example.
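
As a rough sketch of the “wrapper around multiple Datasets” idea (the class name and the "task" field are placeholders of mine, not an existing API), each example could simply be tagged with the task it came from:

from torch.utils.data import Dataset

class MultitaskDataset(Dataset):
    '''Hypothetical wrapper that tags every example with its source task.'''
    def __init__(self, task_datasets):
        # task_datasets: dict mapping task name -> torch Dataset of dict examples
        self.task_datasets = task_datasets
        self.index = [(task, i)
                      for task, ds in task_datasets.items()
                      for i in range(len(ds))]

    def __len__(self):
        return len(self.index)

    def __getitem__(self, idx):
        task, i = self.index[idx]
        example = self.task_datasets[task][i]
        # assumes each underlying example is a dict (e.g. tokenized features)
        return {"task": task, **example}

A Trainer extension (or a task-aware sampler) could then read the "task" key to pick the right head or loss for each batch.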

Your contribution

I can certainly help with implementation, though would need guidance on the best place to add this functionality.

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Reactions: 12
  • Comments: 6 (2 by maintainers)

Top GitHub Comments

4 reactions
ghomasHudson commented, May 29, 2020

@avhirupc I’ve been focusing on 1. as it seems to be the simplest (and currently the best performing on GLUE). I think 2. would still rely on this and extend it further (choosing which head to train based on the task, and a bit of modelling work).
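
Roughly what I imagine 2. would involve, just as a sketch (MultitaskModel and the per-task heads are placeholder names of mine, and I'm assuming a BERT-like encoder):

import torch.nn as nn
from transformers import AutoModel

class MultitaskModel(nn.Module):
    '''Hypothetical shared encoder with one classification head per task.'''
    def __init__(self, model_name, task_num_labels):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(model_name)
        hidden_size = self.encoder.config.hidden_size
        self.heads = nn.ModuleDict({
            task: nn.Linear(hidden_size, num_labels)
            for task, num_labels in task_num_labels.items()
        })

    def forward(self, task, input_ids, attention_mask=None):
        outputs = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        pooled = outputs[0][:, 0]  # first-token representation as a simple pooled output
        return self.heads[task](pooled)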

Multitask learning seems like such an obvious part of current NLP approaches that I’m surprised more people aren’t requesting it (maybe it’s more of a research-y aim than a production one?).

The approach I’m currently working on simply uses ConcatDataset with per-sample weights derived from temperature-scaled mixing. Something like this:

from torch.utils.data import ConcatDataset, DataLoader, WeightedRandomSampler

def temperature_to_weights(dataset_lengths, temperature=2.0, maximum=None, scale=1.0):
    '''Calculate a mixing rate for each dataset from its length.'''
    mixing_rates = []
    for length in dataset_lengths:
        rate = length * scale
        if maximum is not None:
            rate = min(rate, maximum)
        if temperature != 1.0:
            rate = rate ** (1.0 / temperature)
        mixing_rates.append(rate)
    return mixing_rates

# Dataset1/Dataset2 are placeholders for whatever task datasets you're using.
datasets = [Dataset1(), Dataset2()]
dataset_lengths = [len(d) for d in datasets]
dataset_weights = temperature_to_weights(dataset_lengths)

# Expand to one weight per sample (every sample inherits its dataset's rate)
weights = []
for i in range(len(datasets)):
    weights += [dataset_weights[i]] * len(datasets[i])

dataloader = DataLoader(ConcatDataset(datasets),
                        sampler=WeightedRandomSampler(weights=weights,
                                                      num_samples=min(dataset_lengths),
                                                      replacement=False))

There are still a few things I’m unclear about (e.g. what should num_samples be? Clearly, if we sample everything without replacement it’s just the same as not doing any balancing at all). It would be nice to have a MultitaskSampler if I can work it out.
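
In case it helps, here is roughly the shape I imagine for such a sampler (entirely hypothetical, nothing like this exists in the library; it reuses the names from the snippet above): a batch sampler that first picks a task according to the mixing rates, then draws a whole batch from that task's slice of the ConcatDataset.

import random

class MultitaskBatchSampler:
    '''Hypothetical batch sampler: every batch comes from a single task,
    with tasks chosen in proportion to their mixing rates.'''
    def __init__(self, dataset_lengths, mixing_rates, batch_size, num_batches):
        self.mixing_rates = mixing_rates
        self.batch_size = batch_size   # assumes batch_size <= the smallest dataset
        self.num_batches = num_batches
        # index ranges of each dataset inside the ConcatDataset
        self.ranges, start = [], 0
        for length in dataset_lengths:
            self.ranges.append(range(start, start + length))
            start += length

    def __len__(self):
        return self.num_batches

    def __iter__(self):
        for _ in range(self.num_batches):
            task_range = random.choices(self.ranges, weights=self.mixing_rates)[0]
            yield random.sample(task_range, self.batch_size)

dataloader = DataLoader(ConcatDataset(datasets),
                        batch_sampler=MultitaskBatchSampler(dataset_lengths, dataset_weights,
                                                            batch_size=8, num_batches=1000))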

@enzoampil I’m open to ideas about the best way to do this in terms of interfacing with the library. I should probably open an issue over at nlp.

1 reaction
patrickvonplaten commented, Jun 3, 2020

Hi everybody,

I think the nlp library is the best place for this as @enzoampil said.
