Adding Universal Language Model Fine-tuning (ULMFiT) pre-trained LM to spaCy and allowing a simple way to train new models
Feature description
Universal Language Model Fine-tuning for Text Classification presents a method for fine-tuning a pre-trained universal language model on a particular classification task, achieving beyond state-of-the-art results (an 18-24% reduction in error rate) on multiple benchmark text classification tasks. The fine-tuning requires very few labeled examples (around 100) to achieve very good results.
Here is an excerpt of the abstract, which provides a good TL;DR of the paper (duh):
Inductive transfer learning has greatly impacted computer vision, but existing approaches in NLP still require task-specific modifications and training from scratch. We propose Universal Language Model Fine-tuning (ULMFiT), an effective transfer learning method that can be applied to any task in NLP, and introduce techniques that are key for fine-tuning a language model. Our method significantly outperforms the state-of-the-art on six text classification tasks, reducing the error by 18-24% on the majority of datasets. Furthermore, with only 100 labeled examples, it matches the performance of training from scratch on 100× more data. We open-source our pretrained models and code.
I propose that spaCy add their pre-trained models and a simple way to fine-tune on a new task as a core feature of the library.
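For reference, here is a rough sketch of the ULMFiT workflow as exposed by fastai's v1 text API (not spaCy; the sample dataset, epochs and learning rates are placeholders, and the API may differ between fastai versions). It illustrates the three stages a spaCy integration would need to cover: a pretrained LM, LM fine-tuning on the target corpus, and classifier fine-tuning.

```python
# Rough sketch of the ULMFiT recipe with fastai v1 (illustrative only;
# dataset, epochs and learning rates are placeholders).
from fastai.text import *  # idiomatic fastai import: brings in URLs, untar_data, learners

path = untar_data(URLs.IMDB_SAMPLE)  # small labelled sample shipped with fastai

# Stage 1 is the pretrained Wikitext-103 LM that ships with fastai (AWD_LSTM).
# Stage 2: fine-tune that LM on the target-domain text.
data_lm = TextLMDataBunch.from_csv(path, "texts.csv")
lm = language_model_learner(data_lm, AWD_LSTM, drop_mult=0.3)
lm.fit_one_cycle(1, 1e-2)
lm.save_encoder("ft_enc")  # keep the fine-tuned encoder

# Stage 3: train a classifier head on top of the fine-tuned encoder.
data_clas = TextClasDataBunch.from_csv(path, "texts.csv",
                                       vocab=data_lm.train_ds.vocab, bs=32)
clf = text_classifier_learner(data_clas, AWD_LSTM, drop_mult=0.5)
clf.load_encoder("ft_enc")
clf.fit_one_cycle(1, 1e-2)
```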
Could the feature be a custom component or spaCy plugin? If so, we will tag it as project idea so other users can take it on.
This seems like a core feature for spaCy, greatly increasing its industrial potential. I would argue for making it a first-class citizen if the authors and the licensing of this work permit that.
Author here. I’d love to see this happen and I’m sure @jph00 would also be on board. Fast.ai is working on pre-trained models for other languages, and we’ll be working to simplify the code and make it more robust.
Super keen on this! @jph00, the vision for plugging in other libraries is to have Thinc as a thin wrapper on top. I’ve just merged a PR on this, and have fixed up an example of wrapping a BiLSTM model and inserting it into a Thinc model: https://github.com/explosion/thinc/blob/master/examples/pytorch_lstm_tagger.py#L122
You can find the wrapper here: https://github.com/explosion/thinc/blob/master/thinc/extra/wrappers.py#L13
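For anyone exploring this, here is a minimal sketch of what the wrapping looks like, assuming the PyTorchWrapper converts numpy arrays to torch tensors and back as in the linked example; the layer sizes and dummy input are made up.

```python
# Minimal sketch: turning a PyTorch module into a Thinc model via PyTorchWrapper.
# Assumes the wrapper handles numpy/torch conversion as in the linked
# pytorch_lstm_tagger example; sizes and dummy data are arbitrary.
import numpy
import torch.nn as nn
from thinc.extra.wrappers import PyTorchWrapper

pt_model = nn.Sequential(
    nn.Linear(300, 128),   # e.g. project 300-dim word vectors
    nn.ReLU(),
    nn.Linear(128, 2),     # two-class output
)

model = PyTorchWrapper(pt_model)             # now behaves like a Thinc model
X = numpy.zeros((8, 300), dtype="float32")   # dummy batch of 8 vectors
Y = model(X)                                 # forward pass runs in PyTorch
print(Y.shape)                               # (8, 2)
```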
This wrapping approach is the long-standing plan for plugging “foreign” models into spaCy and Prodigy. We want to have similar wrappers for TensorFlow, DyNet, MXNet, etc. The Thinc API is pretty minimal, so it’s easy to wrap this way.
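As a concrete illustration of how a wrapped model could then be exposed to users, here is a hypothetical custom pipeline component using the spaCy v2 extension/component API; `ulmfit_model`, the component name and the extension name are all invented for illustration.

```python
# Hypothetical sketch: exposing a wrapped external model as a spaCy v2
# pipeline component. `ulmfit_model` stands in for a wrapped model like the
# one above; the component and extension names are made up.
import spacy
from spacy.tokens import Doc

ulmfit_model = lambda texts: [0.0 for _ in texts]  # stand-in for a real wrapped model

Doc.set_extension("ulmfit_scores", default=None)

def ulmfit_component(doc):
    # Run the wrapped model on the document and stash its output on the Doc.
    doc._.ulmfit_scores = ulmfit_model([doc.text])
    return doc

nlp = spacy.load("en_core_web_sm")
nlp.add_pipe(ulmfit_component, name="ulmfit", last=True)
doc = nlp("Fine-tuned language models could slot in right here.")
```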
Btw, as well as a plugin, I’m very interested in finding the right solution for pre-training the “embed” and “encode” steps in spaCy’s NER, parser, etc. The catch is that our performance target is 10k words per second per CPU core, which I think means we can’t use a BiLSTM. The CNN architecture I’ve got is actually pretty good, and we’re currently only a little off the target (7.5k words per second in my latest tests).
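For anyone curious how a candidate architecture stacks up against that target, here is a quick, unscientific throughput sketch (not the benchmark quoted above; the model, texts and batch size are arbitrary stand-ins).

```python
# Quick-and-dirty throughput check against the ~10k words/sec/core target.
# Not the benchmark referred to above; model, texts and batch size are arbitrary.
import time
import spacy

nlp = spacy.load("en_core_web_sm")
texts = ["This is a fairly ordinary sentence about nothing in particular."] * 2000

start = time.time()
n_words = sum(len(doc) for doc in nlp.pipe(texts, batch_size=256))
print(f"{n_words / (time.time() - start):.0f} words per second")
```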