
Support for pre-training the language model

See original GitHub issue

Is your feature request related to a problem? Please describe.
To use the classifier on different languages or domain-specific text, it would be useful to be able to pre-train the language model.

Describe the solution you’d like
Calling .fit on a corpus (i.e., with no labels) should train the language model.

model.fit(corpus)

Describe alternatives you’ve considered
Using the original repo, which doesn’t have a simple-to-use interface.
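The requested interface could dispatch on whether labels are provided. A minimal sketch of that design, with hypothetical class and method names (this is not the library’s actual implementation):

```python
class Classifier:
    """Sketch of a classifier whose .fit dispatches on the presence of labels."""

    def __init__(self):
        self.mode = None  # records which training path ran, for illustration

    def _pretrain_language_model(self, corpus):
        # Unsupervised objective: train the language model on raw text.
        self.mode = "lm-pretraining"

    def _train_classifier(self, texts, labels):
        # Supervised objective: fine-tune on (text, label) pairs.
        self.mode = "classification"

    def fit(self, X, y=None):
        # No labels -> language-model pre-training; labels -> classification.
        if y is None:
            self._pretrain_language_model(X)
        else:
            self._train_classifier(X, y)
        return self


model = Classifier()
model.fit(["raw article text", "more unlabeled text"])  # pre-trains the LM
model.fit(["labeled text"], ["positive"])               # trains the classifier
```

The single-entry-point design keeps the API surface unchanged: `model.fit(corpus)` just works without a separate `pretrain` method.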

Issue Analytics

  • State: closed
  • Created: 5 years ago
  • Comments: 11 (7 by maintainers)

Top GitHub Comments

2 reactions
benleetownsend commented, Aug 2, 2018

@xuy2 This code is merged into master now.

1 reaction
madisonmay commented, Aug 7, 2018

It means the latter: randomly choosing 512 contiguous tokens from an article. A random slice of text.
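The sampling the maintainer describes can be sketched in a few lines. This is an illustrative implementation of the idea (uniformly choosing a contiguous 512-token window), not the library’s actual code:

```python
import random

def sample_span(tokens, span_len=512):
    """Return a random contiguous slice of `span_len` tokens from an article.

    If the article is shorter than the window, return it whole.
    """
    if len(tokens) <= span_len:
        return tokens
    # Pick a start offset uniformly so every valid window is equally likely.
    start = random.randint(0, len(tokens) - span_len)
    return tokens[start:start + span_len]
```

Sampling a fresh window per epoch means long articles contribute different slices over the course of pre-training, rather than always truncating to the first 512 tokens.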

Read more comments on GitHub >

Top Results From Across the Web

Why Do Pretrained Language Models Help in Downstream ...
Abstract: Pretrained language models have achieved state-of-the-art performance when adapted to a downstream NLP task.

Why Do Pretrained Language Models Help in ... - OpenReview
We propose an analysis framework that links the pretraining and downstream tasks with an underlying latent variable generative model of text — the...

Training a causal language model from scratch - Hugging Face
In this chapter, we'll take a different approach and train a completely new model from scratch. This is a good approach to take...

Pre-trained Language Models: Simplified | by Prakhar Ganesh
A model which trains only on the task-specific dataset needs to both understand the language and the task using a comparatively smaller dataset....

lyeoni/pretraining-for-language-understanding: Pre-training of ...
A language model would be trained on a massive corpus, and then we can use it as a component in other models that...
