Multilingual RoBERTa?

Thanks for the easy-to-understand paper and repo. This issue is more of a question about the paper than about the repo. The convenient thing about BERT (from an industry perspective) was the availability of a multilingual model and tokenizer.

Are there any plans to release a multilingual RoBERTa?

A few follow-up (less important) questions:

  1. Wikipedia may be available in hundreds of languages, but what about CC-News?
  2. Will you replicate exactly the same dataset for all languages?

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Reactions: 7
  • Comments: 8 (5 by maintainers)

Top GitHub Comments

2 reactions
ngoyal2707 commented, Oct 16, 2019

@josecannete It is in our plans, but there is no fixed release date yet.

Top Results From Across the Web

xlm-roberta-large - Hugging Face
XLM-RoBERTa is a multilingual version of RoBERTa. It is pre-trained on 2.5TB of filtered CommonCrawl data containing 100 languages.

XLM-RoBERTa: The alternative for non-english NLP - Medium
Why multilingual models? XLM-Roberta comes at a time when there is a proliferation of non-English models such as Finnish BERT, ...

Multilingual roBERTa - Kaggle
Explore and run machine learning code with Kaggle Notebooks | Using data from Contradictory, My Dear Watson.

Larger-Scale Transformers for Multilingual Masked Language ...
Our model also outperforms the RoBERTa-Large model on several English tasks of the GLUE benchmark by 0.3% on average while handling 99 more ......

Multilingual-Models — Sentence-Transformers documentation
The issue with multilingual BERT (mBERT) as well as with XLM-RoBERTa is that those produce rather bad sentence representation out-of-the-box.
