Multilingual RoBERTa?
Thanks for the easy-to-understand paper and repo. This issue is more a question about the paper than about the repo. The convenient thing about BERT (from an industry perspective) was the availability of a multilingual model and tokenizer.
Are there any plans to release a multilingual RoBERTa?
A few follow-up (less important) questions:
- Wikipedia is available in hundreds of languages, but what about CC-News?
- Will you replicate the dataset exactly for all languages?
Issue Analytics
- Created 4 years ago
- Reactions: 7
- Comments: 8 (5 by maintainers)
Top Results From Across the Web
xlm-roberta-large - Hugging Face
XLM-RoBERTa is a multilingual version of RoBERTa. It is pre-trained on 2.5TB of filtered CommonCrawl data containing 100 languages.
XLM-RoBERTa: The alternative for non-english NLP - Medium
Why multilingual models? XLM-Roberta comes at a time when there is a proliferation of non-English models such as Finnish BERT, ...
Multilingual roBERTa - Kaggle
Explore and run machine learning code with Kaggle Notebooks | Using data from Contradictory, My Dear Watson.
Larger-Scale Transformers for Multilingual Masked Language ...
Our model also outperforms the RoBERTa-Large model on several English tasks of the GLUE benchmark by 0.3% on average while handling 99 more ...
Multilingual-Models - Sentence-Transformers documentation
The issue with multilingual BERT (mBERT) as well as with XLM-RoBERTa is that those produce rather bad sentence representation out-of-the-box.
Top GitHub Comments
@josecannete It is in our plans, but there's no fixed release date yet.
Here you go: https://github.com/pytorch/fairseq/tree/master/examples/xlmr
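The question above asks for a single multilingual model and tokenizer, as mBERT offered. A minimal sketch of that workflow follows. It uses the Hugging Face port of the `xlm-roberta-large` checkpoint listed among the web results rather than the fairseq code at the link. It assumes `transformers` and `torch` are installed and that the checkpoint can be downloaded. The example sentences are illustrative only.

```python
# Minimal sketch: one shared tokenizer and encoder for several languages.
# Assumes `transformers` and `torch` are installed and that the
# "xlm-roberta-large" checkpoint can be downloaded from the Hugging Face Hub.
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "xlm-roberta-large"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME)
model.eval()  # disable dropout for inference

# One tokenizer covers every language; no per-language vocabulary is loaded.
sentences = [
    "Hello world!",        # English
    "Bonjour le monde !",  # French
    "Hallo Welt!",         # German
]

# Pad to the longest sentence so the batch forms a single tensor.
batch = tokenizer(sentences, padding=True, return_tensors="pt")
with torch.no_grad():
    outputs = model(**batch)

# Token-level features with shape (batch_size, sequence_length, hidden_size).
features = outputs.last_hidden_state
print(features.shape)
```

These are token-level features, not sentence embeddings. As the Sentence-Transformers note above points out, pooling them into sentence vectors without further training tends to give weak sentence representations.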