Multilingual RoBERTa?

Thanks for the easy-to-understand paper and repo. This issue is more a question about the paper than about the repo. The convenient thing about BERT (from an industry perspective) was the availability of a multilingual model and tokenizer.

Are there plans to release a multilingual RoBERTa?

A few follow-up (less important) questions:

Wikipedia is available in hundreds of languages, but what about CC-News?
Will you replicate the dataset exactly for all languages?

Issue Analytics

State:
Created 4 years ago
Reactions: 7
Comments: 8 (5 by maintainers)

Top GitHub Comments

2 reactions

ngoyal2707 commented, Oct 16, 2019

@josecannete It is planned, but there is no fixed release date yet.

1 reaction

ngoyal2707 commented, Nov 11, 2019

Here you go: https://github.com/pytorch/fairseq/tree/master/examples/xlmr
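
For readers arriving from search, here is a minimal sketch of how the XLM-R checkpoint from that directory might be loaded through PyTorch Hub. It assumes the `xlmr.base` hub entry point and the `encode` method described in the fairseq XLM-R README, and it needs `torch` and `fairseq` installed. The helper names `load_xlmr` and `encode_sentences` are hypothetical and not part of fairseq.

```python
def load_xlmr(variant="xlmr.base"):
    """Download (on first use) and return an XLM-R hub model in eval mode.

    Assumption: "pytorch/fairseq" and the variant name follow the fairseq
    XLM-R README; adjust them if the repository layout has changed.
    """
    import torch  # imported lazily so this file loads without torch installed

    model = torch.hub.load("pytorch/fairseq", variant)
    model.eval()  # disable dropout for inference
    return model


def encode_sentences(model, sentences):
    """Encode each sentence with the model's own tokenizer.

    `model` only needs an `encode(str)` method; with XLM-R that call
    applies the shared multilingual SentencePiece vocabulary.
    """
    return [model.encode(sentence) for sentence in sentences]
```

Calling `encode_sentences(load_xlmr(), ["Hello world!", "Bonjour le monde !"])` downloads the checkpoint on first use. It then returns one token-id tensor per sentence, all produced with the same vocabulary regardless of language.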

Top Results From Across the Web

xlm-roberta-large - Hugging Face

XLM-RoBERTa is a multilingual version of RoBERTa. It is pre-trained on 2.5TB of filtered CommonCrawl data containing 100 languages.

XLM-RoBERTa: The alternative for non-english NLP - Medium

Why multilingual models? XLM-Roberta comes at a time when there is a proliferation of non-English models such as Finnish BERT, ...

Multilingual roBERTa - Kaggle

Explore and run machine learning code with Kaggle Notebooks | Using data from Contradictory, My Dear Watson.

Larger-Scale Transformers for Multilingual Masked Language ...

Our model also outperforms the RoBERTa-Large model on several English tasks of the GLUE benchmark by 0.3% on average while handling 99 more ......

Multilingual-Models — Sentence-Transformers documentation

The issue with multilingual BERT (mBERT) as well as with XLM-RoBERTa is that those produce rather bad sentence representation out-of-the-box.

