Add Support for "No Language Left Behind" (NLLB)

Model description

Hi,

Meta recently released another cool project called “No Language Left Behind” (NLLB):

No Language Left Behind (NLLB) is a first-of-its-kind, AI breakthrough project that open-sources models capable of delivering high-quality translations directly between any pair of 200+ languages — including low-resource languages like Asturian, Luganda, Urdu and more. It aims to help people communicate with anyone, anywhere, regardless of their language preferences.

The project itself is integrated into the fairseq library and is available on the nllb branch:

https://github.com/facebookresearch/fairseq/tree/nllb

It includes the code release as well as the released checkpoints.

A detailed 190-page paper is also available.

We should really support this amazing project by adding NLLB.

Open source status

  • The model implementation is available
  • The model weights are available

Provide useful links for the implementation

Model checkpoints are available here:

| Model Name | Model Type | #params | checkpoint | metrics |
| --- | --- | --- | --- | --- |
| NLLB-200 | MoE | 54.5B | model | metrics |
| NLLB-200 | Dense | 3.3B | model | metrics |
| NLLB-200 | Dense | 1.3B | model | metrics |
| NLLB-200-Distilled | Dense | 1.3B | model | metrics |
| NLLB-200-Distilled | Dense | 600M | model | metrics |

Maintainers are: @vedanuj, @shruti-bh, @annasun28, @elbayadm, @jeanm, @jhcross, @kauterry and @huihuifan.

Implementation is available in the fairseq repo: https://github.com/facebookresearch/fairseq/tree/nllb

Issue Analytics

  • State: closed
  • Created a year ago
  • Reactions: 22
  • Comments: 17 (1 by maintainers)

Top GitHub Comments

jhcross commented, Jul 6, 2022 (13 reactions)

Hi, I’m one of the Meta engineers who worked on NLLB, and I’m happy to support this from our side. That’s indeed the correct (real) SPM model for the vocabulary used for input/output, but internally the model’s vocabulary (and embedding table) is extended at the end with one token for each language, which happens here:

https://github.com/facebookresearch/fairseq/blob/26d62ae8fbf3deccf01a138d704be1e5c346ca9a/fairseq/data/multilingual/multilingual_utils.py#L64

This list of languages comes from an input argument, which reads them from a string or a file. For these particular models, that value is:

https://github.com/facebookresearch/fairseq/blob/26d62ae8fbf3deccf01a138d704be1e5c346ca9a/examples/nllb/modeling/scripts/flores200/langs.txt#L1

Please let me know if you have any questions about this or if I can be of any further help.
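
To illustrate the vocabulary extension described in the comment above, here is a minimal sketch that appends one token per language to the end of a base vocabulary. It does not use fairseq or SentencePiece. The helper `augment_vocab`, the toy `base_vocab`, the three-language subset, and the `__lang__` token format are all assumptions made for illustration; the linked `multilingual_utils.py` and `langs.txt` remain the authoritative sources.

```python
from typing import Dict, List


def augment_vocab(base_vocab: List[str], languages: List[str]) -> Dict[str, int]:
    """Return a token -> id mapping with one language token appended per language.

    Hypothetical helper: it only mimics the idea of growing the vocabulary
    (and therefore the embedding table) at the end, after the SPM pieces.
    """
    token_to_id = {tok: i for i, tok in enumerate(base_vocab)}
    for lang in languages:
        lang_tok = f"__{lang}__"  # assumed token format, for illustration only
        if lang_tok not in token_to_id:
            # New ids start right after the last base-vocabulary id.
            token_to_id[lang_tok] = len(token_to_id)
    return token_to_id


# Toy stand-in for the SPM vocabulary (the real one is far larger).
base_vocab = ["<s>", "<pad>", "</s>", "<unk>", "▁hello", "▁world"]

# Toy subset, assuming the language list is passed as a comma-separated string.
languages = "ace_Arab,ace_Latn,fra_Latn".split(",")

vocab = augment_vocab(base_vocab, languages)
print(len(base_vocab), len(vocab))
```

The point of the sketch is that the SPM model's vocabulary size and the model's embedding table size differ by the number of language tokens, so a converted tokenizer has to add the same tokens in the same order for the ids to line up.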

LysandreJik commented, Jul 13, 2022 (9 reactions)

Thanks for opening an issue! We’ve managed to convert the models to the M2M_100 architecture and the tokenizers to a new NLLB tokenizer that closely resembles the mBART tokenizer.

We’re in the process of testing all models for generation and performance and I’ll likely open a PR in a few hours.
