Convert 12-1 and 6-1 en-de models from AllenNLP
https://github.com/jungokasai/deep-shallow#download-trained-deep-shallow-models
- These should be FSMT models, so they can be part of #6940 or done after it.
- They should be uploaded to the AllenNLP namespace. If stas takes this, they can start in stas/ and I will move them.
- model card(s) should link to the original repo and paper.
- Hopefully the same en-de tokenizer has already been ported.
- It would be interesting to compare BLEU to the original models in that PR. There is no ensembling, so we should be able to reproduce the reported scores pretty closely.
- Ideally this requires 0 lines of checked-in Python code, besides maybe an integration test.
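For the BLEU comparison mentioned above, a real evaluation should use sacrebleu; purely to illustrate what is being compared, here is a minimal stdlib-only sketch of sentence-level BLEU (single reference, no smoothing — a simplification, not the metric used in the paper):

```python
from collections import Counter
import math

def bleu(candidate, reference, max_n=4):
    """Simplified sentence-level BLEU: geometric mean of modified
    n-gram precisions times a brevity penalty. No smoothing, single
    reference -- for illustration only."""
    cand, ref = candidate.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        cand_ngrams = Counter(tuple(cand[i:i + n]) for i in range(len(cand) - n + 1))
        ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
        # clipped (modified) n-gram counts
        overlap = sum((cand_ngrams & ref_ngrams).values())
        total = max(sum(cand_ngrams.values()), 1)
        precisions.append(overlap / total)
    if min(precisions) == 0:
        return 0.0
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    # brevity penalty for candidates shorter than the reference
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * geo_mean
```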
Desired Signature:
model = FSMT.from_pretrained('allen_nlp/en-de-12-1')
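In current transformers the FSMT classes are `FSMTForConditionalGeneration` and `FSMTTokenizer`, so the desired signature above would in practice look something like the sketch below. The `allen_nlp/en-de-12-1` hub id is the proposal from this issue, not a published model name:

```python
def hub_model_id(namespace, pair, layers):
    """Build a hub id like 'allen_nlp/en-de-12-1' (namespace per this issue's proposal)."""
    return f"{namespace}/{pair}-{layers}"

def load_fsmt(model_id):
    """Load a converted FSMT checkpoint; requires `pip install transformers`."""
    from transformers import FSMTForConditionalGeneration, FSMTTokenizer
    tokenizer = FSMTTokenizer.from_pretrained(model_id)
    model = FSMTForConditionalGeneration.from_pretrained(model_id)
    return tokenizer, model
```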
Weights can be downloaded with gdown (https://pypi.org/project/gdown/):
pip install gdown
gdown https://drive.google.com/uc?id=1x_G2cjvM1nW5hjAB8-vWxRqtQTlmIaQU
@stas00 if you are blocked in the late stages of #6940 and have extra cycles, you could give this a whirl. We could also wait for that PR to be finalized, and then either of us can take this.
New model's config should be
{'num_beams': 5}
according to https://github.com/jungokasai/deep-shallow#evaluation
Helsinki-NLP/opus-mt-en-de was trained on way more data than the fairseq model, I think, but I'm not totally sure.
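For reference, the relevant part of the converted model's `config.json` might look like the fragment below. Only `num_beams: 5` is confirmed by the repo's evaluation instructions; the other fields are assumptions based on how transformers FSMT configs are usually laid out:

```json
{
  "model_type": "fsmt",
  "num_beams": 5
}
```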