
Link for downloading the back translation code is not working

See original GitHub issue

While trying to run back_translate/download.sh, I get the following error:

> bash download.sh

--2021-06-19 12:36:11--  https://storage.googleapis.com/uda_model/text/back_trans_checkpoints.zip 
Resolving storage.googleapis.com (storage.googleapis.com)... 172.217.8.16, 172.217.9.208, 172.217.12.240, ...
Connecting to storage.googleapis.com (storage.googleapis.com)|172.217.8.16|:443... connected.
HTTP request sent, awaiting response... 404 Not Found
2021-06-19 12:36:11 ERROR 404: Not Found.
unzip:  cannot find or open back_trans_checkpoints.zip, back_trans_checkpoints.zip.zip or back_trans_checkpoints.zip.ZIP.

It seems that the storage.googleapis.com/uda_model bucket is no longer valid. Is there an alternate link I can use to download the back_translate code?
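
As a quick sanity check (not part of the original report), one can confirm that the object is really gone rather than blocked locally. A minimal sketch using the requests library against the URL from the wget log above:

import requests

# URL taken from the wget output above; a 404 here confirms the bucket object is missing.
URL = "https://storage.googleapis.com/uda_model/text/back_trans_checkpoints.zip"

resp = requests.head(URL, allow_redirects=True, timeout=30)
print(resp.status_code)  # 404 at the time this issue was filed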

Issue Analytics

  • State: open
  • Created: 2 years ago
  • Comments: 5

Top GitHub Comments

4 reactions
Liu-Jingyao commented, Jan 23, 2022

(Quotes sebamenabar's HuggingFace back-translation script and example output, reproduced in full in the comment below.)

Thanks! I’ll try it as a substitute for the source code.

2 reactions
sebamenabar commented, Sep 23, 2021

Maybe this could be of help: I wrote a small script to generate the back-translations with HuggingFace. I have not tested the quality of the generated data, whether it performs well with UDA, or how long it would take to translate a whole dataset, but visually the results look good. It works with transformers==4.4.2 and may require some modifications for newer versions.

import torch
from transformers import MarianMTModel, MarianTokenizer

torch.cuda.empty_cache()

# English -> French and French -> English MarianMT models, moved to the GPU.
en_fr_tokenizer = MarianTokenizer.from_pretrained("Helsinki-NLP/opus-mt-en-fr")
en_fr_model = MarianMTModel.from_pretrained("Helsinki-NLP/opus-mt-en-fr").cuda()

fr_en_tokenizer = MarianTokenizer.from_pretrained("Helsinki-NLP/opus-mt-fr-en")
fr_en_model = MarianMTModel.from_pretrained("Helsinki-NLP/opus-mt-fr-en").cuda()

src_text = [
    "Hi how are you?",
]

# English -> French: sampling with a high temperature keeps the intermediate
# translation (and therefore the final paraphrase) diverse.
translated_tokens = en_fr_model.generate(
    **{k: v.cuda() for k, v in en_fr_tokenizer(src_text, return_tensors="pt", padding=True, truncation=True, max_length=512).items()},
    do_sample=True,
    top_k=10,
    temperature=2.0,
)
in_fr = [en_fr_tokenizer.decode(t, skip_special_tokens=True) for t in translated_tokens]

# French -> English: translate back to obtain the augmented English sentences.
bt_tokens = fr_en_model.generate(
    **{k: v.cuda() for k, v in fr_en_tokenizer(in_fr, return_tensors="pt", padding=True, truncation=True, max_length=512).items()},
    do_sample=True,
    top_k=10,
    temperature=2.0,
)
in_en = [fr_en_tokenizer.decode(t, skip_special_tokens=True) for t in bt_tokens]

For the arguments used in generate, please refer to https://huggingface.co/blog/how-to-generate.
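
A possible variation (not from the original comment): the same generate call can use nucleus (top-p) sampling and return several candidates per source sentence, which is closer to producing multiple augmented copies per example. A minimal sketch assuming the en_fr_tokenizer and en_fr_model objects defined above; the top_p, temperature, and num_return_sequences values are illustrative only:

# Sketch only: nucleus sampling with several French candidates per input sentence.
inputs = {k: v.cuda() for k, v in en_fr_tokenizer(src_text, return_tensors="pt", padding=True, truncation=True, max_length=512).items()}
candidate_tokens = en_fr_model.generate(
    **inputs,
    do_sample=True,
    top_p=0.9,                # nucleus sampling instead of top-k
    temperature=0.9,          # lower temperature = more faithful translations
    num_return_sequences=4,   # 4 candidates per English sentence
)
fr_candidates = [en_fr_tokenizer.decode(t, skip_special_tokens=True) for t in candidate_tokens]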

Example of input data and backtranslation:

Input: I lived in Tokyo for 7 months. Knowing the reality of long train commutes, bike rides from the train station, soup stands, and other typical scenes depicted so well, certainly added to my own appreciation for this film which I really, really liked. There are aspects of Japanese life in this film painted with vivid colors but you don't have to speak Japanese to enjoy this movie. Director Suo's tricks were subtle for the most part; I found his highlighting the character called Tamako Tamura with a soft filter, making her sublime, a tiny bit contrived but most of the directors tricks were so gentle that I was fully pulled in and just danced with his characters. Or cried. Or laughed aloud. Wonderful. A+.
---
Output: I lived in Tokyo for seven months. I know the reality of train rides, bike rides from the train station, soup stands, and other typical scenes shown so nicely, probably added to my own appreciation of this film I really, really loved. There are aspects of Japanese life in this film painted with vivid colors but you don't have to speak Japanese to enjoy this movie. The pieces of the director Suo have been subtle to most, I found that he highlights the character called Tamaki Tamura with a sweet filter, which makes her sublime, a bit confused but most of the movie-makers' tricks were so soft that I was completely shot in it and just dancing with his characters. Or wept. or laughed aloud. Wonderful. A+.
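
The comment above leaves open how long it would take to back-translate a whole dataset. A rough sketch of how one might batch the same two-step translation over a list of sentences; the helper name back_translate_batch and the batch_size value are illustrative, not part of the original script:

def back_translate_batch(sentences, batch_size=16):
    """Back-translate a list of English sentences in fixed-size batches."""
    augmented = []
    for start in range(0, len(sentences), batch_size):
        batch = sentences[start:start + batch_size]
        # English -> French
        enc = {k: v.cuda() for k, v in en_fr_tokenizer(batch, return_tensors="pt", padding=True, truncation=True, max_length=512).items()}
        fr_tokens = en_fr_model.generate(**enc, do_sample=True, top_k=10, temperature=2.0)
        fr_batch = [en_fr_tokenizer.decode(t, skip_special_tokens=True) for t in fr_tokens]
        # French -> English
        enc = {k: v.cuda() for k, v in fr_en_tokenizer(fr_batch, return_tensors="pt", padding=True, truncation=True, max_length=512).items()}
        en_tokens = fr_en_model.generate(**enc, do_sample=True, top_k=10, temperature=2.0)
        augmented.extend(fr_en_tokenizer.decode(t, skip_special_tokens=True) for t in en_tokens)
    return augmented

Sorting the sentences by length before batching would reduce padding overhead and make large datasets noticeably faster to process.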

Top Results From Across the Web

BackTranslation - PyPI
BackTranslation is a Python library implemented to back-translate words between any two languages.

Data Augmentation in NLP Using Back Translation With ...
Getting an accurate model is not a straightforward path. ... Back translation: translate each of those translated sentences back into the original language, ...

Translate text, voice, and conversations on iPhone
In Translate on iPhone, translate text, voice, and conversations between languages. Download specific languages for offline translations.

Downloading translation files - Lokalise Docs
Learn how to download translation data to your PC and customize the process. ... Click the QA issues link to see what problems...

[2102.07847] Meta Back-translation - arXiv
However, several recent works have found that better translation quality of the pseudo-parallel data does not necessarily lead to better final ...
