--replace-unk causes bugs with fairseq-interactive


🐛 Bug

When using fairseq-interactive to generate translations, the --replace-unk argument causes several bugs.

  1. The alignments are given as tuples, but the function apparently expects just a list of indices of the aligned source tokens.
  2. When no alignment file is given, the default value ‘@@ ’ causes the alignment file loader to break.
  3. Finally, when the word that is out-of-vocabulary (OOV) in the hypothesis is also OOV in the source dictionary, the translation still contains an <unk>. In this case, I think the original input should be used to replace the <unk> in the translation.

To Reproduce

Steps to reproduce the behavior (always include the command you ran):

python fairseq-interactive.py fairseq-data-bin-10752 \
    --path models/transformer_iwslt_de_en_10752-align/checkpoint_best.pt \
    --beam 5 --source-lang nl \
    --target-lang ql \
    --print-alignment --replace-unk \
    --tokenizer moses

Input text: legal name of allianz

Allianz is an OOV word for my task.

For 1.:

Traceback (most recent call last):
  File "/Users/jan_marcglowienke/Documents/University/Master_Courses/Thesis/10_fairseq/fairseq_cli/interactive.py", line 318, in <module>
    cli_main()
  File "/Users/jan_marcglowienke/Documents/University/Master_Courses/Thesis/10_fairseq/fairseq_cli/interactive.py", line 314, in cli_main
    distributed_utils.call_main(convert_namespace_to_omegaconf(args), main)
  File "/Users/jan_marcglowienke/Documents/University/Master_Courses/Thesis/10_fairseq/fairseq/distributed/utils.py", line 369, in call_main
    main(cfg, **kwargs)
  File "/Users/jan_marcglowienke/Documents/University/Master_Courses/Thesis/10_fairseq/fairseq_cli/interactive.py", line 267, in main
    hypo_tokens, hypo_str, alignment = utils.post_process_prediction(
  File "/Users/jan_marcglowienke/Documents/University/Master_Courses/Thesis/10_fairseq/fairseq/utils.py", line 246, in post_process_prediction
    hypo_str = replace_unk(hypo_str, src_str, alignment, align_dict,
  File "/Users/jan_marcglowienke/Documents/University/Master_Courses/Thesis/10_fairseq/fairseq/utils.py", line 222, in replace_unk
    src_token = src_tokens[alignment[i]]
TypeError: list indices must be integers or slices, not tuple
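
The TypeError above suggests that each alignment entry is a tuple rather than a single source index. A minimal sketch of one way to handle this, assuming each entry is a (source index, target index) pair; the function names below are hypothetical and this is not the fairseq implementation:

```python
def target_to_source_map(alignment):
    """Map each target position to the first source position aligned to it.

    Assumption: `alignment` is an iterable of (src_idx, tgt_idx) pairs.
    """
    mapping = {}
    for src_idx, tgt_idx in alignment:
        # Keep the first source index seen for a given target position.
        mapping.setdefault(tgt_idx, src_idx)
    return mapping


def replace_unk_from_pairs(hypo_str, src_str, alignment, align_dict, unk="<unk>"):
    """Replace unknown tokens in the hypothesis using pair-style alignments."""
    hypo_tokens = hypo_str.split()
    # Extra "<eos>" slot so an alignment to the end-of-sentence position is valid.
    src_tokens = src_str.split() + ["<eos>"]
    tgt_to_src = target_to_source_map(alignment)
    for i, token in enumerate(hypo_tokens):
        if token == unk and i in tgt_to_src:
            src_token = src_tokens[tgt_to_src[i]]
            # Use the dictionary entry if there is one, else copy the source token.
            hypo_tokens[i] = align_dict.get(src_token, src_token)
    return " ".join(hypo_tokens)
```

With an empty alignment dictionary the aligned source token is copied as-is, so the sketch also covers the case where no dictionary file is supplied.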

For 2.:

Traceback (most recent call last):
  File "/Users/jan_marcglowienke/Documents/University/Master_Courses/Thesis/10_fairseq/fairseq_cli/interactive.py", line 318, in <module>
    cli_main()
  File "/Users/jan_marcglowienke/Documents/University/Master_Courses/Thesis/10_fairseq/fairseq_cli/interactive.py", line 314, in cli_main
    distributed_utils.call_main(convert_namespace_to_omegaconf(args), main)
  File "/Users/jan_marcglowienke/Documents/University/Master_Courses/Thesis/10_fairseq/fairseq/distributed/utils.py", line 369, in call_main
    main(cfg, **kwargs)
  File "/Users/jan_marcglowienke/Documents/University/Master_Courses/Thesis/10_fairseq/fairseq_cli/interactive.py", line 191, in main
    align_dict = utils.load_align_dict(cfg.generation.replace_unk)
  File "/Users/jan_marcglowienke/Documents/University/Master_Courses/Thesis/10_fairseq/fairseq/utils.py", line 164, in load_align_dict
    with open(replace_unk, "r") as f:
FileNotFoundError: [Errno 2] No such file or directory: '@@ '
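
The traceback shows the loader trying to open '@@ ' as a file. One possible workaround, sketched here under the assumption that a value which is not an existing file should simply mean "no alignment dictionary", is to check the path before opening it. The function name is hypothetical and this is not the code from the fairseq repository:

```python
import os


def load_align_dict_or_empty(replace_unk):
    """Load a two-column alignment dictionary, tolerating a missing file.

    Returns None when replacement is disabled, a dict loaded from the file
    when `replace_unk` names an existing file, and an empty dict otherwise
    (meaning: replace <unk> by copying the aligned source word).
    """
    if replace_unk is None:
        return None
    if isinstance(replace_unk, str) and os.path.isfile(replace_unk):
        align_dict = {}
        with open(replace_unk, "r", encoding="utf-8") as f:
            for line in f:
                cols = line.split()
                # Skip blank or malformed lines instead of raising IndexError.
                if len(cols) >= 2:
                    align_dict[cols[0]] = cols[1]
        return align_dict
    # Not a file (for example the '@@ ' value in the traceback): no dictionary.
    return {}
```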

Expected behavior

Replace the <unk> in the hypothesis with the corresponding word in the input according to the alignments. This should also be possible without an alignment dictionary.

I made a fix for 1., found a workaround for 2., and added some code to implement the feature described in 3.

I can provide a PR if desired.
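
The behavior requested in point 3 can be sketched as follows. This is only an illustration with a hypothetical function name, assuming the alignment is a list of (source index, target index) pairs and that the raw input and the dictionary-encoded source string share the same whitespace tokenization (which would not hold after subword splitting):

```python
def replace_unk_with_raw_fallback(
    hypo_str, src_str, raw_src_str, alignment, align_dict=None, unk="<unk>"
):
    """Replace <unk> in the hypothesis, falling back to the raw input word.

    `src_str` is the source after dictionary lookup (OOV words appear as
    `unk`); `raw_src_str` is the original input before that lookup.
    """
    align_dict = align_dict or {}
    hypo_tokens = hypo_str.split()
    src_tokens = src_str.split()
    raw_tokens = raw_src_str.split()

    # First source index aligned to each target position.
    tgt_to_src = {}
    for src_idx, tgt_idx in alignment:
        tgt_to_src.setdefault(tgt_idx, src_idx)

    for i, token in enumerate(hypo_tokens):
        if token != unk:
            continue
        j = tgt_to_src.get(i)
        if j is None or j >= len(src_tokens):
            continue  # no usable alignment for this position
        src_token = src_tokens[j]
        if src_token == unk and j < len(raw_tokens):
            # The word is OOV on the source side too: use the original input.
            src_token = raw_tokens[j]
        hypo_tokens[i] = align_dict.get(src_token, src_token)
    return " ".join(hypo_tokens)
```

For the input from the reproduction steps, where "allianz" is unknown on both sides, this would copy "allianz" from the raw input instead of leaving <unk> in the output.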

Environment

  • fairseq Version (e.g., 1.0 or master): master, ‘1.0.0a0+2429317’
  • PyTorch Version (e.g., 1.0): 1.8.1
  • OS (e.g., Linux): MacOS 11.2.3
  • How you installed fairseq (pip, source): CFLAGS="-stdlib=libc++" pip install --editable ./
  • Build command you used (if compiling from source):
  • Python version: 3.8.8
  • CUDA/cuDNN version: -
  • GPU models and configuration: -
  • Any other relevant information:

Additional context

Issue Analytics

  • State: open
  • Created: 2 years ago
  • Reactions: 1
  • Comments: 5

Top GitHub Comments

jm-glowienke commented, Dec 4, 2022 (1 reaction)

Hi, I found a solution for the problems described in the issue. The fixes are on my personal fork of fairseq (https://github.com/jm-glowienke/fairseq). Unfortunately, I cannot help any further, as I only worked on this for my thesis almost 2 years ago.

xihajun commented, Dec 4, 2022 (1 reaction)

Hi @jm-glowienke, I would also like to know whether there is a solution to this issue.

Are you also using a transformer model?

This forum thread explains a bit about why their -replace-unk option does not work with the transformer model: https://forum.opennmt.net/t/translate-py-with-replace-unk-option-and-the-transformer-model/2646

It might be helpful.

[Update on Dec 04, 2022] My task was spelling correction, and I was trying to map all special characters to <unk>. I used an alternative approach to achieve that:

  • Replace all special characters (e.g., 0-9) with <unk> in the paired data (this may also work for names and other words).
  • Train the model.
  • Replace them back in order.
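
The masking and restoring steps of this workaround can be sketched as below. The helper names and the choice of digits as the "special characters" are assumptions for illustration only, and the sketch assumes the model keeps the placeholders in their original order:

```python
import re

# Assumption: "special characters" means the digits 0-9, as in the example above.
SPECIAL = re.compile(r"[0-9]")


def mask_special(text, placeholder="<unk>"):
    """Replace each special character with a placeholder and remember it."""
    originals = SPECIAL.findall(text)
    return SPECIAL.sub(placeholder, text), originals


def restore_special(text, originals, placeholder="<unk>"):
    """Put the remembered characters back, left to right."""
    remaining = iter(originals)
    # If the output has more placeholders than saved characters, leave the extras.
    return re.sub(
        re.escape(placeholder),
        lambda match: next(remaining, match.group(0)),
        text,
    )
```

A round trip would call mask_special on the input, run the model on the masked text, and then call restore_special on the model output with the saved list.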