Generate text without <unk> tokens

Hello,

I’m trying to generate text that does not include <unk> tokens. When I run:

python generate.py generate/data-bin/dummy --path \
    checkpoints/checkpoint_best.pt --batch-size 32 --beam 1 \
    --sampling --sampling-topk 10 --sampling-temperature 0.8 --nbest 1 \
    --replace-unk REPLACE_UNK

I get the following error:

Traceback (most recent call last):
  File "generate.py", line 171, in <module>
    main(args)
  File "generate.py", line 25, in main
    '--replace-unk requires a raw text dataset (--raw-text)'
AssertionError: --replace-unk requires a raw text dataset (--raw-text)

But when I add the --raw-text argument, the model seems to infer that this is a translation task rather than a text generation task:

Traceback (most recent call last):
  File "generate.py", line 171, in <module>
    main(args)
  File "generate.py", line 34, in main
    task = tasks.setup_task(args)
  File "/home/edb2129/fairseq/fairseq/tasks/__init__.py", line 19, in setup_task
    return TASK_REGISTRY[args.task].setup_task(args)
  File "/home/edb2129/fairseq/fairseq/tasks/translation.py", line 83, in setup_task
    raise Exception('Could not infer language pair, please provide it explicitly')
Exception: Could not infer language pair, please provide it explicitly

Is there a way to generate text from writing prompts without <unk> tokens?
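
One generic way to keep <unk> out of sampled text, independent of the flags above, is to mask that token's score before top-k sampling. The following is only a minimal sketch under stated assumptions: the function name sample_top_k_without_unk, its parameters, and the toy vocabulary are hypothetical, and this is not fairseq's own implementation or API.

```python
import math
import random


def sample_top_k_without_unk(logits, unk_index, k=10, temperature=0.8, rng=None):
    """Sample a token index from the top-k logits, never returning unk_index.

    Hypothetical helper for illustration; not part of fairseq.
    """
    rng = rng or random.Random()
    neg_inf = float("-inf")
    # Mask the <unk> score so it can never be selected.
    masked = [neg_inf if i == unk_index else s for i, s in enumerate(logits)]
    # Keep the k highest-scoring candidates, dropping any masked entries.
    top = sorted(range(len(masked)), key=lambda i: masked[i], reverse=True)[:k]
    top = [i for i in top if masked[i] != neg_inf]
    if not top:
        raise ValueError("no candidates left after masking")
    # Temperature-scaled softmax weights over the surviving candidates.
    scaled = [masked[i] / temperature for i in top]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(top, weights=weights, k=1)[0]


# Toy usage: index 1 plays the role of <unk> and has the highest raw score.
vocab = ["<pad>", "<unk>", "the", "cat", "sat"]
logits = [-5.0, 3.0, 1.0, 0.5, 0.2]
rng = random.Random(0)
tokens = [vocab[sample_top_k_without_unk(logits, unk_index=1, k=3, rng=rng)]
          for _ in range(5)]
assert "<unk>" not in tokens
```

Masking before the top-k cut, rather than filtering afterwards, means the excluded token does not take up one of the k candidate slots.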

Issue Analytics

Created 5 years ago
Reactions: 1
Comments: 6 (2 by maintainers)

Top GitHub Comments

Crista23 commented, Apr 19, 2021

@bhardwaj1230 I have the same problem as you. Did you find a fix for it? Thanks!

bhardwaj1230 commented, Sep 18, 2019

Hello,

I am facing the same issue. When I use the “--raw-text” argument, it says “FileNotFoundError: Dataset not found: test”, but I have the required files in the folder: dict.en.txt, dict.fr.txt, test.en-fr.en.bin, test.en-fr.en.idx, test.en-fr.fr.bin, test.en-fr.fr.idx.