Generate text without <unk> tokens
Hello,
I’m trying to generate text that does not include <unk> tokens. When I run
python generate.py generate/data-bin/dummy --path \
checkpoints/checkpoint_best.pt --batch-size 32 --beam 1 \
--sampling --sampling-topk 10 --sampling-temperature 0.8 --nbest 1 \
--replace-unk REPLACE_UNK
I get the following error:
Traceback (most recent call last):
  File "generate.py", line 171, in <module>
    main(args)
  File "generate.py", line 25, in main
    '--replace-unk requires a raw text dataset (--raw-text)'
AssertionError: --replace-unk requires a raw text dataset (--raw-text)
But when I add the --raw-text argument, the script seems to assume that this is a translation task rather than a text generation task:
Traceback (most recent call last):
  File "generate.py", line 171, in <module>
    main(args)
  File "generate.py", line 34, in main
    task = tasks.setup_task(args)
  File "/home/edb2129/fairseq/fairseq/tasks/__init__.py", line 19, in setup_task
    return TASK_REGISTRY[args.task].setup_task(args)
  File "/home/edb2129/fairseq/fairseq/tasks/translation.py", line 83, in setup_task
    raise Exception('Could not infer language pair, please provide it explicitly')
Exception: Could not infer language pair, please provide it explicitly
Is there a way to generate text from writing prompts without <unk> tokens?
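One general way to keep a sampler from ever emitting <unk> is to mask its logit before sampling. The sketch below is a minimal, standalone illustration of that idea combined with top-k sampling and temperature, mirroring the --sampling-topk 10 and --sampling-temperature 0.8 settings in the command above. It is not fairseq's implementation: the function name, its parameters, and the toy logits are all hypothetical, and where such a mask would hook into generate.py is left open.

```python
import math
import random


def sample_top_k_without_unk(logits, unk_index, k=10, temperature=0.8, rng=random):
    """Sample a token id from the top-k logits, never returning unk_index.

    Hypothetical helper for illustration only; not part of fairseq.
    """
    # Give <unk> a score of -inf so it can never be selected.
    masked = [-math.inf if i == unk_index else x for i, x in enumerate(logits)]
    # Keep the k highest-scoring candidates, then drop any masked ones.
    top = sorted(range(len(masked)), key=lambda i: masked[i], reverse=True)[:k]
    top = [i for i in top if masked[i] != -math.inf]
    if not top:
        raise ValueError("no candidate tokens left after masking <unk>")
    # Temperature-scaled softmax weights (shifted by the max for stability).
    scaled = [masked[i] / temperature for i in top]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(top, weights=weights, k=1)[0]


# Toy vocabulary of four tokens; index 1 plays the role of <unk>.
toy_logits = [2.0, 5.0, 1.0, 0.5]
token_id = sample_top_k_without_unk(toy_logits, unk_index=1, k=3)
```

Even though index 1 has the highest raw score in the toy example, it is never returned, because its weight is removed before the draw. Masking changes what the model generates rather than post-editing the output, which is a different approach from --replace-unk.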
Issue Analytics
- Created 5 years ago
- Reactions: 1
- Comments: 6 (2 by maintainers)
Top GitHub Comments
@bhardwaj1230 I have the same problem as you, did you find a fix for it? Thanks!
Hello,
I am facing the same issue. When I use the --raw-text argument, it fails with "FileNotFoundError: Dataset not found: test", even though the required files are in the folder: dict.en.txt, dict.fr.txt, test.en-fr.en.bin, test.en-fr.en.idx, test.en-fr.fr.bin, test.en-fr.fr.idx.