
fairseq-preprocess does not work while training custom model

See original GitHub issue

🐛 Bug

I am following the tutorial at https://github.com/pytorch/fairseq/blob/master/examples/roberta/README.pretraining.md to train a custom model using RoBERTa. It gets stuck at the preprocess step with a few errors.

To Reproduce

Steps to reproduce the behavior (always include the command you ran):

Follow the exact steps in the tutorial. Running this command throws the error:

fairseq-preprocess --only-source --srcdict gpt2_bpe/dict.txt --trainpref wikitext-103-raw/wiki.train.bpe --validpref wikitext-103-raw/wiki.valid.bpe --testpref wikitext-103-raw/wiki.test.bpe --destdir data-bin/wikitext-103 --workers 60
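For context, passing --srcdict tells fairseq-preprocess to reuse an existing dictionary (one `token count` pair per line, with special symbols reserved first) and replace each BPE token in the input files with its integer ID. The sketch below is a conceptual, fairseq-free illustration of that mapping; the function names and toy dictionary lines are made up for the example and do not reflect fairseq's actual on-disk format.

```python
# Conceptual sketch of dictionary-based binarization, as done by
# fairseq-preprocess with --srcdict. Illustrative only.

def load_dict(lines):
    """Parse fairseq-style dict lines ("token count") into token -> id.
    fairseq reserves the lowest IDs for special symbols."""
    specials = ["<s>", "<pad>", "</s>", "<unk>"]
    mapping = {tok: i for i, tok in enumerate(specials)}
    for line in lines:
        token = line.split()[0]  # the count column is ignored here
        mapping[token] = len(mapping)
    return mapping

def binarize(sentence, vocab):
    """Replace each whitespace-separated token with its ID,
    falling back to the <unk> ID for out-of-vocabulary tokens."""
    unk = vocab["<unk>"]
    return [vocab.get(tok, unk) for tok in sentence.split()]

vocab = load_dict(["13 900", "262 750", "995 100"])  # toy dict lines
print(binarize("13 262 99999", vocab))  # 99999 is not in the dict
```

The real tool additionally writes the ID sequences to binary .bin/.idx files in data-bin/wikitext-103, which is what fairseq-train later reads.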

Stack trace:

Traceback (most recent call last):
  File "/usr/local/bin/fairseq-preprocess", line 33, in <module>
    sys.exit(load_entry_point('fairseq', 'console_scripts', 'fairseq-preprocess')())
  File "/usr/local/bin/fairseq-preprocess", line 25, in importlib_load_entry_point
    return next(matches).load()
  File "/usr/lib/python3.8/importlib/metadata.py", line 77, in load
    module = import_module(match.group('module'))
  File "/usr/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 783, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/home/adutta/Documents/git_repo/fairseq/fairseq_cli/preprocess.py", line 18, in <module>
    from fairseq import options, tasks, utils
  File "/home/adutta/Documents/git_repo/fairseq/fairseq/__init__.py", line 32, in <module>
    import fairseq.criterions  # noqa
  File "/home/adutta/Documents/git_repo/fairseq/fairseq/criterions/__init__.py", line 36, in <module>
    importlib.import_module("fairseq.criterions." + file_name)
  File "/usr/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "/home/adutta/Documents/git_repo/fairseq/fairseq/criterions/label_smoothed_cross_entropy_latency_augmented.py", line 6, in <module>
    from examples.simultaneous_translation.utils.latency import LatencyTraining
  File "/home/adutta/Documents/git_repo/fairseq/examples/simultaneous_translation/__init__.py", line 6, in <module>
    from . import criterions, eval, models  # noqa
  File "/home/adutta/Documents/git_repo/fairseq/examples/simultaneous_translation/models/__init__.py", line 13, in <module>
    importlib.import_module(
  File "/usr/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "/home/adutta/Documents/git_repo/fairseq/examples/simultaneous_translation/models/transformer_monotonic_attention.py", line 13, in <module>
    from fairseq.models import (
  File "/home/adutta/Documents/git_repo/fairseq/fairseq/models/__init__.py", line 208, in <module>
    module = importlib.import_module("fairseq.models." + model_name)
  File "/usr/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "/home/adutta/Documents/git_repo/fairseq/fairseq/models/speech_to_text/__init__.py", line 8, in <module>
    from .convtransformer_simul_trans import *  # noqa
  File "/home/adutta/Documents/git_repo/fairseq/fairseq/models/speech_to_text/convtransformer_simul_trans.py", line 8, in <module>
    from examples.simultaneous_translation.models.transformer_monotonic_attention import (
ImportError: cannot import name 'TransformerMonotonicDecoder' from partially initialized module 'examples.simultaneous_translation.models.transformer_monotonic_attention' (most likely due to a circular import) (/home/adutta/Documents/git_repo/fairseq/examples/simultaneous_translation/models/transformer_monotonic_attention.py)
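The error at the bottom of the traceback is Python's standard signature for a module-level circular import: fairseq's model registry imports the simultaneous-translation example while that example is still executing, so the name it needs does not exist yet. The minimal reproduction below shows the same failure mode; the module names (cyc_a, cyc_b) are hypothetical stand-ins, not fairseq modules.

```python
# Minimal reproduction of "cannot import name ... from partially
# initialized module ... (most likely due to a circular import)".
import os
import sys
import tempfile

tmp = tempfile.mkdtemp()

# cyc_a imports a name from cyc_b at module top level...
with open(os.path.join(tmp, "cyc_a.py"), "w") as f:
    f.write("from cyc_b import helper_b\n")

# ...and cyc_b imports a name back from cyc_a, closing the cycle.
with open(os.path.join(tmp, "cyc_b.py"), "w") as f:
    f.write("from cyc_a import helper_a\nhelper_b = None\n")

sys.path.insert(0, tmp)
try:
    import cyc_a  # noqa: F401
    msg = "no error"
except ImportError as e:
    # cyc_a is still executing when cyc_b tries to import from it,
    # so Python 3.8+ reports it as "partially initialized".
    msg = str(e)

print(msg)
```

The usual fixes are deferring one of the imports into a function body or removing the module-level dependency, which is essentially what the later fairseq commit did.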

Environment

  • fairseq Version (e.g., 1.0 or master): master (latest pull from GitHub)
  • PyTorch Version (e.g., 1.0): 1.7.1
  • OS (e.g., Linux): Ubuntu 20.04
  • How you installed fairseq (pip, source): source
  • Build command you used (if compiling from source): git clone https://github.com/pytorch/fairseq; cd fairseq; pip3 install --editable ./
  • Python version: 3.8
  • CUDA/cuDNN version: 11.2
  • GPU models and configuration: GeForce RTX 2080

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Reactions: 2
  • Comments: 6 (2 by maintainers)

Top GitHub Comments

2 reactions
olafthiele commented, Feb 21, 2021

For now, I revert to a commit from before the file was changed and rebuild. Not ideal, but it works:

git checkout da9eaba12d82b9bfc1442f0e2c6fc1b895f4d35d
pip install --editable ./
1 reaction
olafthiele commented, Feb 24, 2021

The error is gone for me with the current master.

Read more comments on GitHub >

Top Results From Across the Web

fairseq-preprocess does not work while training custom model
Running this command throws the error. fairseq-preprocess --only-source --srcdict gpt2_bpe/dict.txt --trainpref wikitext-103-raw/wiki.train.bpe ...
Command-line Tools — fairseq 0.12.2 documentation
Fairseq provides several command-line tools for training and evaluating models: fairseq-preprocess: Data pre-processing: build vocabularies and binarize ...
Guidance on using FAIRseq for seq2seq tasks - Google Groups
Hi, I want to use FairSeq for a custom seq2seq task but had a few doubts using it:- What is the format required...
fairseq Users | Hi, | Facebook
Hi, I'm using a big model from fairseq. And I'm experimenting using pre-trained embedding with fairseq. But when i start training with embeddings......
fairseq/examples/translation/README.md - Hugging Face
Training a new model. IWSLT'14 German to English (Transformer). The following instructions can be used to train ...
