StopIteration when continuing to train infoxlm from xlmr

I am trying to continue training an infoxlm model from xlmr on my own dataset. After initializing the conda environment and preparing the training data, I ran the command below, but it throws a StopIteration error. The command I used is:

    python src-infoxlm/train.py ${MLM_DATA_DIR} \
      --task infoxlm --criterion xlco \
      --tlm_data ${TLM_DATA_DIR} \
      --xlco_data ${XLCO_DATA_DIR} \
      --arch infoxlm_base --sample-break-mode complete --tokens-per-sample 512 \
      --optimizer adam --adam-betas '(0.9,0.98)' --adam-eps 1e-6 --clip-norm 1.0 \
      --lr-scheduler polynomial_decay --lr 0.0002 --warmup-updates 10000 \
      --total-num-update 200000 --max-update 200000 \
      --dropout 0.0 --attention-dropout 0.0 --weight-decay 0.01 \
      --max-sentences 8 --update-freq 8 \
      --log-format simple --log-interval 1 --disable-validation \
      --save-interval-updates 10000 --no-epoch-checkpoints \
      --seed 1 \
      --save-dir ${SAVE_DIR}/ \
      --tensorboard-logdir ${SAVE_DIR}/tb-log \
      --roberta-model-path $HOMEPATH/xlmr.base/model.pt \
      --num-workers 4 --ddp-backend=c10d --distributed-no-spawn \
      --xlco_layer 8 --xlco_queue_size 256 --xlco_lambda 1.0 \
      --xlco_momentum constant,0.9999 --use_proj

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 5

Top GitHub Comments

1 reaction
SAI990323 commented, Oct 14, 2022

Thanks for your reply!

I ran into this problem again and remembered what I did last time. In my case, there was not enough data to fill the xlco_queue. This only happened when I was testing the bash script.

0 reactions
stvhuang commented, Oct 15, 2022

Yes, I also solved this problem by using a larger training dataset. 😃
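
The cause described in these comments (too little data to fill the queue) can be illustrated with a minimal Python sketch. This is not the infoxlm training code: `fill_queue` and `fill_queue_checked` are hypothetical helpers, and the batch size of 8 and queue size of 256 are borrowed from `--max-sentences 8` and `--xlco_queue_size 256` in the command above purely for illustration. It only shows how calling `next()` on an exhausted iterator surfaces as a bare StopIteration, and how a size check could turn that into a clearer error.

```python
from collections import deque


def fill_queue(batches, queue_size):
    """Pull batches until the queue holds `queue_size` examples (hypothetical helper)."""
    iterator = iter(batches)
    queue = deque(maxlen=queue_size)
    while len(queue) < queue_size:
        # next() raises StopIteration once the iterator has no batches left.
        queue.extend(next(iterator))
    return queue


def fill_queue_checked(batches, queue_size):
    """Same idea, but report a descriptive error when the data runs out."""
    queue = deque(maxlen=queue_size)
    for batch in batches:
        queue.extend(batch)
        if len(queue) >= queue_size:
            return queue
    raise ValueError(
        f"only {len(queue)} examples available, need {queue_size} to fill the queue"
    )


# A tiny test dataset: 4 batches of 8 examples, far fewer than the queue needs.
tiny_dataset = [[0] * 8 for _ in range(4)]
try:
    fill_queue(tiny_dataset, queue_size=256)
except StopIteration:
    print("StopIteration: data ran out before the queue was full")
```

If the real cause matches this pattern, the two fixes reported in the thread are consistent with it: use more training data, or (as an untested assumption) pick a smaller queue size when smoke-testing with a small dataset.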


Top Results From Across the Web

StopIteration ERROR during training · Issue #214
I have geforce gtx 1080 8gb so i have tried to train network with 16 batch size. And run the training with python3...
INFOXLM: An Information-Theoretic Framework for Cross- ...
In this work, we present an information-theoretic framework that formulates cross-lingual language model pre-training as.
StopIteration Error occurs during training while running the ...
TL;DR Your args.epoch_iters is larger than the number of batches in loader_train . Python raises StopIteration error when you ask for more ...
