Runtime error when running RoBERTa with multiple GPUs on CommonsenseQA
See original GitHub issue
When I run RoBERTa with multiple GPUs on CommonsenseQA, I encounter a runtime error. Has anyone encountered the same problem? Thanks.
-- Process 1 terminated with the following error:
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/torch/multiprocessing/spawn.py", line 19, in _wrap
fn(i, *args)
File "/usr/local/lib/python3.5/dist-packages/fairseq_cli/train.py", line 284, in distributed_main
main(args, init_distributed=True)
File "/usr/local/lib/python3.5/dist-packages/fairseq_cli/train.py", line 80, in main
train(args, trainer, task, epoch_itr)
File "/usr/local/lib/python3.5/dist-packages/fairseq_cli/train.py", line 121, in train
log_output = trainer.train_step(samples)
File "/usr/local/lib/python3.5/dist-packages/fairseq/trainer.py", line 287, in train_step
raise e
File "/usr/local/lib/python3.5/dist-packages/fairseq/trainer.py", line 264, in train_step
ignore_grad
File "/usr/local/lib/python3.5/dist-packages/fairseq/tasks/fairseq_task.py", line 230, in train_step
loss, sample_size, logging_output = criterion(model, sample)
File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "/usr/local/lib/python3.5/dist-packages/fairseq/criterions/sentence_ranking.py", line 49, in forward
classification_head_name='sentence_classification_head',
File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "/usr/local/lib/python3.5/dist-packages/torch/nn/parallel/distributed.py", line 459, in forward
self.reducer.prepare_for_backward([])
RuntimeError: Expected to have finished reduction in the prior iteration before starting a new one. This error indicates that your module has parameters that were not used in producing loss. You can enable unused parameter detection by (1) passing the keyword argument `find_unused_parameters=True` to `torch.nn.parallel.DistributedDataParallel`; (2) making sure all `forward` function outputs participate in calculating loss. If you already have done the above two steps, then the distributed data parallel module wasn't able to locate the output tensors in the return value of your module's `forward` function. Please include the loss function and the structure of the return value of `forward` of your module when reporting this issue (e.g. list, dict, iterable). (prepare_for_backward at /pytorch/torch/csrc/distributed/c10d/reducer.cpp:518)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x33 (0x7f414b2e0273 in /usr/local/lib/python3.5/dist-packages/torch/lib/libc10.so)
frame #1: c10d::Reducer::prepare_for_backward(std::vector<torch::autograd::Variable, std::allocator<torch::autograd::Variable> > const&) + 0x734 (0x7f414c573ac4 in /usr/local/lib/python3.5/dist-packages/torch/lib/libtorch_python.so)
frame #2: <unknown function> + 0x691b2c (0x7f414c562b2c in /usr/local/lib/python3.5/dist-packages/torch/lib/libtorch_python.so)
frame #3: <unknown function> + 0x1d3f04 (0x7f414c0a4f04 in /usr/local/lib/python3.5/dist-packages/torch/lib/libtorch_python.so)
frame #4: PyCFunction_Call + 0x77 (0x4ea137 in /usr/bin/python3)
frame #5: PyEval_EvalFrameEx + 0x59f6 (0x53c176 in /usr/bin/python3)
frame #6: PyEval_EvalCodeEx + 0x13b (0x540b0b in /usr/bin/python3)
frame #7: /usr/bin/python3() [0x4ec3f7]
frame #8: PyObject_Call + 0x47 (0x5c20e7 in /usr/bin/python3)
frame #9: PyEval_EvalFrameEx + 0x252b (0x538cab in /usr/bin/python3)
frame #10: PyEval_EvalCodeEx + 0x13b (0x540b0b in /usr/bin/python3)
frame #11: /usr/bin/python3() [0x4ec3f7]
frame #12: PyObject_Call + 0x47 (0x5c20e7 in /usr/bin/python3)
frame #13: /usr/bin/python3() [0x4fbfce]
frame #14: PyObject_Call + 0x47 (0x5c20e7 in /usr/bin/python3)
frame #15: /usr/bin/python3() [0x574db6]
frame #16: PyObject_Call + 0x47 (0x5c20e7 in /usr/bin/python3)
frame #17: PyEval_EvalFrameEx + 0x252b (0x538cab in /usr/bin/python3)
frame #18: PyEval_EvalCodeEx + 0x13b (0x540b0b in /usr/bin/python3)
frame #19: /usr/bin/python3() [0x4ec3f7]
frame #20: PyObject_Call + 0x47 (0x5c20e7 in /usr/bin/python3)
frame #21: PyEval_EvalFrameEx + 0x252b (0x538cab in /usr/bin/python3)
frame #22: PyEval_EvalCodeEx + 0x13b (0x540b0b in /usr/bin/python3)
frame #23: /usr/bin/python3() [0x4ec2e3]
frame #24: PyObject_Call + 0x47 (0x5c20e7 in /usr/bin/python3)
frame #25: /usr/bin/python3() [0x4fbfce]
frame #26: PyObject_Call + 0x47 (0x5c20e7 in /usr/bin/python3)
frame #27: /usr/bin/python3() [0x574db6]
frame #28: PyObject_Call + 0x47 (0x5c20e7 in /usr/bin/python3)
frame #29: PyEval_EvalFrameEx + 0x4ed6 (0x53b656 in /usr/bin/python3)
frame #30: /usr/bin/python3() [0x53fc97]
frame #31: PyEval_EvalFrameEx + 0x50bf (0x53b83f in /usr/bin/python3)
frame #32: /usr/bin/python3() [0x5401ef]
frame #33: PyEval_EvalFrameEx + 0x50bf (0x53b83f in /usr/bin/python3)
frame #34: PyEval_EvalFrameEx + 0x4b14 (0x53b294 in /usr/bin/python3)
frame #35: /usr/bin/python3() [0x53fc97]
frame #36: PyEval_EvalFrameEx + 0x50bf (0x53b83f in /usr/bin/python3)
frame #37: PyEval_EvalCodeEx + 0x13b (0x540b0b in /usr/bin/python3)
frame #38: /usr/bin/python3() [0x4ec358]
frame #39: PyObject_Call + 0x47 (0x5c20e7 in /usr/bin/python3)
frame #40: PyEval_EvalFrameEx + 0x252b (0x538cab in /usr/bin/python3)
frame #41: PyEval_EvalCodeEx + 0x13b (0x540b0b in /usr/bin/python3)
frame #42: /usr/bin/python3() [0x4ec3f7]
frame #43: PyObject_Call + 0x47 (0x5c20e7 in /usr/bin/python3)
frame #44: PyEval_EvalFrameEx + 0x252b (0x538cab in /usr/bin/python3)
frame #45: PyEval_EvalFrameEx + 0x4b14 (0x53b294 in /usr/bin/python3)
frame #46: PyEval_EvalFrameEx + 0x4b14 (0x53b294 in /usr/bin/python3)
frame #47: PyEval_EvalFrameEx + 0x4b14 (0x53b294 in /usr/bin/python3)
frame #48: /usr/bin/python3() [0x53fc97]
frame #49: PyEval_EvalFrameEx + 0x50bf (0x53b83f in /usr/bin/python3)
frame #50: /usr/bin/python3() [0x53fc97]
frame #51: PyEval_EvalCode + 0x1f (0x5409bf in /usr/bin/python3)
frame #52: PyRun_StringFlags + 0x8f (0x52084f in /usr/bin/python3)
frame #53: PyRun_SimpleStringFlags + 0x3c (0x60f15c in /usr/bin/python3)
frame #54: Py_Main + 0x581 (0x640381 in /usr/bin/python3)
frame #55: main + 0xe1 (0x4d0001 in /usr/bin/python3)
frame #56: __libc_start_main + 0xf0 (0x7f41513c6830 in /lib/x86_64-linux-gnu/libc.so.6)
frame #57: _start + 0x29 (0x5d6999 in /usr/bin/python3)
/usr/lib/python3.5/multiprocessing/semaphore_tracker.py:129: UserWarning: semaphore_tracker: There appear to be 3 leaked semaphores to clean up at shutdown
len(cache))
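For reference, the RuntimeError above is PyTorch's standard DistributedDataParallel complaint about parameters that never received gradients in the previous iteration. Below is a minimal plain-PyTorch sketch of option (1) named in the error text; the model handling and device id are placeholders rather than fairseq's actual wrapping code:

from torch.nn.parallel import DistributedDataParallel as DDP

def wrap_for_ddp(model, device_id):
    # Assumes torch.distributed.init_process_group(...) was already called
    # by the launcher (torch.multiprocessing.spawn in fairseq's case).
    model = model.to(device_id)
    return DDP(
        model,
        device_ids=[device_id],
        output_device=device_id,
        # Option (1) from the error message: let DDP tolerate parameters
        # that do not participate in producing the loss.
        find_unused_parameters=True,
    )

In fairseq itself the same behaviour is normally reachable from the command line; depending on the installed version, --find-unused-parameters or falling back to --ddp-backend no_c10d may help, but check fairseq-train --help for the options your release actually supports.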
Here is my fine-tuning script:
MAX_UPDATES=3000 # Number of training steps.
WARMUP_UPDATES=150 # Linearly increase LR over this many steps.
LR=1e-05 # Peak LR for polynomial LR scheduler.
MAX_SENTENCES=16 # Batch size.
SEED=1 # Random seed.
ROBERTA_PATH=roberta_pretrain_model/robeta.large/model.pt
DATA_DIR=raw_data/dataset_created_by_ola
# we use the --user-dir option to load the task from
# the examples/roberta/commonsense_qa directory:
FAIRSEQ_PATH=fairseq/
FAIRSEQ_USER_DIR=${FAIRSEQ_PATH}/examples/roberta/commonsense_qa
CUDA_VISIBLE_DEVICES=0,1,2 fairseq-train --fp16 \
$DATA_DIR \
--user-dir $FAIRSEQ_USER_DIR \
--restore-file $ROBERTA_PATH \
--reset-optimizer --reset-dataloader --reset-meters \
--no-epoch-checkpoints --no-last-checkpoints --no-save-optimizer-state \
--best-checkpoint-metric accuracy --maximize-best-checkpoint-metric \
--task commonsense_qa --init-token 0 --bpe gpt2 \
--arch roberta_large --max-positions 512 \
--dropout 0.1 --attention-dropout 0.1 --weight-decay 0.01 \
--criterion sentence_ranking --num-classes 5 \
--optimizer adam --adam-betas '(0.9, 0.98)' --adam-eps 1e-06 --clip-norm 0.0 \
--lr-scheduler polynomial_decay --lr $LR \
--warmup-updates $WARMUP_UPDATES --total-num-update $MAX_UPDATES \
--max-sentences $MAX_SENTENCES \
--max-update $MAX_UPDATES \
--log-format simple --log-interval 25 \
--seed $SEED
Issue Analytics
- Created 4 years ago
- Comments: 14 (3 by maintainers)
Top GitHub Comments
Sorry, I'm new to RoBERTa. How do I remove all the unused model elements?
It happens when your model has unused parameters. Just comment them out!
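For anyone who wants to find the culprits first, one quick check is to run a single forward/backward pass outside of DistributedDataParallel (one GPU, no spawn) and list the parameters whose gradients were never populated. A rough sketch; how the loss is computed is task-specific and left as a placeholder here:

def report_unused_parameters(model, loss):
    # Call after computing `loss` for one batch on a single GPU, outside DDP,
    # so the check itself cannot trigger the reducer error shown above.
    loss.backward()
    unused = [name for name, param in model.named_parameters()
              if param.requires_grad and param.grad is None]
    for name in unused:
        print("unused parameter:", name)
    return unused

The names printed here are the "unused model elements" the comments above refer to: parameters to remove, freeze, or cover with find_unused_parameters=True.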