PyTorch 1.5 DataParallel
See original GitHub issue
🐛 Bug
Information
Can't run forward in PyTorch 1.5.0; works fine in 1.4.0.
Model I am using (Bert, XLNet …): XLNet
Language I am using the model on (English, Chinese …): English
The problem arises when using:
- the official example scripts: (give details below)
- my own modified scripts: (give details below)
Transformer + custom head + custom losses + differential learning rates; I don't think it matters.
The task I am working on is:
- an official GLUE/SQuAD task: (give the name)
- my own task or dataset: (give details below)
Custom news classification
To reproduce
Steps to reproduce the behavior:
- Install PyTorch 1.5.0
- Run forward on xlnet
File "transformers/modeling_xlnet.py", line 761, in forward
dtype_float = next(self.parameters()).dtype
StopIteration
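To illustrate what appears to be going on, here is a minimal sketch (not from the original report; the Probe module and tensor shapes are made up, and it assumes at least two visible GPUs): in PyTorch 1.5 the per-GPU replicas built by nn.DataParallel no longer expose their weights through .parameters(), so any forward() that calls next(self.parameters()) hits an empty iterator.

```python
import torch
import torch.nn as nn

class Probe(nn.Module):
    """Tiny stand-in for a model that reads its own dtype inside forward()."""
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 4)

    def forward(self, x):
        # Same pattern as modeling_xlnet.py line 761; on the replicas that
        # PyTorch 1.5's DataParallel creates, this iterator is empty and
        # next() raises StopIteration.
        dtype_float = next(self.parameters()).dtype
        return self.linear(x.to(dtype_float))

model = nn.DataParallel(Probe().cuda())  # needs >= 2 visible GPUs to replicate
out = model(torch.randn(8, 4).cuda())    # StopIteration on 1.5.0, fine on 1.4.0
```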
Expected behavior
Forward runs without error.
Environment info
- transformers version: 2.8.0
- Platform: Ubuntu 18.04
- Python version: Anaconda 3.7
- PyTorch version (GPU?): 1.5, Yes
- Tensorflow version (GPU?): N/A
- Using GPU in script?: Yes
- Using distributed or parallel set-up in script?: Yes
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Same problem here, running BERT. I'm running on GPUs, using
export CUDA_VISIBLE_DEVICES=5,6,7
before running (I have 8 1080TIs on this server).
run_language_modeling.py --output_dir=models --model_type=bert --model_name_or_path=bert-base-uncased --do_train --train_data_file=Vol45.sample --mlm --save_steps=2000 --line_by_line --per_gpu_train_batch_size=8
Vol45.sample is a .txt with one doc per line.
EDIT: It seems to work if I downgrade PyTorch to 1.4.
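For anyone who cannot downgrade, a possible stopgap (just a sketch, not an official fix; first_param_dtype is a made-up helper name) is to guard the dtype lookup so a replica with an empty parameter iterator falls back to a default dtype:

```python
import torch
import torch.nn as nn

def first_param_dtype(module: nn.Module, default=torch.float32):
    """Dtype of the module's first parameter, or `default` when the iterator
    is empty (as it is for PyTorch 1.5 DataParallel replicas)."""
    try:
        return next(module.parameters()).dtype
    except StopIteration:
        return default
```

The failing line in modeling_xlnet.py would then read dtype_float = first_param_dtype(self) instead of calling next() directly.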
Just to scope this bug a little bit better: all of you are using torch.nn.DataParallel (not DistributedDataParallel or single-GPU), correct?
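That scoping matches what the traceback suggests: only the replica modules that torch.nn.DataParallel builds on each forward pass come back with an empty parameter iterator on 1.5, so single-GPU and DistributedDataParallel runs keep working. A rough illustration (assumes at least two visible GPUs; replicate is the step DataParallel performs internally):

```python
import torch
import torch.nn as nn
from torch.nn.parallel import replicate

model = nn.Linear(4, 4).cuda()
print(len(list(model.parameters())))        # 2 -> the plain module still sees its weights

replicas = replicate(model, [0, 1])         # what DataParallel does before parallel_apply
print(len(list(replicas[0].parameters())))  # 0 on PyTorch 1.5 -> next(...) raises StopIteration
```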