TypeError exception in AxialPositionalEncoding when using DataParallel
Hello,
I want to run SinkhornTransformerLM on multiple GPUs, so I’m wrapping the model in torch.nn.DataParallel. However, when I do this, I get the following exception:
Traceback (most recent call last):
File "script.py", line 27, in <module>
model(x)
File "/home/ubuntu/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
result = self.forward(*input, **kwargs)
File "/home/ubuntu/.local/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 155, in forward
outputs = self.parallel_apply(replicas, inputs, kwargs)
File "/home/ubuntu/.local/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 165, in parallel_apply
return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
File "/home/ubuntu/.local/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 85, in parallel_apply
output.reraise()
File "/home/ubuntu/.local/lib/python3.6/site-packages/torch/_utils.py", line 395, in reraise
raise self.exc_type(msg)
TypeError: Caught TypeError in replica 0 on device 0.
Original Traceback (most recent call last):
File "/home/ubuntu/.local/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 60, in _worker
output = module(*input, **kwargs)
File "/home/ubuntu/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
result = self.forward(*input, **kwargs)
File "/home/ubuntu/.local/lib/python3.6/site-packages/sinkhorn_transformer/sinkhorn_transformer.py", line 792, in forward
x = self.axial_pos_emb(x) + x
File "/home/ubuntu/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
result = self.forward(*input, **kwargs)
File "/home/ubuntu/.local/lib/python3.6/site-packages/sinkhorn_transformer/sinkhorn_transformer.py", line 243, in forward
return pos_emb[:, :t]
TypeError: 'int' object is not subscriptable
Looking at the code, it would seem that self.weights does not get populated. To reproduce this error, I took the first example in README.md and changed
model(x) # (1, 2048, 20000)
to
model = torch.nn.DataParallel(model, device_ids=list(range(torch.cuda.device_count()))).to('cuda')
model(x)
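Put together, a complete reproduction script would look roughly like the sketch below. The model hyperparameters are illustrative placeholders rather than necessarily the exact values from the README’s first example; the essential change is wrapping the model in torch.nn.DataParallel and moving both the model and the input onto CUDA.

import torch
from sinkhorn_transformer import SinkhornTransformerLM

# Hyperparameters below are illustrative placeholders, not necessarily the
# exact values from the README's first example.
model = SinkhornTransformerLM(
    num_tokens = 20000,
    dim = 1024,
    heads = 8,
    depth = 12,
    max_seq_len = 8192,
    bucket_size = 128
)

x = torch.randint(0, 20000, (1, 2048))

# Wrap the model in DataParallel across all visible GPUs and move everything to CUDA.
model = torch.nn.DataParallel(model, device_ids=list(range(torch.cuda.device_count()))).to('cuda')
x = x.to('cuda')

model(x)  # raises the TypeError shown in the traceback above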
Issue Analytics
- Created: 3 years ago
- Comments: 8 (6 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
@lucidrains, Looks like your fix got it to work! Thanks a bunch!
Cool! I’ll see if I can try it out. Thanks for the tip!
@kl0211 do share your results! this repository is still in the exploratory phase!
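For background (the fix itself is not shown in this thread): errors like this usually point at parameters that DataParallel’s replicate() does not carry into the per-device replicas; nn.ParameterList, for instance, has known limitations under DataParallel. One pattern that avoids the problem is to register each per-axis embedding as a named nn.Parameter attribute directly on the module, since registered parameters are broadcast to every replica. The sketch below is a toy illustration of that pattern only, not the library’s actual implementation; the class name, shapes, and hyperparameters are invented for the example.

import torch
from torch import nn

class AxialPosEmbSketch(nn.Module):
    # Toy axial-style positional embedding (illustration only, not the
    # library's code). Each per-axis embedding is registered as a named
    # nn.Parameter attribute (weights_0, weights_1, ...) so DataParallel's
    # replicate() broadcasts it to every replica, instead of living in a
    # container the replicas might see as empty.
    def __init__(self, dim, axial_shape = (64, 32)):
        super().__init__()
        self.dim = dim
        self.axial_shape = axial_shape
        for i, ax_len in enumerate(axial_shape):
            shape = [1] * (len(axial_shape) + 2)   # e.g. [1, 1, 1, 1]
            shape[i + 1] = ax_len                  # put the axis length in its slot
            shape[-1] = dim                        # last dim is the embedding dim
            setattr(self, f'weights_{i}', nn.Parameter(torch.randn(*shape) * 0.02))

    def forward(self, x):
        b, t, e = x.shape
        pos_emb = 0
        for i in range(len(self.axial_shape)):
            ax_emb = getattr(self, f'weights_{i}')
            pos_emb = pos_emb + ax_emb.expand(b, *self.axial_shape, e)
        max_seq_len = 1
        for ax_len in self.axial_shape:
            max_seq_len *= ax_len
        return pos_emb.reshape(b, max_seq_len, e)[:, :t]

# Usage, mirroring `x = self.axial_pos_emb(x) + x` from the traceback:
emb = AxialPosEmbSketch(dim = 512, axial_shape = (64, 32))   # 64 * 32 = 2048 positions
x = torch.randn(2, 2048, 512)
out = emb(x) + x                                             # (2, 2048, 512)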