Transformer CUDA kernel backward test fails with PyTorch 1.8+cu11
With PyTorch 1.8 and CUDA 11.0, I received the following error message when running test_cuda_backward.py:
    def __init__(self, config, initial_weights=None, initial_biases=None):
        super(DeepSpeedTransformerLayer, self).__init__()
        self.config = config
        self.config.layer_id = DeepSpeedTransformerLayer.layer_id
        DeepSpeedTransformerLayer.layer_id = DeepSpeedTransformerLayer.layer_id + 1
        print("DeepSpeed Transformer config is ", self.config.__dict__)
        if self.config.local_rank >= 0:
            torch.cuda.set_device(self.config.local_rank)
        if initial_weights is None and initial_biases is None:
            self.attn_qkvw = nn.Parameter(
                torch.Tensor(self.config.hidden_size * 3,
                             self.config.hidden_size))
            self.attn_qkvb = nn.Parameter(torch.Tensor(self.config.hidden_size * 3))
            self.attn_ow = nn.Parameter(
                torch.Tensor(self.config.hidden_size,
                             self.config.hidden_size))
            self.attn_ob = nn.Parameter(torch.Tensor(self.config.hidden_size))
            self.attn_nw = nn.Parameter(torch.Tensor(self.config.hidden_size))
            self.attn_nb = nn.Parameter(torch.Tensor(self.config.hidden_size))
            self.inter_w = nn.Parameter(
                torch.Tensor(self.config.intermediate_size,
                             self.config.hidden_size))
            self.inter_b = nn.Parameter(torch.Tensor(self.config.intermediate_size))
            self.output_w = nn.Parameter(
                torch.Tensor(self.config.hidden_size,
                             self.config.intermediate_size))
            self.output_b = nn.Parameter(torch.Tensor(self.config.hidden_size))
            self.norm_w = nn.Parameter(torch.Tensor(self.config.hidden_size))
            self.norm_b = nn.Parameter(torch.Tensor(self.config.hidden_size))
            self.init_transformer_weights(self.config.adjust_init_range)
        else:
            # For testing only.
            self.attn_qkvw = nn.Parameter(
                torch.Tensor(self.config.hidden_size * 3,
                             self.config.hidden_size))
            for i in range(3):
                self.attn_qkvw[i * self.config.hidden_size:(i + 1) * self.config.hidden_size] = \
>                   torch.empty_like(initial_weights[i]).copy_(initial_weights[i])
E                   RuntimeError: a view of a leaf Variable that requires grad is being used in an in-place operation.

/usr/local/lib64/python3.7/site-packages/deepspeed/ops/transformer/transformer.py:526: RuntimeError
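For readers hitting the same error, the sketch below is a minimal, self-contained reproduction of the failure pattern (it is not DeepSpeed's actual code or its eventual fix; the sizes and names are illustrative). Slice-assigning into an nn.Parameter writes in-place through a view of a leaf tensor that requires grad, which autograd rejects; doing the initialization under torch.no_grad() is one common way to avoid the error.

    # Minimal sketch of the failing pattern and one possible workaround.
    import torch
    import torch.nn as nn

    hidden_size = 4  # illustrative size, not DeepSpeed's config
    attn_qkvw = nn.Parameter(torch.Tensor(hidden_size * 3, hidden_size))
    initial_weight = torch.randn(hidden_size, hidden_size)

    try:
        # Same pattern as the failing line in transformer.py: slice assignment into
        # a leaf Parameter is an in-place write through a view, which autograd rejects.
        attn_qkvw[0:hidden_size] = torch.empty_like(initial_weight).copy_(initial_weight)
    except RuntimeError as err:
        print(err)  # "... a view of a leaf Variable that requires grad ..."

    # One common workaround (an assumption here, not necessarily the upstream fix):
    # perform the initialization with autograd disabled, as nn.init does internally.
    with torch.no_grad():
        attn_qkvw[0:hidden_size].copy_(initial_weight)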
Issue Analytics
- Created: 2 years ago
- Comments: 10 (10 by maintainers)
Top GitHub Comments
Hi @szhengac
Sorry for the delay in responding. I am wrapping up some other work, and I will get back to this soon. Yes, I am pretty sure it is solvable.
Thanks, Reza
@RezaYazdaniAminabadi Is there any update on the fix for the Transformer CUDA kernel?