
"RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn" when using my loss-function

See original GitHub issue
  • PyTorch-Forecasting version: 0.9.0
  • PyTorch version: 1.9.0
  • Python version: 3.6
  • Operating System: Windows 10

Expected behavior

Hello! Thanks for your brilliant work! When using the TemporalFusionTransformer, I referred to the class “QuantileLoss(MultiHorizonMetric)” and modified the loss function, expecting the model’s predictions to become more accurate.

Actual behavior

tft = TemporalFusionTransformer.from_dataset(
    training,
    learning_rate=0.03,
    hidden_size=16,
    attention_head_size=1,
    dropout=0.1,
    hidden_continuous_size=8,
    output_size=7,  # 7 quantiles by default
    loss=MyLoss(),
    log_interval=10,  # uncomment for learning rate finder and otherwise, e.g. to 10 for logging every 10 batches
    reduce_on_plateau_patience=4,
)

However, the error is: RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
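One thing worth ruling out first (a hypothetical check I am adding here, not part of the original report): with quantile-style losses in PyTorch-Forecasting, the output_size passed to from_dataset should equal the number of quantiles the loss produces, which is 7 for the MyLoss defaults shown below.

# Hypothetical sanity check: the model's output_size should match the loss's quantiles.
my_loss = MyLoss()  # MyLoss as defined in the next section
assert len(my_loss.quantiles) == 7, "output_size in from_dataset should equal len(quantiles)"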

Code to reproduce the problem

The definition of MyLoss() is:

from typing import List

import torch
from pytorch_forecasting.metrics import MultiHorizonMetric


class MyLoss(MultiHorizonMetric):
    def __init__(
        self,
        quantiles: List[float] = [0.02, 0.1, 0.25, 0.5, 0.75, 0.9, 0.98],
        **kwargs,
    ):
        super().__init__(quantiles=quantiles, **kwargs)

    def loss(self, y_pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
        # second-order difference of the target along the time axis,
        # used below as a smoothness penalty (zero at the sequence edges)
        diff_2 = torch.zeros_like(target)
        for n in range(target.size(0)):
            for m in range(target.size(1)):
                if m == 0 or m == (target.size(1) - 1):
                    diff_2[n][m] = 0
                else:
                    diff_2[n][m] = torch.abs(target[n][m - 1] - 2 * target[n][m] + target[n][m + 1])

        # per-quantile loss: weighted combination of MAE, "RMSE" and the smoothness penalty
        losses = []
        for i, q in enumerate(self.quantiles):
            mae = torch.abs(y_pred[..., i] - target) / target.size(1)
            # note: element-wise sqrt(pow(., 2)) is just the absolute error
            rmse = torch.sqrt(torch.pow(y_pred[..., i] - target, 2)) / target.size(1)

            loss = q * rmse + (1 - q) * mae + 0.2 * torch.pow(diff_2, 2)
            # loss = q * rmse + (1 - q) * mae

            losses.append(loss.unsqueeze(-1))

        losses = torch.cat(losses, dim=2)
        return losses

    def to_prediction(self, y_pred: torch.Tensor) -> torch.Tensor:
        """
        Convert network prediction into a point prediction.

        Args:
            y_pred: prediction output of network

        Returns:
            torch.Tensor: point prediction
        """
        if y_pred.ndim == 3:
            idx = self.quantiles.index(0.5)
            y_pred = y_pred[..., idx]
        return y_pred

    def to_quantiles(self, y_pred: torch.Tensor) -> torch.Tensor:
        """
        Convert network prediction into a quantile prediction.

        Args:
            y_pred: prediction output of network

        Returns:
            torch.Tensor: prediction quantiles
        """
        return y_pred
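As a first debugging step, here is a minimal sketch (my addition, not part of the original report) that calls the loss directly on dummy tensors and checks that the result still carries a grad_fn. If it does, the custom loss itself is not where the autograd graph gets cut, and the break happens somewhere upstream, for example under torch.no_grad(), after a .detach(), or, as the comments below suggest, in the multi-GPU path.

import torch

loss_fn = MyLoss()
y_pred = torch.randn(4, 10, len(loss_fn.quantiles), requires_grad=True)  # (batch, time, quantiles)
target = torch.randn(4, 10)                                              # (batch, time)

out = loss_fn.loss(y_pred, target)
print(out.requires_grad, out.grad_fn)  # expected: True and a non-None grad_fn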

And the error is:

File "F:/TimothyLiu/ICONIP 2021/TFI/trian_TFT.py", line 197, in <module>
   val_dataloaders=val_dataloader,
 File "D:\DeepLearning\Anaconda3\envs\pytorch-transformer\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 458, in fit
   self._run(model)
 File "D:\DeepLearning\Anaconda3\envs\pytorch-transformer\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 756, in _run
   self.dispatch()
 File "D:\DeepLearning\Anaconda3\envs\pytorch-transformer\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 797, in dispatch
   self.accelerator.start_training(self)
 File "D:\DeepLearning\Anaconda3\envs\pytorch-transformer\lib\site-packages\pytorch_lightning\accelerators\accelerator.py", line 96, in start_training
   self.training_type_plugin.start_training(trainer)
 File "D:\DeepLearning\Anaconda3\envs\pytorch-transformer\lib\site-packages\pytorch_lightning\plugins\training_type\training_type_plugin.py", line 144, in start_training
   self._results = trainer.run_stage()
 File "D:\DeepLearning\Anaconda3\envs\pytorch-transformer\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 807, in run_stage
   return self.run_train()
 File "D:\DeepLearning\Anaconda3\envs\pytorch-transformer\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 869, in run_train
   self.train_loop.run_training_epoch()
 File "D:\DeepLearning\Anaconda3\envs\pytorch-transformer\lib\site-packages\pytorch_lightning\trainer\training_loop.py", line 499, in run_training_epoch
   batch_output = self.run_training_batch(batch, batch_idx, dataloader_idx)
 File "D:\DeepLearning\Anaconda3\envs\pytorch-transformer\lib\site-packages\pytorch_lightning\trainer\training_loop.py", line 738, in run_training_batch
   self.optimizer_step(optimizer, opt_idx, batch_idx, train_step_and_backward_closure)
 File "D:\DeepLearning\Anaconda3\envs\pytorch-transformer\lib\site-packages\pytorch_lightning\trainer\training_loop.py", line 442, in optimizer_step
   using_lbfgs=is_lbfgs,
 File "D:\DeepLearning\Anaconda3\envs\pytorch-transformer\lib\site-packages\pytorch_lightning\core\lightning.py", line 1403, in optimizer_step
   optimizer.step(closure=optimizer_closure)
 File "D:\DeepLearning\Anaconda3\envs\pytorch-transformer\lib\site-packages\pytorch_lightning\core\optimizer.py", line 214, in step
   self.__optimizer_step(*args, closure=closure, profiler_name=profiler_name, **kwargs)
 File "D:\DeepLearning\Anaconda3\envs\pytorch-transformer\lib\site-packages\pytorch_lightning\core\optimizer.py", line 134, in __optimizer_step
   trainer.accelerator.optimizer_step(optimizer, self._optimizer_idx, lambda_closure=closure, **kwargs)
 File "D:\DeepLearning\Anaconda3\envs\pytorch-transformer\lib\site-packages\pytorch_lightning\accelerators\accelerator.py", line 329, in optimizer_step
   self.run_optimizer_step(optimizer, opt_idx, lambda_closure, **kwargs)
 File "D:\DeepLearning\Anaconda3\envs\pytorch-transformer\lib\site-packages\pytorch_lightning\accelerators\accelerator.py", line 336, in run_optimizer_step
   self.training_type_plugin.optimizer_step(optimizer, lambda_closure=lambda_closure, **kwargs)
 File "D:\DeepLearning\Anaconda3\envs\pytorch-transformer\lib\site-packages\pytorch_lightning\plugins\training_type\training_type_plugin.py", line 193, in optimizer_step
   optimizer.step(closure=lambda_closure, **kwargs)
 File "D:\DeepLearning\Anaconda3\envs\pytorch-transformer\lib\site-packages\torch\optim\optimizer.py", line 88, in wrapper
   return func(*args, **kwargs)
 File "D:\DeepLearning\Anaconda3\envs\pytorch-transformer\lib\site-packages\pytorch_forecasting\optim.py", line 131, in step
   _ = closure()
 File "D:\DeepLearning\Anaconda3\envs\pytorch-transformer\lib\site-packages\pytorch_lightning\trainer\training_loop.py", line 733, in train_step_and_backward_closure
   split_batch, batch_idx, opt_idx, optimizer, self.trainer.hiddens
 File "D:\DeepLearning\Anaconda3\envs\pytorch-transformer\lib\site-packages\pytorch_lightning\trainer\training_loop.py", line 836, in training_step_and_backward
   self.backward(result, optimizer, opt_idx)
 File "D:\DeepLearning\Anaconda3\envs\pytorch-transformer\lib\site-packages\pytorch_lightning\trainer\training_loop.py", line 870, in backward
   result.closure_loss, optimizer, opt_idx, should_accumulate, *args, **kwargs
 File "D:\DeepLearning\Anaconda3\envs\pytorch-transformer\lib\site-packages\pytorch_lightning\accelerators\accelerator.py", line 309, in backward
   self.lightning_module, closure_loss, optimizer, optimizer_idx, should_accumulate, *args, **kwargs
 File "D:\DeepLearning\Anaconda3\envs\pytorch-transformer\lib\site-packages\pytorch_lightning\plugins\precision\precision_plugin.py", line 79, in backward
   model.backward(closure_loss, optimizer, opt_idx)
 File "D:\DeepLearning\Anaconda3\envs\pytorch-transformer\lib\site-packages\pytorch_lightning\core\lightning.py", line 1275, in backward
   loss.backward(*args, **kwargs)
 File "D:\DeepLearning\Anaconda3\envs\pytorch-transformer\lib\site-packages\torch\_tensor.py", line 255, in backward
   torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
 File "D:\DeepLearning\Anaconda3\envs\pytorch-transformer\lib\site-packages\torch\autograd\__init__.py", line 149, in backward
   allow_unreachable=True, accumulate_grad=True)  # allow_unreachable flag
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

Issue Analytics

  • State: open
  • Created: 2 years ago
  • Comments: 8 (1 by maintainers)

Top GitHub Comments

2 reactions
russellbrooks commented, Oct 8, 2021

Are you using multiple GPUs? For what it’s worth – I’m also hitting this error, but only when using multiple GPUs and multiple targets.

I’ve ensured there are no nulls in my dataset, the values are normalized, and I’m using a low learning rate with clipped gradients to reduce instability. Here’s what I’m noticing:

Single target    + CPU           --> works
Multiple targets + CPU           --> works
Single target    + 1 GPU         --> works
Multiple targets + 1 GPU         --> works
Single target    + multiple GPUs --> works
Multiple targets + multiple GPUs --> broken
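If your setup matches the broken row above, one pragmatic workaround (my own suggestion, not confirmed by the maintainers) is to fall back to a single device while the multi-GPU/multi-target path is investigated. A minimal sketch using the PyTorch Lightning 1.x API seen in the traceback; the dataloader names and trainer settings are assumptions:

import pytorch_lightning as pl

# Hypothetical workaround: train on a single GPU instead of a multi-GPU setup.
trainer = pl.Trainer(
    gpus=1,                  # one device; use gpus=0 to stay on the CPU
    max_epochs=30,           # assumed value, adjust to your experiment
    gradient_clip_val=0.1,   # gradient clipping, as mentioned above
)
trainer.fit(tft, train_dataloader, val_dataloader)
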
0 reactions
jdb78 commented, Mar 24, 2022
Read more comments on GitHub >

Top Results From Across the Web

  • RuntimeError: element 0 of variables does not require grad ...
    hi, i have a problem here, i got a sequence of Variables which are the outputs of the bi-directional RNN, and i stacked...
  • Pytorch RuntimeError: element 0 of tensors does not require ...
    If you call .detach() on the prediction, that will delete the gradients. Since you are first getting indices from the model and then...
  • element 0 of tensors does not require grad and does ... - GitHub
    It may be caused by "with torch.no_grad()" which stops the tracking of gradients so that there is no gradient information passed to grad...
  • Pytorch Autograd torch.autograd inDepth-Beginners - Kaggle
    Receiving dL/dz, the gradient of the loss function with respect to z from above, ... RuntimeError: element 0 of tensors does not require...
  • Lesson 13 RuntimeError does not have a grad_fn
    When I run the Lesson 13 notebook on pytorch 0.4 with windows 10. Got error: RuntimeError: element 0 of variables does not require...
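To make the causes mentioned in these results concrete, here is a tiny self-contained repro (my own illustration, independent of PyTorch-Forecasting): computing a loss under torch.no_grad(), or on detached tensors, cuts the autograd graph, and backward() then raises exactly this error.

import torch

w = torch.randn(3, requires_grad=True)
x = torch.randn(3)

with torch.no_grad():  # gradient tracking is disabled inside this block
    loss = ((w * x).sum() - 1.0) ** 2

loss.backward()  # RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn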
