
BUG: Neptune log error for multiple dataloaders

See original GitHub issue

Describe the bug

An error is thrown while logging a metric value, using the PyTorch Lightning integration with Neptune. The error occurs only with the latest Neptune client, i.e. with from neptune.new.integrations.pytorch_lightning import NeptuneLogger.

Reproduction

https://colab.research.google.com/drive/13rRlztjGRQrv6Y3W-d21Dotoj8L2UtoZ?usp=sharing

Expected behavior

The experiment should keep running without any error.

Traceback

The following trace results from invoking self.logger.log_metrics:

    def __getattr__(self, attr):
>       raise AttributeError("{} has no attribute {}.".format(type(self), attr))
E       AttributeError: <class 'neptune.new.attributes.namespace.Namespace'> has no attribute log.
env/lib/python3.8/site-packages/neptune/new/attributes/attribute.py:35: AttributeError

[Screenshot of the neptune.new.handler.Handler.log source, showing the if/else branch discussed below]

If the value of attr is None, it passes the if condition and no error occurs; the issue arises in the else branch. The failing call is neptune.new.handler.Handler.log with self._path = "val_loss".
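
For illustration, here is a minimal sketch of that conflict, assuming the neptune.new client from this environment (the project name is a placeholder):

    import neptune.new as neptune

    run = neptune.init(project="my-workspace/my-project")  # placeholder project

    # Logging a nested path first turns "val_loss" into a Namespace in the run...
    run["val_loss/dataloader_idx_0"].log(0.2)

    # ...so a later log to the bare "val_loss" path reaches Handler.log with an
    # existing Namespace attribute, which has no .log method -- reproducing the
    # AttributeError above.
    run["val_loss"].log(0.3)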

Environment

The output of pip list:

neptune-contrib           0.27.2                   pypi_0    pypi
neptune-pytorch-lightning 0.9.7                    pypi_0    pypi

The operating system you’re using: Ubuntu
The output of python --version: Python 3.8.10

Additional context: logging works for all other metrics; the error is thrown only for this particular ‘val_loss’ key. It happens only after migrating to the new Neptune client (it works fine with the previous version), and only when there is more than one validation dataloader.

EDIT: If we have multiple dataloaders, every metric that gets logged has the dataloader index appended to its name. For example, suppose my log call is self.log('loss', 0.2). It gets logged once per dataloader, with the index in the name and the corresponding value: loss/dataloader_idx_0 = 0.2, loss/dataloader_idx_1 = 0.4, and so on for every dataloader. Since my metric to monitor is ‘loss’, PTL also expects the exact string ‘loss’ to be logged; otherwise it throws the error below:

  
    if not trainer.fit_loop.epoch_loop.val_loop._has_run:
        warning_cache.warn(m)
    else:
        raise MisconfigurationException(m)
E   pytorch_lightning.utilities.exceptions.MisconfigurationException: ModelCheckpoint(monitor='loss') not found in the returned metrics: ['train_loss', 'train_loss_step', 'loss/dataloader_idx_0', 'loss/dataloader_idx_1', 'validation_f1', 'validation_precision', 'validation_recall', 'validation_accuracy']. HINT: Did you call self.log('loss', value) in the LightningModule?

But according to Neptune, ‘loss’ is invalid once ‘loss/dataloader_idx_1’ has already been logged (I guess)? If so, the two libraries contradict each other.
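
To make the naming behavior concrete, here is a minimal sketch (the model and loss values are placeholders) of how PTL renames metrics when there are multiple validation dataloaders:

    import torch
    import pytorch_lightning as pl

    class Model(pl.LightningModule):
        def validation_step(self, batch, batch_idx, dataloader_idx=0):
            loss = torch.tensor(0.2 + 0.2 * dataloader_idx)  # stand-in value
            # With two validation dataloaders this is recorded as
            # "loss/dataloader_idx_0" and "loss/dataloader_idx_1" -- the bare
            # key "loss" that ModelCheckpoint(monitor='loss') expects is never
            # logged, while Neptune now treats "loss" as a namespace.
            self.log("loss", loss)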

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 15 (9 by maintainers)

Top GitHub Comments

1 reaction
stonelazy commented, Oct 15, 2021

Appreciate your reply.

I will pass this info to the product team; for the time being, I recommend adjusting the loss names a bit

Sure, thanks.

0 reactions
kamil-kaczmarek commented, Oct 14, 2021

Hey @stonelazy,

I checked the colab that you initially pasted as reproduction info: https://colab.research.google.com/drive/13rRlztjGRQrv6Y3W-d21Dotoj8L2UtoZ?usp=sharing

Here is a run that I made: https://app.neptune.ai/o/common/org/pytorch-lightning-integration/e/PTL-29/all

Here is what I did:

  1. I changed “loss” to “val_loss” in your code (line 49)
  2. I did the same to the EarlyStopping callback argument monitor="val_loss" (line 105)

The error was fixed by making sure that val_loss is logged to a separate namespace.

Yes, PTL creates loss/dataloader_idx_N paths when working with multiple dataloaders. In Neptune you can create a hierarchical structure within the run, but you cannot log values to both loss and loss/dataloader_idx_N at the same time.

I will pass this info to the product team; for the time being, I recommend adjusting the loss names a bit.

Please let me know what you think.
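
As an illustration of this workaround, here is a minimal sketch (the metric names and the _shared_step helper are hypothetical) that keeps the per-dataloader series and the monitored key in separate, non-conflicting paths:

    import torch
    import pytorch_lightning as pl

    class Model(pl.LightningModule):
        def validation_step(self, batch, batch_idx, dataloader_idx=0):
            loss = self._shared_step(batch)  # hypothetical helper
            # Becomes "val_loss_per_loader/dataloader_idx_N" -- a namespace
            # that nothing else tries to log to directly.
            self.log("val_loss_per_loader", loss)
            return loss

        def validation_epoch_end(self, outputs):
            # With multiple dataloaders, outputs is a list of per-dataloader
            # lists; aggregate them and log the monitored key at its own path.
            flat = [o for loader_outputs in outputs for o in loader_outputs]
            self.log("val_loss", torch.stack(flat).mean())

With this split, ModelCheckpoint(monitor="val_loss") finds the bare key, and Neptune never sees a value logged to a path that already exists as a namespace.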
