
Issue when using PyTorch Lightning integration while logging epochs

See original GitHub issue

🐛 Bug

aim fails to track the epoch metric produced by PyTorch Lightning:

  File "/home/dbpprt/miniforge3/envs/lama/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 740, in fit
    self._call_and_handle_interrupt(
  File "/home/dbpprt/miniforge3/envs/lama/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 685, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/home/dbpprt/miniforge3/envs/lama/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 777, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File "/home/dbpprt/miniforge3/envs/lama/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1199, in _run
    self._dispatch()
  File "/home/dbpprt/miniforge3/envs/lama/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1279, in _dispatch
    self.training_type_plugin.start_training(self)
  File "/home/dbpprt/miniforge3/envs/lama/lib/python3.9/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 202, in start_training
    self._results = trainer.run_stage()
  File "/home/dbpprt/miniforge3/envs/lama/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1289, in run_stage
    return self._run_train()
  File "/home/dbpprt/miniforge3/envs/lama/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1319, in _run_train
    self.fit_loop.run()
  File "/home/dbpprt/miniforge3/envs/lama/lib/python3.9/site-packages/pytorch_lightning/loops/base.py", line 145, in run
    self.advance(*args, **kwargs)
  File "/home/dbpprt/miniforge3/envs/lama/lib/python3.9/site-packages/pytorch_lightning/loops/fit_loop.py", line 242, in advance
    self.trainer.logger_connector.update_train_epoch_metrics()
  File "/home/dbpprt/miniforge3/envs/lama/lib/python3.9/site-packages/pytorch_lightning/trainer/connectors/logger_connector/logger_connector.py", line 232, in update_train_epoch_metrics
    self.log_metrics(self.metrics["log"])
  File "/home/dbpprt/miniforge3/envs/lama/lib/python3.9/site-packages/pytorch_lightning/trainer/connectors/logger_connector/logger_connector.py", line 121, in log_metrics
    self.trainer.logger.save()
  File "/home/dbpprt/miniforge3/envs/lama/lib/python3.9/site-packages/pytorch_lightning/loggers/base.py", line 427, in save
    logger.save()
  File "/home/dbpprt/miniforge3/envs/lama/lib/python3.9/site-packages/pytorch_lightning/loggers/base.py", line 317, in save
    self._finalize_agg_metrics()
  File "/home/dbpprt/miniforge3/envs/lama/lib/python3.9/site-packages/pytorch_lightning/loggers/base.py", line 152, in _finalize_agg_metrics
    self.log_metrics(metrics=metrics_to_log, step=agg_step)
  File "/home/dbpprt/miniforge3/envs/lama/lib/python3.9/site-packages/pytorch_lightning/utilities/distributed.py", line 50, in wrapped_fn
    return fn(*args, **kwargs)
  File "/home/dbpprt/dev/lama/src/utils/aim_logger.py", line 76, in log_metrics
    self.experiment.track(v, name=name, context=context)
  File "/home/dbpprt/miniforge3/envs/lama/lib/python3.9/site-packages/aim/sdk/run.py", line 412, in track
    self._track_impl(value, track_time, name, step, epoch, context=context)
  File "/home/dbpprt/miniforge3/envs/lama/lib/python3.9/site-packages/aim/sdk/run.py", line 443, in _track_impl
    self._update_sequence_info(seq_info, ctx, val, name, step)
  File "/home/dbpprt/miniforge3/envs/lama/lib/python3.9/site-packages/aim/sdk/run.py", line 686, in _update_sequence_info
    raise ValueError(f'Cannot log value \'{val}\' on sequence \'{name}\'. Incompatible data types.')
ValueError: Cannot log value '4' on sequence 'epoch'. Incompatible data types.
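
The failure comes from Aim's per-sequence type check (Run._update_sequence_info in the traceback above): the first value tracked on a sequence pins its data type, and any later value of a different type is rejected. PyTorch Lightning's metric aggregation apparently hands the epoch over as a float at first and as a plain int later. A minimal sketch that should trigger the same check (assuming Aim 3.x behavior; the repository path is made up and may need an initialized repo via "aim init" first):

    from aim import Run

    run = Run(repo="/tmp/aim-repro")  # throwaway repo path, purely illustrative
    run.track(1.0, name="epoch")      # first value pins the sequence dtype: float
    run.track(2, name="epoch")        # an int on a float sequence should raise
    # ValueError: Cannot log value '2' on sequence 'epoch'. Incompatible data types.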

The PL Trainer config is set to:

check_val_every_n_epoch: 5
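
For context, a minimal Trainer setup matching that config could look like the sketch below. The AimLogger import path assumes the PL adapter bundled with Aim 3.x (the reporter actually uses a custom logger, src/utils/aim_logger.py), and the experiment name is hypothetical:

    import pytorch_lightning as pl
    from aim.pytorch_lightning import AimLogger  # adapter bundled with aim 3.x

    trainer = pl.Trainer(
        logger=AimLogger(experiment="lama"),  # hypothetical experiment name
        check_val_every_n_epoch=5,            # validate only every 5th epoch
        max_epochs=10,
    )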

Expected behavior

No crash.
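
Until the type handling is fixed upstream, one possible workaround is to normalize scalar metric types inside the custom logger before calling Run.track(), so each Aim sequence only ever sees one type. A hypothetical helper (sanitize_metrics is not part of either library):

    # Hypothetical helper: coerce scalar metrics to float so an Aim sequence
    # never receives a mix of int and float values.
    def sanitize_metrics(metrics: dict) -> dict:
        return {
            name: float(value) if isinstance(value, (int, float)) else value
            for name, value in metrics.items()
        }

In a logger like src/utils/aim_logger.py, log_metrics would pass its metrics dict through this helper before handing individual values to self.experiment.track().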

Environment

  • Aim version: 3.5.3
  • Python version: 3.9
  • OS: Ubuntu

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 8 (4 by maintainers)

Top GitHub Comments

1 reaction
dennisbappert commented, Feb 22, 2022

I will open an issue in PL and reference this one.

Top Results From Across the Web

  • epoch changes types between int and float when passed to ...
    Bug The reported epoch to downstream Loggers ... Issue when using PyTorch Lightning integration while logging epochs aimhubio/aim#1359.

  • Logging only on epochs not working as intended?
    Hi, I'm trying to log metrics only on epochs, but it doesn't seem to work as intended. Here is my code: class StackedLSTM(pl....

  • How to Keep Track of PyTorch Lightning Experiments With ...
    Just go to your LightningModule and call methods of the Neptune experiment available as self.logger.experiment. For example, we can log histograms of...

  • Checkpointing (intermediate) - PyTorch Lightning
    When using iterative training which doesn't have an epoch, you can checkpoint at every N training steps by specifying every_n_train_steps=N.

  • Using PyTorch Lightning with Tune — Ray 1.11.0
    We are also able to specify the number of epochs to train each model, and the number of GPUs we want to use...
