
Issue when using PyTorch Lightning integration while logging epochs

See original GitHub issue

🐛 Bug

aim fails to track the epoch metric produced by PyTorch Lightning:

  File "/home/dbpprt/miniforge3/envs/lama/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 740, in fit
    self._call_and_handle_interrupt(
  File "/home/dbpprt/miniforge3/envs/lama/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 685, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/home/dbpprt/miniforge3/envs/lama/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 777, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File "/home/dbpprt/miniforge3/envs/lama/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1199, in _run
    self._dispatch()
  File "/home/dbpprt/miniforge3/envs/lama/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1279, in _dispatch
    self.training_type_plugin.start_training(self)
  File "/home/dbpprt/miniforge3/envs/lama/lib/python3.9/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 202, in start_training
    self._results = trainer.run_stage()
  File "/home/dbpprt/miniforge3/envs/lama/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1289, in run_stage
    return self._run_train()
  File "/home/dbpprt/miniforge3/envs/lama/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1319, in _run_train
    self.fit_loop.run()
  File "/home/dbpprt/miniforge3/envs/lama/lib/python3.9/site-packages/pytorch_lightning/loops/base.py", line 145, in run
    self.advance(*args, **kwargs)
  File "/home/dbpprt/miniforge3/envs/lama/lib/python3.9/site-packages/pytorch_lightning/loops/fit_loop.py", line 242, in advance
    self.trainer.logger_connector.update_train_epoch_metrics()
  File "/home/dbpprt/miniforge3/envs/lama/lib/python3.9/site-packages/pytorch_lightning/trainer/connectors/logger_connector/logger_connector.py", line 232, in update_train_epoch_metrics
    self.log_metrics(self.metrics["log"])
  File "/home/dbpprt/miniforge3/envs/lama/lib/python3.9/site-packages/pytorch_lightning/trainer/connectors/logger_connector/logger_connector.py", line 121, in log_metrics
    self.trainer.logger.save()
  File "/home/dbpprt/miniforge3/envs/lama/lib/python3.9/site-packages/pytorch_lightning/loggers/base.py", line 427, in save
    logger.save()
  File "/home/dbpprt/miniforge3/envs/lama/lib/python3.9/site-packages/pytorch_lightning/loggers/base.py", line 317, in save
    self._finalize_agg_metrics()
  File "/home/dbpprt/miniforge3/envs/lama/lib/python3.9/site-packages/pytorch_lightning/loggers/base.py", line 152, in _finalize_agg_metrics
    self.log_metrics(metrics=metrics_to_log, step=agg_step)
  File "/home/dbpprt/miniforge3/envs/lama/lib/python3.9/site-packages/pytorch_lightning/utilities/distributed.py", line 50, in wrapped_fn
    return fn(*args, **kwargs)
  File "/home/dbpprt/dev/lama/src/utils/aim_logger.py", line 76, in log_metrics
    self.experiment.track(v, name=name, context=context)
  File "/home/dbpprt/miniforge3/envs/lama/lib/python3.9/site-packages/aim/sdk/run.py", line 412, in track
    self._track_impl(value, track_time, name, step, epoch, context=context)
  File "/home/dbpprt/miniforge3/envs/lama/lib/python3.9/site-packages/aim/sdk/run.py", line 443, in _track_impl
    self._update_sequence_info(seq_info, ctx, val, name, step)
  File "/home/dbpprt/miniforge3/envs/lama/lib/python3.9/site-packages/aim/sdk/run.py", line 686, in _update_sequence_info
    raise ValueError(f'Cannot log value \'{val}\' on sequence \'{name}\'. Incompatible data types.')
ValueError: Cannot log value '4' on sequence 'epoch'. Incompatible data types.
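
The failure comes from Aim's per-sequence type check (Run._update_sequence_info in the traceback above): the first value tracked on a sequence pins its data type, and any later value of a different type is rejected. PyTorch Lightning's metric aggregation apparently hands the epoch over as a float at first and as a plain int later. A minimal sketch that should trigger the same check (assuming Aim 3.x behavior; the repository path is made up and may need an initialized repo via "aim init" first):

    from aim import Run

    run = Run(repo="/tmp/aim-repro")  # throwaway repo path, purely illustrative
    run.track(1.0, name="epoch")      # first value pins the sequence dtype: float
    run.track(2, name="epoch")        # an int on a float sequence should raise
    # ValueError: Cannot log value '2' on sequence 'epoch'. Incompatible data types.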

The PL Trainer config is set to:

check_val_every_n_epoch: 5
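
For context, a minimal Trainer setup matching that config could look like the sketch below. The AimLogger import path assumes the PL adapter bundled with Aim 3.x (the reporter actually uses a custom logger, src/utils/aim_logger.py), and the experiment name is hypothetical:

    import pytorch_lightning as pl
    from aim.pytorch_lightning import AimLogger  # adapter bundled with aim 3.x

    trainer = pl.Trainer(
        logger=AimLogger(experiment="lama"),  # hypothetical experiment name
        check_val_every_n_epoch=5,            # validate only every 5th epoch
        max_epochs=10,
    )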

Expected behavior

No crash.
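
Until the type handling is fixed upstream, one possible workaround is to normalize scalar metric types inside the custom logger before calling Run.track(), so each Aim sequence only ever sees one type. A hypothetical helper (sanitize_metrics is not part of either library):

    # Hypothetical helper: coerce scalar metrics to float so an Aim sequence
    # never receives a mix of int and float values.
    def sanitize_metrics(metrics: dict) -> dict:
        return {
            name: float(value) if isinstance(value, (int, float)) else value
            for name, value in metrics.items()
        }

In a logger like src/utils/aim_logger.py, log_metrics would pass its metrics dict through this helper before handing individual values to self.experiment.track().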

Environment

  • Aim version: 3.5.3
  • Python version: 3.9
  • OS: Ubuntu

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 8 (4 by maintainers)

Top GitHub Comments

1 reaction
dennisbappert commented, Feb 22, 2022

I will open an issue in PL and reference this one.

Top Results From Across the Web

  • epoch changes types between int and float when passed to ...
    Bug The reported epoch to downstream Loggers ... Issue when using PyTorch Lightning integration while logging epochs aimhubio/aim#1359.

  • Logging only on epochs not working as intended?
    Hi, I'm trying to log metrics only on epochs, but it doesn't seem to work as intended. Here is my code: class StackedLSTM(pl....

  • How to Keep Track of PyTorch Lightning Experiments With ...
    Just go to your LightningModule and call methods of the Neptune experiment available as self.logger.experiment. For example, we can log histograms of...

  • Checkpointing (intermediate) - PyTorch Lightning
    When using iterative training which doesn't have an epoch, you can checkpoint at every N training steps by specifying every_n_train_steps=N.

  • Using PyTorch Lightning with Tune — Ray 1.11.0
    We are also able to specify the number of epochs to train each model, and the number of GPUs we want to use...
