Issue when using PyTorch Lightning integration while logging epochs
🐛 Bug
Aim fails to track the epoch metric produced by PyTorch Lightning:
File "/home/dbpprt/miniforge3/envs/lama/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 740, in fit
self._call_and_handle_interrupt(
File "/home/dbpprt/miniforge3/envs/lama/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 685, in _call_and_handle_interrupt
return trainer_fn(*args, **kwargs)
File "/home/dbpprt/miniforge3/envs/lama/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 777, in _fit_impl
self._run(model, ckpt_path=ckpt_path)
File "/home/dbpprt/miniforge3/envs/lama/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1199, in _run
self._dispatch()
File "/home/dbpprt/miniforge3/envs/lama/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1279, in _dispatch
self.training_type_plugin.start_training(self)
File "/home/dbpprt/miniforge3/envs/lama/lib/python3.9/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 202, in start_training
self._results = trainer.run_stage()
File "/home/dbpprt/miniforge3/envs/lama/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1289, in run_stage
return self._run_train()
File "/home/dbpprt/miniforge3/envs/lama/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1319, in _run_train
self.fit_loop.run()
File "/home/dbpprt/miniforge3/envs/lama/lib/python3.9/site-packages/pytorch_lightning/loops/base.py", line 145, in run
self.advance(*args, **kwargs)
File "/home/dbpprt/miniforge3/envs/lama/lib/python3.9/site-packages/pytorch_lightning/loops/fit_loop.py", line 242, in advance
self.trainer.logger_connector.update_train_epoch_metrics()
File "/home/dbpprt/miniforge3/envs/lama/lib/python3.9/site-packages/pytorch_lightning/trainer/connectors/logger_connector/logger_connector.py", line 232, in update_train_epoch_metrics
self.log_metrics(self.metrics["log"])
File "/home/dbpprt/miniforge3/envs/lama/lib/python3.9/site-packages/pytorch_lightning/trainer/connectors/logger_connector/logger_connector.py", line 121, in log_metrics
self.trainer.logger.save()
File "/home/dbpprt/miniforge3/envs/lama/lib/python3.9/site-packages/pytorch_lightning/loggers/base.py", line 427, in save
logger.save()
File "/home/dbpprt/miniforge3/envs/lama/lib/python3.9/site-packages/pytorch_lightning/loggers/base.py", line 317, in save
self._finalize_agg_metrics()
File "/home/dbpprt/miniforge3/envs/lama/lib/python3.9/site-packages/pytorch_lightning/loggers/base.py", line 152, in _finalize_agg_metrics
self.log_metrics(metrics=metrics_to_log, step=agg_step)
File "/home/dbpprt/miniforge3/envs/lama/lib/python3.9/site-packages/pytorch_lightning/utilities/distributed.py", line 50, in wrapped_fn
return fn(*args, **kwargs)
File "/home/dbpprt/dev/lama/src/utils/aim_logger.py", line 76, in log_metrics
self.experiment.track(v, name=name, context=context)
File "/home/dbpprt/miniforge3/envs/lama/lib/python3.9/site-packages/aim/sdk/run.py", line 412, in track
self._track_impl(value, track_time, name, step, epoch, context=context)
File "/home/dbpprt/miniforge3/envs/lama/lib/python3.9/site-packages/aim/sdk/run.py", line 443, in _track_impl
self._update_sequence_info(seq_info, ctx, val, name, step)
File "/home/dbpprt/miniforge3/envs/lama/lib/python3.9/site-packages/aim/sdk/run.py", line 686, in _update_sequence_info
raise ValueError(f'Cannot log value \'{val}\' on sequence \'{name}\'. Incompatible data types.')
ValueError: Cannot log value '4' on sequence 'epoch'. Incompatible data types.
The PL Trainer config is set to:
check_val_every_n_epoch: 5
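The ValueError is raised by Aim's per-sequence type check in _update_sequence_info: once a sequence has a dtype, values of a different type are rejected. A minimal sketch reproducing this with Aim alone, assuming (as the PL issue linked in the comments below describes) that the epoch value arrives first as a float and later as an int:

```python
# Minimal repro sketch using Aim 3.x's Run.track API.
from aim import Run

run = Run()                   # opens/creates a local .aim repository
run.track(4.0, name='epoch')  # first value fixes the sequence dtype to float
run.track(4, name='epoch')    # int on the same sequence raises:
# ValueError: Cannot log value '4' on sequence 'epoch'. Incompatible data types.
```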
Expected behavior
No crash.
Environment
- Aim version: 3.5.3
- Python version: 3.9
- OS: Ubuntu
Issue Analytics
- Created: 2 years ago
- Comments: 8 (4 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Tracked here: https://github.com/PyTorchLightning/pytorch-lightning/issues/12050
I will open an issue in PL and reference this one.
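Until the upstream fix lands, a possible workaround (a sketch, not from the thread) is to coerce scalar metrics to float in the custom logger's log_metrics, so a sequence like 'epoch' only ever sees one dtype. The self.experiment attribute and method shape follow the aim_logger.py frame in the traceback; the context handling used there is omitted for brevity:

```python
# Hypothetical workaround sketch modeled on the traceback's aim_logger.py:
# normalize ints to floats before handing values to Aim, so a sequence
# never mixes dtypes.
def log_metrics(self, metrics, step=None):
    for name, value in metrics.items():
        if isinstance(value, int) and not isinstance(value, bool):
            value = float(value)  # keep one dtype per Aim sequence
        self.experiment.track(value, name=name, step=step)
```

This sidesteps the int/float flip tracked in the PL issue above without changing what gets logged.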