How to save checkpoints within lightning_logs?

I’m currently doing checkpointing as follows:

import os
import pytorch_lightning as pl

checkpoint_callback = pl.callbacks.ModelCheckpoint(
    filepath=os.path.join(os.getcwd(), 'checkpoints/{epoch}-{val_loss:.2f}'),
    verbose=True,
    monitor='val_loss',
    mode='min',
    save_top_k=-1,  # keep every checkpoint
    period=1,       # save every epoch
)

trainer = pl.Trainer(
    default_save_path=os.path.join(os.getcwd(), 'log_files_are_stored_here'),
    gpus=1,
    max_epochs=2,
    checkpoint_callback=checkpoint_callback,
)

This creates the following folder structure:

├── checkpoints # all the .pth files are saved here
└── log_files_are_stored_here
    └── lightning_logs
       ├── version_0
       ├── version_1
       ├── version_2

How can I get the .pth files for each version saved in the respective version folder, like so?

└── log_files_are_stored_here
    └── lightning_logs
       ├── version_0
            └── checkpoints #  save the .pth files here
       ├── version_1
            └── checkpoints #  save the .pth files here
       ├── version_2
            └── checkpoints #  save the .pth files here

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 6 (1 by maintainers)

Top GitHub Comments

7 reactions
chris-clem commented, Mar 26, 2020

Hi,

this is how I do it:

tt_logger = TestTubeLogger(save_dir=str(log_dir / "tt_logs"), name=run_name)

checkpoint_dir = (
    Path(tt_logger.save_dir)
    / tt_logger.experiment.name
    / f"version_{tt_logger.experiment.version}"
    / "checkpoints"
)
filepath = checkpoint_dir / "{epoch}-{val_loss:.4f}"
checkpoint_cb = ModelCheckpoint(filepath=str(filepath))

trainer = pl.Trainer(
    logger=tt_logger,
    checkpoint_callback=checkpoint_cb,
    ...
)
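
To make the path logic above easier to check without installing Lightning, here is a minimal standard-library sketch of the same idea. The helper name `checkpoint_template` and the sample values `logs/tt_logs` and `my_run` are hypothetical, and `str.format` only mimics placeholder substitution; the filenames Lightning actually writes may differ between versions.

```python
from pathlib import PurePosixPath


def checkpoint_template(save_dir, name, version, pattern="{epoch}-{val_loss:.4f}"):
    """Build the checkpoint filename template inside a run's version folder.

    Mirrors the layout <save_dir>/<name>/version_<n>/checkpoints/<pattern>.
    The placeholders in `pattern` are left untouched so they can be filled later.
    """
    version_dir = PurePosixPath(save_dir) / name / f"version_{version}"
    return str(version_dir / "checkpoints" / pattern)


# Hypothetical values standing in for the logger's save_dir, name and version.
template = checkpoint_template("logs/tt_logs", "my_run", 0)
print(template)

# Illustration only: fill the placeholders the way a format string would.
print(template.format(epoch=3, val_loss=0.25))
```

The point of the design is that the version folder comes from the logger, so the checkpoints always land next to the logs of the same run.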

3 reactions
oplatek commented, Mar 27, 2020

Hi, here is the TensorBoard version, inspired by @chris-clem's snippet.

Any idea how to get rid of the “HACK”?

tb_logger = TensorBoardLogger(save_dir='logs/tb/')

# HACK: to avoid tb_logger crashing in self._get_next_version() if I access tb_logger.log_dir
os.makedirs('logs/tb/default', exist_ok=True)

mcp = ModelCheckpoint(filepath=f'{tb_logger.log_dir}/' + '{epoch}_vl_{val_loss:.2f}')
trainer = Trainer(logger=tb_logger, checkpoint_callback=mcp)
...
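
For context on the HACK, the sketch below shows one way a "next version" lookup over `version_N` folders could work. It is an assumption about the general technique, not Lightning's actual `_get_next_version` code. The helper name `next_version` is hypothetical. Presumably the crash comes from listing a directory that does not exist yet, so this version treats a missing directory as "no runs so far".

```python
import re
import tempfile
from pathlib import Path

# Matches folder names such as "version_0" or "version_12".
_VERSION_RE = re.compile(r"version_(\d+)")


def next_version(root_dir):
    """Return the next free version number under root_dir.

    A missing root_dir counts as empty, so no directory has to be
    created up front just to ask for the next version.
    """
    root = Path(root_dir)
    if not root.is_dir():
        return 0
    versions = []
    for entry in root.iterdir():
        match = _VERSION_RE.fullmatch(entry.name)
        if entry.is_dir() and match:
            versions.append(int(match.group(1)))
    return max(versions, default=-1) + 1


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "default"
    print(next_version(root))  # directory does not exist yet
    (root / "version_0").mkdir(parents=True)
    (root / "version_2").mkdir()
    print(next_version(root))
```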