
Potential Leakage of Information Across Folds in Kfold.py


🐛 Bug

I believe information can leak across folds when certain parameters in the Kfold.py script are changed. Say the user chooses to save every checkpoint. After training on the first fold finishes, the second fold reuses the same checkpoint directory as the first. So if the second fold finishes training and the user loads the best checkpoint, they may actually load a checkpoint produced during training on the first fold.
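
For illustration, here is a minimal sketch of the failure mode, assuming a single ModelCheckpoint shared by all folds; the paths and the LitModel name in the comments are hypothetical:

from pytorch_lightning.callbacks import ModelCheckpoint

# One callback and one directory are shared by every fold, and its "best model"
# bookkeeping is never reset when the loop advances to the next fold.
checkpoint_cb = ModelCheckpoint(dirpath="checkpoints", monitor="val_loss", save_top_k=-1)

# After fold 1, the callback remembers fold-1 files, e.g.
#   checkpoint_cb.best_model_path == "checkpoints/epoch=4-step=500.ckpt"
# Fold 2 then writes into the same directory with the same callback, so
#   LitModel.load_from_checkpoint(checkpoint_cb.best_model_path)
# at the end of fold 2 may return weights that were trained on fold 1.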

To Reproduce

Run the script Kfold.py

Expected behavior

We expect the training processes of the individual folds to be independent of one another.

Environment

  • PyTorch Lightning Version: 1.6.0dev
  • PyTorch Version: 1.10.0+cu102
  • Python version: 3.7.11
  • OS: Linux
  • CUDA/cuDNN version: Using CPU
  • GPU models and configuration:
  • How you installed PyTorch (conda, pip, source):
  • If compiling from source, the output of torch.__config__.show():
  • Any other relevant information:

Additional context

I think the solution may be to clear out the previous fold's checkpoints when starting a new fold. We would also need to reset the checkpoint callback's state (e.g. reset the minimum validation loss when advancing to the next fold).
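
A minimal sketch of such a reset, assuming the checkpoint callback's bookkeeping attributes (best_model_path, best_model_score, best_k_models, kth_best_model_path, last_model_path) can be cleared directly and that dirpath can be reassigned between folds; the helper name and directory layout are illustrative:

import os

from pytorch_lightning.callbacks import ModelCheckpoint


def reset_checkpoint_callback(callback: ModelCheckpoint, fold: int, base_dir: str = "checkpoints") -> None:
    # Give each fold its own directory so fold N never sees fold N-1's files.
    callback.dirpath = os.path.join(base_dir, f"fold_{fold}")
    # Wipe the "best model" bookkeeping so the next fold starts from scratch.
    callback.best_model_path = ""
    callback.best_model_score = None
    callback.best_k_models = {}
    callback.kth_best_model_path = ""
    callback.last_model_path = ""

The k-fold loop could call a helper like this when it advances to the next fold.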

cc @awaelchli @ananthsub @ninginthecloud @rohitgr7 @otaj @carmocca @justusschock

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 7 (3 by maintainers)

Top GitHub Comments

AlexTo commented, Apr 4, 2022 (1 reaction)

To use one model checkpoint per fold, here is how I did it:

  • In the model, log the metrics under a different name for each fold, e.g. val_loss should become f"fold_{fold}-val_loss":
def validation_step(self, batch, batch_idx):
    ...
    # the k-fold loop from the example exposes the index of the fold being trained
    fold = self.trainer.fit_loop.current_fold
    self.log(f"fold_{fold}-val_loss", loss.item(), on_step=False, on_epoch=True)
    ...
  • Create one model checkpoint instance per fold, each monitoring that fold's validation loss:
model_checkpoints = [KFoldModelCheckpoint(
    filename="{" + f"fold_{f}-val_loss" + "}_{epoch}.pt",
    monitor=f"fold_{f}-val_loss",
    mode="min",
    every_n_epochs=1,
    save_top_k=3
) for f in range(num_folds)]
  • But note that the original ModelCheckpoint will throw an error: the checkpoint instance for fold 0 only monitors fold_0-val_loss, so during the other folds the metric fold_0-val_loss is not found. We can simply extend ModelCheckpoint to skip metrics that belong to other folds:
# Imports assumed for the PyTorch Lightning version used here (1.6.x);
# exact module paths may differ in other releases.
from typing import Dict

import pytorch_lightning as pl
from pytorch_lightning.callbacks import ModelCheckpoint
from pytorch_lightning.utilities.exceptions import MisconfigurationException
from pytorch_lightning.utilities.types import _METRIC
from pytorch_lightning.utilities.warnings import WarningCache

warning_cache = WarningCache()


class KFoldModelCheckpoint(ModelCheckpoint):
    def _save_topk_checkpoint(self, trainer: "pl.Trainer", monitor_candidates: Dict[str, _METRIC]) -> None:
        if self.save_top_k == 0:
            return
        # validate metric
        if self.monitor is not None:
            if self.monitor not in monitor_candidates:
                if "fold" in self.monitor:
                    # Fold-specific metric not logged during this fold: skip silently.
                    return
                m = (
                    f"`ModelCheckpoint(monitor={self.monitor!r})` could not find the monitored key in the returned"
                    f" metrics: {list(monitor_candidates)}."
                    f" HINT: Did you call `log({self.monitor!r}, value)` in the `LightningModule`?"
                )
                if trainer.fit_loop.epoch_loop.val_loop._has_run:
                    raise MisconfigurationException(m)
                warning_cache.warn(m)
            self._save_monitor_checkpoint(trainer, monitor_candidates)
        else:
            self._save_none_monitor_checkpoint(trainer, monitor_candidates)

Now model checkpointing for k-fold training works properly 😉
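
For completeness, a minimal sketch of wiring these per-fold callbacks into the trainer; the trainer arguments, model and datamodule are placeholders, and the k-fold loop itself comes from the Kfold.py example:

from pytorch_lightning import Trainer

# `model_checkpoints` is the list of per-fold KFoldModelCheckpoint instances built above.
trainer = Trainer(max_epochs=10, callbacks=model_checkpoints)
# The Kfold.py example then installs its k-fold loop on the trainer before calling
# trainer.fit(model, datamodule=datamodule); each fold now checkpoints against its own
# fold_{k}-val_loss metric, so checkpoints no longer mix across folds.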

stale[bot] commented, Jun 6, 2022 (0 reactions)

This issue has been automatically marked as stale because it hasn’t had any recent activity. This issue will be closed in 7 days if no further activity occurs. Thank you for your contributions, Pytorch Lightning Team!
