Request: Ignore Dataset transforms when iterating to the most recent checkpoint when resuming training

🚀 Feature request

It’d be great if, when resuming training from a checkpoint with a Dataset that has a format/transform function applied, the Trainer could ignore that transform while iterating up to the last checkpoint step.

@lhoestq @sgugger

Motivation

I doubt it’s much of an issue most of the time, but I’ve started playing with dataset.set_transform() for some heavy preprocessing, and just iterating through samples up to the current checkpoint step can take a ridiculously long time compared to a dataset without a transform applied. And I don’t think the transformed samples are ever actually used during that skip, right?
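
For context, here is a minimal sketch (mine, not from the issue) of the setup being described: a Dataset with an on-the-fly transform attached via set_transform(), which runs on every sample access, including the accesses the Trainer makes while fast-forwarding to the checkpoint step. The transform body is just a stand-in for whatever heavy preprocessing you’d actually do.

    from datasets import Dataset

    raw = Dataset.from_dict({"text": ["some example text"] * 10_000})

    def heavy_transform(batch):
        # Stand-in for expensive on-the-fly preprocessing
        # (tokenization, feature extraction, augmentation, ...).
        return {"input_ids": [[ord(c) for c in text] for text in batch["text"]]}

    # The transform is applied lazily, on every __getitem__ call.
    raw.set_transform(heavy_transform)

    # On resume, the Trainer fast-forwards the dataloader by iterating over
    # the already-seen batches, which invokes heavy_transform for every
    # skipped sample even though the output is thrown away.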

See this conversation in the forum for more backstory and my rudimentary thoughts on how I’d accomplish it.

Your contribution

I’m hesitant to try updating any of the trainer code myself, since it’s so complicated and needs to cover so many edge cases I’m not familiar with.

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 5 (4 by maintainers)

Top GitHub Comments

1 reaction
sgugger commented, Mar 9, 2021

This is already there 😃 Just pass along --ignore_data_skip in your script or ignore_data_skip=True in your TrainingArguments.
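
For anyone finding this later, a minimal sketch of that fix (output_dir and the checkpoint path are placeholders; model and train_dataset are assumed to be defined as usual). Note the documented trade-off: skipping the fast-forward means the data order after resuming won’t exactly match what the interrupted run would have produced.

    from transformers import Trainer, TrainingArguments

    args = TrainingArguments(
        output_dir="out",        # hypothetical output directory
        ignore_data_skip=True,   # resume without replaying already-seen batches
    )
    trainer = Trainer(model=model, args=args, train_dataset=train_dataset)

    # resume_from_checkpoint=True picks up the latest checkpoint in output_dir;
    # a specific path like "out/checkpoint-500" also works.
    trainer.train(resume_from_checkpoint=True)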

0 reactions
github-actions[bot] commented, Apr 14, 2021

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.
