[Trainer] add --max_train_samples --max_val_samples --max_test_samples

As we were planning to add --max_train_samples --max_val_samples --max_test_samples to all examples (https://github.com/huggingface/transformers/issues/10423), I wondered: is there any reason why we don’t expand the Trainer to handle that?

It would surely be useful to be able to truncate the dataset at the Trainer level to enable quick testing.

Another plus is that the metrics could then automatically include the actual number of samples run, rather than each example script reporting it itself, as is done at the moment.

That way this functionality would be built in and the examples would get it for free.
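A minimal sketch of what such truncation could look like, to make the idea concrete. The helper name `truncate_samples` is hypothetical and not part of Trainer; the sketch only assumes that a `datasets.Dataset` exposes `select(indices)`, and falls back to slicing for plain sequences.

```python
from typing import Any, Optional


def truncate_samples(dataset: Any, max_samples: Optional[int] = None) -> Any:
    """Return at most `max_samples` leading items of `dataset`.

    Hypothetical helper for illustration; this is not an existing Trainer API.
    """
    if max_samples is None:
        # No limit requested: leave the dataset untouched.
        return dataset
    # Never ask for more items than the dataset holds.
    n = min(max_samples, len(dataset))
    if hasattr(dataset, "select"):
        # datasets.Dataset: select rows by index (slicing it would return a dict).
        return dataset.select(range(n))
    # Plain sequences such as lists or tuples.
    return dataset[:n]
```

With something like this applied inside the Trainer, a quick smoke test would only need to pass a small value for each of the three flags.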

TODO:

  1. port --max_train_samples --max_val_samples --max_test_samples to Trainer and remove the then-unneeded code in run_seq2seq.py
  2. extend metrics to report the number of samples as it’s done now in:

https://github.com/huggingface/transformers/blob/aca6288ff42cebded5421020f0ff088adeb446dd/examples/seq2seq/run_seq2seq.py#L590

so that all scripts automatically get this metric reported. Most likely it should be done here:

https://github.com/huggingface/transformers/blob/aca6288ff42cebded5421020f0ff088adeb446dd/src/transformers/trainer_utils.py#L224
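As a rough illustration of the second TODO item, the sample count could be attached to the metrics dictionary by a small helper. This is a sketch under assumptions: `add_sample_count` and the `{split}_samples` key are hypothetical names chosen for illustration, not the code at either link above.

```python
from typing import Dict, Optional


def add_sample_count(
    metrics: Dict[str, float],
    split: str,
    dataset_len: int,
    max_samples: Optional[int] = None,
) -> Dict[str, float]:
    """Record how many samples were actually used for `split`.

    Hypothetical helper; the key name `{split}_samples` is an assumption.
    """
    # With no cap set, the whole dataset was used.
    cap = dataset_len if max_samples is None else max_samples
    # The cap may exceed the dataset size, so report the smaller value.
    metrics[f"{split}_samples"] = min(cap, dataset_len)
    return metrics
```

Doing this in one shared place would let every script report the count without repeating the logic.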

@sgugger

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 23 (23 by maintainers)

Top GitHub Comments

sgugger commented, Mar 1, 2021 (2 reactions)

Yes, I think it’s the best solution.

sgugger commented, Mar 5, 2021 (1 reaction)

Please don’t touch the TF examples as they have not been cleaned up and will change in the near future. And yes, none of the TF examples are currently tested.
