[Trainer] add --max_train_samples --max_val_samples --max_test_samples
As we were planning to add --max_train_samples --max_val_samples --max_test_samples
to all examples (https://github.com/huggingface/transformers/issues/10423), I wondered: is there any reason why we don’t extend the Trainer to handle that?
It would certainly be useful to be able to truncate the dataset inside the Trainer to enable quick testing.
Another plus is that the metrics could then automatically include the actual number of samples run, rather than each example computing it separately, as is done at the moment.
That way this functionality would be built in, and the examples would get it for free.
TODO:
- port --max_train_samples --max_val_samples --max_test_samples to Trainer and remove the then-unneeded code in run_seq2seq.py
- extend metrics to report the number of samples as it’s done now in:
so that all scripts automatically get this metric reported. Most likely it should be done here:
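
To make the two TODO items concrete, here is a minimal sketch of what I have in mind. It is not the Trainer's actual code: the helper names `truncate_dataset` and `add_sample_count` are hypothetical, and plain lists stand in for real datasets. With a `datasets.Dataset` the truncation would presumably use `dataset.select(range(n))` instead of slicing.

```python
from typing import Any, Dict, Optional, Sequence


def truncate_dataset(dataset: Sequence, max_samples: Optional[int]) -> Sequence:
    """Return at most `max_samples` items; None means no truncation.

    Hypothetical helper: slicing is used so it works on any sequence.
    """
    if max_samples is None:
        return dataset
    # Never ask for more samples than the dataset actually has.
    n = min(max_samples, len(dataset))
    return dataset[:n]


def add_sample_count(metrics: Dict[str, Any], split: str, dataset: Sequence) -> Dict[str, Any]:
    """Return a copy of `metrics` with the number of samples actually used."""
    updated = dict(metrics)
    updated[f"{split}_samples"] = len(dataset)
    return updated


# Stand-in for a real training set.
train_dataset = list(range(1000))

# Equivalent of passing --max_train_samples 50 for a quick test run.
small_train = truncate_dataset(train_dataset, 50)
metrics = add_sample_count({"train_loss": 0.5}, "train", small_train)
print(metrics)
```

Because the count is taken from the truncated dataset itself, the reported number stays correct even when the requested maximum is larger than the dataset.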
Issue Analytics
- State:
- Created 3 years ago
- Comments:23 (23 by maintainers)
Top GitHub Comments
Yes, I think it’s the best solution.
Please don’t touch the TF examples as they have not been cleaned up and will change in the near future. And yes, none of the TF examples are currently tested.