[examples] add --max_train_samples --max_val_samples --max_test_samples cl args to all scripts
As part of an effort to give all examples the same look and feel, this issue requests syncing the support for these 3 command-line args from run_seq2seq.py:
--max_train_samples 5 --max_val_samples 5 --max_test_samples 5
into:
- all other examples/*/run_*.py
- templates/adding_a_new_example_script
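For a script that does not accept these flags yet, the declaration could look roughly like the sketch below. It uses plain `argparse` so that it runs on its own; the example scripts declare their arguments as dataclass fields instead, and the helper name `build_parser` and the help strings are assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical helper: declares the three sample-limiting flags.
    parser = argparse.ArgumentParser()
    for split in ("train", "val", "test"):
        parser.add_argument(
            f"--max_{split}_samples",
            type=int,
            default=None,  # None means "use the whole split"
            help=f"Truncate the {split} split to this many samples.",
        )
    return parser


# Parse the same flags that are shown in the issue text.
args = build_parser().parse_args(
    ["--max_train_samples", "5", "--max_val_samples", "5", "--max_test_samples", "5"]
)
```

Defaulting to `None` keeps the existing behaviour for anyone who does not pass the flags.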
Part B. The metrics should now be updated to include the actual number of samples that were run. Here is an example for train: https://github.com/huggingface/transformers/blob/f52a15897b46ffa40af5c96d3726f0e18e91879b/examples/seq2seq/run_seq2seq.py#L586-L590 and the same is needed for eval/test.
I'd say this can probably be refactored too. Let me check with Sylvain.
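To make Part B concrete, here is a minimal sketch of how the reported sample count could be computed. It is not a copy of the linked lines: the helper name `samples_run`, the metric key `train_samples`, and the stand-in dataset are all assumptions.

```python
from typing import Optional, Sequence


def samples_run(dataset: Sequence, max_samples: Optional[int]) -> int:
    # Hypothetical helper: the number of samples actually used is the
    # requested cap or the dataset size, whichever is smaller.
    if max_samples is None:
        return len(dataset)
    return min(max_samples, len(dataset))


train_dataset = list(range(100))  # stand-in for a real training split
metrics = {"train_loss": 0.0}     # placeholder value, not a real result
metrics["train_samples"] = samples_run(train_dataset, 60)
```

The same helper would apply unchanged to the eval and test splits, which is one way the repeated logic could be refactored.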
These args are currently used to limit the number of dataset entries without needing to change the dataset itself. Example:
run_seq2seq.py --model_name_or_path t5-small --output_dir output_dir --do_eval --do_predict --do_train \
--evaluation_strategy=steps --predict_with_generate --task summarization --dataset_name xsum \
--max_train_samples 60 --max_val_samples 10 --max_test_samples 10
All the code that currently takes care of it can be found inside https://github.com/huggingface/transformers/blob/master/examples/seq2seq/run_seq2seq.py
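The core of that logic is truncating each split before it is used. The sketch below shows the idea on a plain Python list so that it runs without extra dependencies; the helper name `limit_samples` and the toy data are assumptions, not code taken from the script.

```python
from typing import List, Optional


def limit_samples(dataset: List[dict], max_samples: Optional[int]) -> List[dict]:
    # Hypothetical helper; None leaves the split untouched.
    if max_samples is None:
        return dataset
    # With a `datasets.Dataset` the analogous call would be
    # dataset.select(range(min(max_samples, len(dataset)))).
    return dataset[:max_samples]


# Toy stand-in for a validation split with 25 entries.
val_dataset = [{"id": i} for i in range(25)]
small_val = limit_samples(val_dataset, 10)
```

Because the original dataset is left unchanged, the same data can be reused with different limits across runs.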
This issue is open to anybody in the community who would like to tackle it.
Thank you!
Top GitHub Comments
Hi @stas00, since it's just a template, there's no way to test the changes, right?
Cool!