[examples] add --max_train_samples --max_val_samples --max_test_samples cl args to all scripts

As part of an effort to give all examples the same look and feel, this issue requests syncing support for these 3 command-line args, currently available in run_seq2seq.py:

--max_train_samples 5 --max_val_samples 5 --max_test_samples 5

into:

all other examples/*/run_*.py
templates/adding_a_new_example_script
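
To illustrate the interface being requested, here is a minimal standard-library sketch. The `build_parser` helper is hypothetical and exists only for illustration; the example scripts themselves declare their arguments in their own way, so this shows the shape of the three args rather than how any script actually implements them.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical helper: declares the three sample-limit arguments."""
    parser = argparse.ArgumentParser()
    for split in ("train", "val", "test"):
        parser.add_argument(
            f"--max_{split}_samples",
            type=int,
            default=None,  # None means "use the whole split"
            help=f"Truncate the {split} split to this many samples.",
        )
    return parser


# Parse the same arguments shown above from an explicit list.
args = build_parser().parse_args(
    ["--max_train_samples", "5", "--max_val_samples", "5", "--max_test_samples", "5"]
)
print(args.max_train_samples, args.max_val_samples, args.max_test_samples)  # prints: 5 5 5
```

Defaulting each arg to `None` keeps existing invocations unchanged: a script that never receives the flag runs on the full split.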

Part B. The metrics should now be updated to include the actual number of samples that were run. Here is an example for train: https://github.com/huggingface/transformers/blob/f52a15897b46ffa40af5c96d3726f0e18e91879b/examples/seq2seq/run_seq2seq.py#L586-L590 and the same applies to eval/test.
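
A rough sketch of what recording the sample count could look like. The `add_sample_count` helper and the `<split>_samples` metric key are assumptions made for illustration; check the linked lines for what run_seq2seq.py actually does.

```python
from typing import Dict, Optional, Sized


def add_sample_count(
    metrics: Dict[str, float],
    split: str,
    dataset: Sized,
    max_samples: Optional[int],
) -> Dict[str, float]:
    """Hypothetical helper: record how many samples were actually used."""
    # With no limit given, the whole dataset was used.
    limit = max_samples if max_samples is not None else len(dataset)
    # The requested limit may exceed the dataset size, so cap it.
    metrics[f"{split}_samples"] = min(limit, len(dataset))
    return metrics


# A list stands in for the dataset here; anything with len() works.
print(add_sample_count({}, "train", list(range(100)), 60))  # prints: {'train_samples': 60}
```

Taking the minimum matters when the requested limit is larger than the split: the metric then reports the real dataset size, not the number the user asked for.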

I’d say this can probably be refactored too. Let me check with Sylvain.

These args are currently used to limit the number of dataset entries without needing to change the dataset. For example:

run_seq2seq.py --model_name_or_path t5-small --output_dir output_dir --do_eval --do_predict --do_train \
--evaluation_strategy=steps --predict_with_generate --task summarization --dataset_name xsum \
--max_train_samples 60 --max_val_samples 10 --max_test_samples 10
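
The truncation itself can be sketched in a few lines. This uses a plain Python list and a hypothetical `limit_samples` helper; with the `datasets` library, a roughly equivalent step would be `Dataset.select` over a range of indices, though how each script does it is best checked in the code linked below.

```python
from typing import List, Optional, TypeVar

T = TypeVar("T")


def limit_samples(samples: List[T], max_samples: Optional[int]) -> List[T]:
    """Hypothetical helper: keep only the first `max_samples` entries."""
    if max_samples is None:
        # No limit requested: leave the split untouched.
        return samples
    # Slicing is safe even when max_samples exceeds len(samples).
    return samples[:max_samples]


train = limit_samples(list(range(1000)), 60)
print(len(train))  # prints: 60
```

Because only the in-memory view is shortened, the dataset on disk stays unchanged, which is what makes these args handy for quick smoke tests.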

All the code that currently takes care of it can be found inside https://github.com/huggingface/transformers/blob/master/examples/seq2seq/run_seq2seq.py

This issue is open to anybody in the community who would like to tackle it.

Thank you!