[trainer] `--load_best_model_at_end` silently turns off `--save_steps` settings
Splitting off from https://github.com/huggingface/transformers/pull/12477#discussion_r668326212
Currently `--load_best_model_at_end` silently turns off the `--save_steps` setting when `--do_eval` is off, i.e. when `--evaluation_strategy` is `"no"` (setting it to anything other than `"no"` otherwise automatically turns `--do_eval` on).
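The same combination can be expressed through the Python API. A minimal sketch, assuming the CLI flags map directly onto `TrainingArguments` fields, showing the behavior on the transformers version current at the time of this issue (later releases reject this combination up front instead, see the comments below):

```python
from transformers import TrainingArguments

# Mirrors the CLI repro below: save a checkpoint every step, but leave
# evaluation_strategy at its "no" default, so --do_eval stays off.
args = TrainingArguments(
    output_dir="output_dir",
    save_steps=1,
    load_best_model_at_end=True,  # silently disables the save_steps checkpoints here
)
```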
The proposal is to assert if `--load_best_model_at_end` is set and `--evaluation_strategy` is `"no"`, as sketched below.
Reproducible test:

```
export BS=16; rm -r output_dir; PYTHONPATH=src USE_TF=0 CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=2 examples/pytorch/translation/run_translation.py --model_name_or_path t5-small --output_dir output_dir --adam_eps 1e-06 --do_train --label_smoothing 0.1 --learning_rate 3e-5 --logging_first_step --logging_steps 500 --max_source_length 128 --max_target_length 128 --val_max_target_length 128 --num_train_epochs 1 --overwrite_output_dir --per_device_train_batch_size $BS --predict_with_generate --sortish_sampler --source_lang en --target_lang ro --dataset_name wmt16 --dataset_config "ro-en" --source_prefix "translate English to Romanian: " --warmup_steps 50 --max_train_samples 50 --save_steps 1
```
which saves checkpoints. Then adding `--load_best_model_at_end` stops those checkpoints from being saved.
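For comparison, a combination that should keep checkpointing working together with `--load_best_model_at_end` is to also turn evaluation on with a matching schedule, e.g. appending `--evaluation_strategy steps --eval_steps 1` to the same command (untested here; flag names as in `run_translation.py` / `TrainingArguments`).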
Top GitHub Comments
Yes, as said in that comment, I think it's reasonable if we raise an error if `--load_best_model_at_end` is set and `--evaluation_strategy` is `"no"`, since there is no "best model" to pick from in that case. I can do it later today if you want.

Yes, this was fixed by #12786 in the end.
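For completeness, a quick way to see the post-#12786 behavior from the Python API (the exact error wording may differ across transformers versions):

```python
from transformers import TrainingArguments

# On releases that include the fix, this combination is rejected up front
# instead of silently skipping checkpoint saves.
try:
    TrainingArguments(
        output_dir="output_dir",
        save_steps=1,
        load_best_model_at_end=True,  # evaluation_strategy left at its "no" default
    )
except ValueError as err:
    print(err)
```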