[trainer] port metrics logging and saving methods to all example scripts
In an effort to make the examples easier to read, in https://github.com/huggingface/transformers/pull/10266 we added new trainer methods:
- trainer.log_metrics - to perform consistent formatting for logged metrics
- trainer.save_metrics - to save the metrics into a corresponding json file.
and deployed them in run_seq2seq.py.
The next task is to do the same for all the other examples/*/run_*.py scripts.
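As a rough sketch of the replacement pattern (not the exact diff from the PR): a hand-written loop that prints each metric is swapped for two calls on the trainer, both taking the split name and the metrics dict. StubTrainer below is a hypothetical stand-in so the snippet runs without transformers installed; its formatting and file layout are assumptions that only approximate the real methods. In an actual script, trainer would be the script's own Trainer instance and report_metrics (also a made-up helper name) would not be needed.

```python
import json
import tempfile
from pathlib import Path


class StubTrainer:
    """Hypothetical stand-in for the real Trainer, for illustration only."""

    def __init__(self, output_dir):
        self.output_dir = Path(output_dir)
        self.output_dir.mkdir(parents=True, exist_ok=True)

    def log_metrics(self, split, metrics):
        # Print one aligned "key = value" line per metric, sorted by key.
        print(f"***** {split} metrics *****")
        width = max(len(key) for key in metrics)
        for key in sorted(metrics):
            print(f"  {key:<{width}} = {metrics[key]}")

    def save_metrics(self, split, metrics):
        # Write this split's metrics to <output_dir>/<split>_results.json.
        path = self.output_dir / f"{split}_results.json"
        path.write_text(json.dumps(metrics, indent=4, sort_keys=True))


def report_metrics(trainer, split, metrics):
    """Replace an explicit printing loop with the two trainer calls."""
    trainer.log_metrics(split, metrics)
    trainer.save_metrics(split, metrics)


out_dir = tempfile.mkdtemp()
trainer = StubTrainer(out_dir)
report_metrics(trainer, "eval", {"eval_loss": 3.7533, "epoch": 1.0})
```

The same pair of calls would be repeated for each split a script handles (train, eval, test), which is what keeps the output consistent across scripts.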
Steps:
1. Study the diff for run_seq2seq.py: https://github.com/huggingface/transformers/pull/10266/files#diff-82bfb61a8b91894c2c2101734a6ab7b415be4ace5cd1e01b4c37663020d924ae
2. Pick a script, e.g. examples/multiple-choice/run_swag.py
3. Apply the same changes as in step 1, removing the explicit metrics printing lines and replacing them with the 2 new methods.
4. Test the modified script (usually README.md for that folder should have the instructions to do so) and see that your change works: train/eval/test metrics are printed using the new way and (train|eval|test|all)_results.json are generated. You can use a very short data sample (5 records is enough) by just adding: --max_train_samples 5 --max_val_samples 5 --max_test_samples 5
5. Repeat for other scripts.
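The check in the testing step can be partly automated. The sketch below is only an illustration: missing_results and EXPECTED_SPLITS are hypothetical names, and it assumes the result files are written directly into the script's output directory. Which files a given run actually produces depends on which of train/eval/test were run, so adjust the splits argument accordingly.

```python
import json
import tempfile
from pathlib import Path

EXPECTED_SPLITS = ("train", "eval", "test", "all")


def missing_results(output_dir, splits=EXPECTED_SPLITS):
    """Return names of expected <split>_results.json files that are
    absent, unreadable, or do not contain a JSON object."""
    missing = []
    for split in splits:
        path = Path(output_dir) / f"{split}_results.json"
        try:
            data = json.loads(path.read_text())
        except (OSError, ValueError):
            # File is missing/unreadable, or its content is not valid JSON.
            missing.append(path.name)
            continue
        if not isinstance(data, dict):
            missing.append(path.name)
    return missing


# Demo on a temporary directory where only two of the four files exist.
demo_dir = Path(tempfile.mkdtemp())
(demo_dir / "train_results.json").write_text('{"epoch": 1.0}')
(demo_dir / "eval_results.json").write_text('{"eval_loss": 3.7533}')
print(missing_results(demo_dir))
```

An empty list means every expected file exists and parses as a JSON object; it does not check that the metric names or values are sensible.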
Thank you very much!
The metrics log should be similar to this, with the exception of using different scoring metrics:
02/16/2021 17:06:39 - INFO - __main__ - ***** train metrics *****
02/16/2021 17:06:39 - INFO - __main__ - epoch = 1.0
02/16/2021 17:06:39 - INFO - __main__ - init_mem_cpu_alloc_delta = 2MB
02/16/2021 17:06:39 - INFO - __main__ - init_mem_cpu_peaked_delta = 0MB
02/16/2021 17:06:39 - INFO - __main__ - init_mem_gpu_alloc_delta = 230MB
02/16/2021 17:06:39 - INFO - __main__ - init_mem_gpu_peaked_delta = 0MB
02/16/2021 17:06:39 - INFO - __main__ - total_flos = 2128GF
02/16/2021 17:06:39 - INFO - __main__ - train_mem_cpu_alloc_delta = 55MB
02/16/2021 17:06:39 - INFO - __main__ - train_mem_cpu_peaked_delta = 0MB
02/16/2021 17:06:39 - INFO - __main__ - train_mem_gpu_alloc_delta = 692MB
02/16/2021 17:06:39 - INFO - __main__ - train_mem_gpu_peaked_delta = 661MB
02/16/2021 17:06:39 - INFO - __main__ - train_runtime = 2.3114
02/16/2021 17:06:39 - INFO - __main__ - train_samples = 100
02/16/2021 17:06:39 - INFO - __main__ - train_samples_per_second = 3.028
02/16/2021 17:06:43 - INFO - __main__ - ***** val metrics *****
02/16/2021 17:13:05 - INFO - __main__ - epoch = 1.0
02/16/2021 17:13:05 - INFO - __main__ - eval_bleu = 24.6502
02/16/2021 17:13:05 - INFO - __main__ - eval_gen_len = 32.9
02/16/2021 17:13:05 - INFO - __main__ - eval_loss = 3.7533
02/16/2021 17:13:05 - INFO - __main__ - eval_mem_cpu_alloc_delta = 0MB
02/16/2021 17:13:05 - INFO - __main__ - eval_mem_cpu_peaked_delta = 0MB
02/16/2021 17:13:05 - INFO - __main__ - eval_mem_gpu_alloc_delta = 0MB
02/16/2021 17:13:05 - INFO - __main__ - eval_mem_gpu_peaked_delta = 510MB
02/16/2021 17:13:05 - INFO - __main__ - eval_runtime = 3.9266
02/16/2021 17:13:05 - INFO - __main__ - eval_samples = 100
02/16/2021 17:13:05 - INFO - __main__ - eval_samples_per_second = 25.467
02/16/2021 17:06:48 - INFO - __main__ - ***** test metrics *****
02/16/2021 17:06:48 - INFO - __main__ - test_bleu = 27.146
02/16/2021 17:06:48 - INFO - __main__ - test_gen_len = 41.37
02/16/2021 17:06:48 - INFO - __main__ - test_loss = 3.6682
02/16/2021 17:06:48 - INFO - __main__ - test_mem_cpu_alloc_delta = 0MB
02/16/2021 17:06:48 - INFO - __main__ - test_mem_cpu_peaked_delta = 0MB
02/16/2021 17:06:48 - INFO - __main__ - test_mem_gpu_alloc_delta = 0MB
02/16/2021 17:06:48 - INFO - __main__ - test_mem_gpu_peaked_delta = 645MB
02/16/2021 17:06:48 - INFO - __main__ - test_runtime = 5.1136
02/16/2021 17:06:48 - INFO - __main__ - test_samples = 100
02/16/2021 17:06:48 - INFO - __main__ - test_samples_per_second = 19.556
Top GitHub Comments
Sure @stas00, I will be happy to work on it!
Oh, and since you are doing amazingly useful work syncing all examples to look and feel similar, there is one very crucial thing to sync: templates/adding_a_new_example_script/, on which all new examples will be based, so we had better have a good template to start with. I forgot to mention that earlier. Thank you!