[trainer] port metrics logging and saving methods to all example scripts

In an effort to make the examples easier to read, in https://github.com/huggingface/transformers/pull/10266 we added new trainer methods:

trainer.log_metrics - to perform consistent formatting for logged metrics
trainer.save_metrics - to save the metrics into a corresponding json file.

and deployed them in run_seq2seq.py.
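To make the intent of the two methods concrete, here is a minimal standalone sketch of the same idea: one helper that formats metrics with aligned columns and one that writes them to `<split>_results.json`. The function names `format_metrics` and `save_metrics`, their signatures, and the exact padding are assumptions made for illustration. This is not the Transformers implementation, which lives on the `Trainer` object.

```python
import json
import os
import tempfile


def format_metrics(split, metrics):
    """Return a header line plus one aligned 'key = value' line per metric.

    Hypothetical stand-in for the formatting step; padding rules are assumed.
    """
    lines = [f"***** {split} metrics *****"]
    key_width = max(len(str(key)) for key in metrics)
    val_width = max(len(str(value)) for value in metrics.values())
    for key in sorted(metrics):
        # Left-align keys and right-align values so the '=' signs line up.
        lines.append(f"  {key:<{key_width}} = {str(metrics[key]):>{val_width}}")
    return lines


def save_metrics(split, metrics, output_dir):
    """Write the metrics to <output_dir>/<split>_results.json and return the path."""
    path = os.path.join(output_dir, f"{split}_results.json")
    with open(path, "w") as f:
        json.dump(metrics, f, indent=4, sort_keys=True)
    return path


if __name__ == "__main__":
    # Values copied from the sample log in this issue.
    demo = {"epoch": 1.0, "train_runtime": 2.3114, "train_samples": 100}
    for line in format_metrics("train", demo):
        print(line)
    with tempfile.TemporaryDirectory() as tmp_dir:
        print(os.path.basename(save_metrics("train", demo, tmp_dir)))
```

In an example script, the equivalent calls would go where the hand-written metric printing used to be, once per split (train, eval, test).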

The next task is to do the same for all the other examples/*/run_*.py scripts.

Steps:

1. Study the diff for run_seq2seq.py: https://github.com/huggingface/transformers/pull/10266/files#diff-82bfb61a8b91894c2c2101734a6ab7b415be4ace5cd1e01b4c37663020d924ae
2. Pick a script, e.g. examples/multiple-choice/run_swag.py.
3. Apply the same changes as in step 1: remove the explicit metrics-printing lines and replace them with the 2 new methods.
4. Test the modified script (the README.md for that folder usually has the instructions to do so) and check that your change works: train/eval/test metrics are printed the new way and (train|eval|test|all)_results.json are generated. A very small data sample is enough, e.g. 5 records, which you get by adding: --max_train_samples 5 --max_val_samples 5 --max_test_samples 5
5. Repeat for other scripts.
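The check for the generated result files in the testing step can be sketched as a small helper. This assumes the script was run with an output directory and that the files follow the (train|eval|test|all)_results.json naming above. The helper name `missing_result_files` is hypothetical and not part of the examples.

```python
import os
import tempfile

# File-name prefixes taken from the (train|eval|test|all)_results.json pattern.
EXPECTED_SPLITS = ("train", "eval", "test", "all")


def missing_result_files(output_dir, splits=EXPECTED_SPLITS):
    """Return the expected *_results.json file names that are absent from output_dir."""
    expected = [f"{split}_results.json" for split in splits]
    return [
        name for name in expected
        if not os.path.isfile(os.path.join(output_dir, name))
    ]


if __name__ == "__main__":
    # Demo on a throwaway directory that holds only one of the four files.
    with tempfile.TemporaryDirectory() as tmp_dir:
        open(os.path.join(tmp_dir, "train_results.json"), "w").close()
        print("missing:", missing_result_files(tmp_dir))
```

An empty list means all four files were written. If you ran the script without one of the train, eval, or test stages, pass only the splits you expect.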

Thank you very much!

The metrics log should be similar to this, with the exception of using different scoring metrics:



02/16/2021 17:06:39 - INFO - __main__ -   ***** train metrics *****
02/16/2021 17:06:39 - INFO - __main__ -     epoch                      =    1.0
02/16/2021 17:06:39 - INFO - __main__ -     init_mem_cpu_alloc_delta   =    2MB
02/16/2021 17:06:39 - INFO - __main__ -     init_mem_cpu_peaked_delta  =    0MB
02/16/2021 17:06:39 - INFO - __main__ -     init_mem_gpu_alloc_delta   =  230MB
02/16/2021 17:06:39 - INFO - __main__ -     init_mem_gpu_peaked_delta  =    0MB
02/16/2021 17:06:39 - INFO - __main__ -     total_flos                 = 2128GF
02/16/2021 17:06:39 - INFO - __main__ -     train_mem_cpu_alloc_delta  =   55MB
02/16/2021 17:06:39 - INFO - __main__ -     train_mem_cpu_peaked_delta =    0MB
02/16/2021 17:06:39 - INFO - __main__ -     train_mem_gpu_alloc_delta  =  692MB
02/16/2021 17:06:39 - INFO - __main__ -     train_mem_gpu_peaked_delta =  661MB
02/16/2021 17:06:39 - INFO - __main__ -     train_runtime              = 2.3114
02/16/2021 17:06:39 - INFO - __main__ -     train_samples              =    100
02/16/2021 17:06:39 - INFO - __main__ -     train_samples_per_second   =  3.028

02/16/2021 17:06:43 - INFO - __main__ -   ***** val metrics *****
02/16/2021 17:13:05 - INFO - __main__ -     epoch                     =     1.0
02/16/2021 17:13:05 - INFO - __main__ -     eval_bleu                 = 24.6502
02/16/2021 17:13:05 - INFO - __main__ -     eval_gen_len              =    32.9
02/16/2021 17:13:05 - INFO - __main__ -     eval_loss                 =  3.7533
02/16/2021 17:13:05 - INFO - __main__ -     eval_mem_cpu_alloc_delta  =     0MB
02/16/2021 17:13:05 - INFO - __main__ -     eval_mem_cpu_peaked_delta =     0MB
02/16/2021 17:13:05 - INFO - __main__ -     eval_mem_gpu_alloc_delta  =     0MB
02/16/2021 17:13:05 - INFO - __main__ -     eval_mem_gpu_peaked_delta =   510MB
02/16/2021 17:13:05 - INFO - __main__ -     eval_runtime              =  3.9266
02/16/2021 17:13:05 - INFO - __main__ -     eval_samples              =     100
02/16/2021 17:13:05 - INFO - __main__ -     eval_samples_per_second   =  25.467

02/16/2021 17:06:48 - INFO - __main__ -     ***** test metrics *****
02/16/2021 17:06:48 - INFO - __main__ -     test_bleu                 = 27.146
02/16/2021 17:06:48 - INFO - __main__ -     test_gen_len              =  41.37
02/16/2021 17:06:48 - INFO - __main__ -     test_loss                 = 3.6682
02/16/2021 17:06:48 - INFO - __main__ -     test_mem_cpu_alloc_delta  =    0MB
02/16/2021 17:06:48 - INFO - __main__ -     test_mem_cpu_peaked_delta =    0MB
02/16/2021 17:06:48 - INFO - __main__ -     test_mem_gpu_alloc_delta  =    0MB
02/16/2021 17:06:48 - INFO - __main__ -     test_mem_gpu_peaked_delta =  645MB
02/16/2021 17:06:48 - INFO - __main__ -     test_runtime              = 5.1136
02/16/2021 17:06:48 - INFO - __main__ -     test_samples              =    100
02/16/2021 17:06:48 - INFO - __main__ -     test_samples_per_second   = 19.556

Top GitHub Comments

bhadreshpsavani commented, Feb 26, 2021

Sure @stas00, I will be happy to work on it!

stas00 commented, Feb 26, 2021

Oh, and since you are doing amazingly useful work syncing all examples to look and feel similar, there is one more crucial thing to sync: templates/adding_a_new_example_script/, on which all new examples will be based, so we had better have a good template to start with. I forgot to mention that earlier. Thank you!
