train/eval step results log not shown in terminal for tf_trainer.py
Environment info
- `transformers` version: 3.1.0
- Platform: Linux-5.4.0-42-generic-x86_64-with-Ubuntu-18.04-bionic
- Python version: 3.6.9
- PyTorch version (GPU?): 1.6.0 (False)
- Tensorflow version (GPU?): 2.2.0 (False)
- Using GPU in script?: No
- Using distributed or parallel set-up in script?: No
Who can help
Trainer: @sgugger
TensorFlow: @jplu @LysandreJik
Information
With the current code, i.e. without setting `logger.setLevel(logging.INFO)` in `trainer_tf.py`, the output is:
09/12/2020 03:42:41 - INFO - absl - Load dataset info from /home/imo/tensorflow_datasets/glue/sst2/1.0.0
09/12/2020 03:42:41 - INFO - absl - Reusing dataset glue (/home/imo/tensorflow_datasets/glue/sst2/1.0.0)
09/12/2020 03:42:41 - INFO - absl - Constructing tf.data.Dataset for split validation, from /home/imo/tensorflow_datasets/glue/sst2/1.0.0
2020-09-12 03:42:57.010229: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:184] Filling up shuffle buffer (this may take a while): 41707 of 67349
2020-09-12 03:43:03.412045: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:233] Shuffle buffer filled.
2020-09-12 03:43:56.636791: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:184] Filling up shuffle buffer (this may take a while): 36279 of 67349
2020-09-12 03:44:04.474751: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:233] Shuffle buffer filled.
09/12/2020 03:44:51 - INFO - __main__ - *** Evaluate ***
09/12/2020 03:45:02 - INFO - __main__ - ***** Eval results *****
09/12/2020 03:45:02 - INFO - __main__ - eval_loss = 0.712074209790711
09/12/2020 03:45:02 - INFO - __main__ - eval_acc = 0.48977272727272725
You can see that the train/eval step logs are not shown.
If I manually set `logger.setLevel(logging.INFO)` in `trainer_tf.py`, the output becomes:
09/12/2020 06:04:39 - INFO - absl - Load dataset info from /home/imo/tensorflow_datasets/glue/sst2/1.0.0
09/12/2020 06:04:39 - INFO - absl - Reusing dataset glue (/home/imo/tensorflow_datasets/glue/sst2/1.0.0)
09/12/2020 06:04:39 - INFO - absl - Constructing tf.data.Dataset for split validation, from /home/imo/tensorflow_datasets/glue/sst2/1.0.0
You are instantiating a Trainer but W&B is not installed. To use wandb logging, run `pip install wandb; wandb login` see https://docs.wandb.com/huggingface.
To use comet_ml logging, run `pip/conda install comet_ml` see https://www.comet.ml/docs/python-sdk/huggingface/
***** Running training *****
Num examples = 67349
Num Epochs = 1
Instantaneous batch size per device = 4
Total train batch size (w. parallel, distributed & accumulation) = 4
Gradient Accumulation steps = 1
Steps per epoch = 4
Total optimization steps = 4
2020-09-12 06:04:49.637373: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:184] Filling up shuffle buffer (this may take a while): 39626 of 67349
2020-09-12 06:04:56.805687: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:233] Shuffle buffer filled.
{'loss': 0.6994307, 'learning_rate': 3.7499998e-05, 'epoch': 0.5, 'step': 1}
{'loss': 0.6897122, 'learning_rate': 2.5e-05, 'epoch': 0.75, 'step': 2}
Saving checkpoint for step 2 at ./sst-2/checkpoint/ckpt-1
{'loss': 0.683386, 'learning_rate': 1.25e-05, 'epoch': 1.0, 'step': 3}
{'loss': 0.68290234, 'learning_rate': 0.0, 'epoch': 1.25, 'step': 4}
Saving checkpoint for step 4 at ./sst-2/checkpoint/ckpt-2
Training took: 0:00:43.099437
Saving model in ./sst-2/
09/12/2020 06:05:26 - INFO - __main__ - *** Evaluate ***
***** Running Evaluation *****
Num examples = 872
Batch size = 8
{'eval_loss': 0.6990196158032899, 'eval_acc': 0.49204545454545456, 'epoch': 1.25, 'step': 4}
09/12/2020 06:05:35 - INFO - __main__ - ***** Eval results *****
09/12/2020 06:05:35 - INFO - __main__ - eval_loss = 0.6990196158032899
09/12/2020 06:05:35 - INFO - __main__ - eval_acc = 0.49204545454545456
We see more step-level information, such as
{'loss': 0.6994307, 'learning_rate': 3.7499998e-05, 'epoch': 0.5, 'step': 1}
More importantly, we also see this message:
You are instantiating a Trainer but W&B is not installed. To use wandb logging, run `pip install wandb; wandb login` see https://docs.wandb.com/huggingface.
To use comet_ml logging, run `pip/conda install comet_ml` see https://www.comet.ml/docs/python-sdk/huggingface/
which is not shown unless the logging level is set to INFO.
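As a user-side workaround in the meantime, here is a minimal sketch (not a fix to the library) that raises the logging level from the calling script instead of editing `trainer_tf.py`. It assumes `trainer_tf.py` builds its logger with the conventional `logging.getLogger(__name__)` pattern, so its logger lives under the "transformers" namespace:

```python
import logging

# Attach a root handler and format so records actually reach the terminal.
logging.basicConfig(
    format="%(asctime)s - %(levelname)s - %(name)s - %(message)s",
    level=logging.INFO,
)

# Raise the whole "transformers" logger namespace to INFO. Assuming
# trainer_tf.py creates its logger via logging.getLogger(__name__), that
# logger is a child of "transformers" and inherits this level, so the
# per-step train/eval logs show up without touching the library code.
logging.getLogger("transformers").setLevel(logging.INFO)
```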
Related
In PR #6097, @LysandreJik changed `logger.info(output)` to `print(output)` in `trainer.py` in order to show the logs on the screen. Maybe we should do the same thing for `tf_trainer.py`. Alternatively, we could set the logging level to INFO in `tf_trainer.py`; however, that would diverge from `trainer.py`, where the logging level is not set (at least, not in the trainer script itself). A sketch of both options follows.
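For concreteness, a minimal sketch of the two options under discussion (the `output` dict is illustrative, not the trainer's actual variable):

```python
import logging

# Stand-in for the step metrics dict that the trainer logs each step.
output = {"loss": 0.6994307, "learning_rate": 3.75e-05, "epoch": 0.5, "step": 1}

# Option 1: mirror PR #6097 in trainer.py and bypass the logging
# machinery entirely, so step results always reach the terminal.
print(output)

# Option 2: keep logger.info() but force the module logger to INFO,
# i.e. the manual change to trainer_tf.py tried above.
logging.basicConfig()  # attach a handler so INFO records get emitted
logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)
logger.info(output)
```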
To reproduce
python3 run_tf_glue.py \
--task_name sst-2 \
--model_name_or_path distilbert-base-uncased \
--output_dir ./sst-2/ \
--max_seq_length 16 \
--num_train_epochs 2 \
--per_device_train_batch_size 4 \
--gradient_accumulation_steps 1 \
--max_steps 4 \
--logging_steps 1 \
--save_steps 2 \
--seed 1 \
--do_train \
--do_eval \
--do_predict \
--overwrite_output_dir
Expected behavior
I expect the train/eval step logs to be shown on the screen.
Remark
I can make a PR once a decision is made by the team.
@jplu As you might know, I opened this issue, but I don't necessarily have the whole context, so I'll leave it to you to decide the desired behavior for tf_trainer.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.