Global steps smaller than total training set size after training finishes
Hey,
first of all, thanks for your great library, it’s been a huge help!
I used it to fine-tune a pretrained BERT model from the Hugging Face library on a smaller dataset for binary text classification.
From the training_progress_scores.csv I noticed that, no matter what epoch number I choose, the global steps within the epochs do not even get close to the size of my train dataset. Does that mean the model doesn't even look at all the training examples? I wonder if I am confusing something here.
Here is an example: I am training on ~4790 samples/training examples with the following hyperparameters and settings:
{"adam_epsilon": 1e-08, "do_lower_case": true, "use_early_stopping": false, "early_stopping_delta": 0.01, "early_stopping_metric": "acc", "early_stopping_metric_minimize": false, "early_stopping_patience": 5, "encoding": "utf-8", "eval_batch_size": 8, "evaluate_during_training": true, "evaluate_during_training_steps": 500, "evaluate_during_training_verbose": true, "fp16": false, "gradient_accumulation_steps": 1, "learning_rate": 2e-5, "logging_steps": 500, "manual_seed": 17, "max_grad_norm": 1.0, "max_seq_length": 128, "num_train_epochs": 3, "n_gpu": 1, "overwrite_output_dir": true, "reprocess_input_data": false, "save_eval_checkpoints": false, "save_model_every_epoch": false, "save_steps": 2000, "train_batch_size": 8, "use_cached_eval_features": false, "use_multiprocessing": true, "warmup_ratio": 0.10, "weight_decay": 0}
Nevertheless, the last documented global step in my progress file is 1521. Did I make a mistake or misunderstand something? I'm happy for any feedback, thanks!
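For reference, here is a small sketch of the step arithmetic implied by these settings, under the assumption that global_step counts optimizer updates (i.e. batches), not individual examples:

```python
import math

# Assumption: one global step = one optimizer update on one batch,
# not one training example.
num_examples = 4790
train_batch_size = 8
gradient_accumulation_steps = 1
num_train_epochs = 3

batches_per_epoch = math.ceil(num_examples / train_batch_size)      # 599
steps_per_epoch = batches_per_epoch // gradient_accumulation_steps  # 599
total_steps = steps_per_epoch * num_train_epochs                    # 1797

print(total_steps)  # 1797 -- far below 4790, as observed
```

Under that reading, a final step of 1521 over 3 epochs (507 per epoch, matching the evaluations at steps 507, 1014, and 1521 in the file below) would correspond to roughly 507 × 8 ≈ 4056 examples per epoch, so the model would still see every training example once per epoch; the count is simply in batches, not samples.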
My training_progress_scores.csv:
global_step,tp,tn,fp,fn,mcc,f1,precision,recall,f1_weighted,precision_weighted,recall_weighted,train_loss,eval_loss,acc
500,249,130,52,19,0.670966975526973,0.875219683655536,0.8272425249169435,0.9291044776119403,0.838932445100472,0.8455398733032571,0.8422222222222222,0.11351273953914642,0.3626793656955686,0.8422222222222222
507,249,129,53,19,0.666374588262315,0.8736842105263157,0.8245033112582781,0.9291044776119403,0.8365295055821371,0.8435600501163415,0.84,0.9388631582260132,0.3636407738453464,0.84
1000,215,161,21,53,0.6750015267273123,0.8531746031746031,0.9110169491525424,0.8022388059701493,0.836979316979317,0.846839502261647,0.8355555555555556,0.03929607570171356,0.4004484978422784,0.8355555555555556
1014,243,140,42,25,0.6884167901651631,0.8788426763110306,0.8526315789473684,0.9067164179104478,0.849752504170481,0.8509544568491937,0.8511111111111112,0.0117262601852417,0.40816347301006317,0.8511111111111112
1500,241,145,37,27,0.7029086608187551,0.8827838827838829,0.8669064748201439,0.8992537313432836,0.8570713906307128,0.8572470395776403,0.8577777777777778,0.003922566771507263,0.5339946605657276,0.8577777777777778
1521,241,147,35,27,0.7124595156999709,0.8860294117647057,0.8731884057971014,0.8992537313432836,0.8616872291987956,0.861718029873952,0.8622222222222222,0.010883927345275879,0.5320389102164068,0.8622222222222222
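For what it's worth, a quick way to inspect the trend in this file (plain pandas, assuming it reads as a standard CSV):

```python
import pandas as pd

# train_loss keeps shrinking while eval_loss climbs from ~0.36 to ~0.53,
# the overfitting pattern described in the comments below.
df = pd.read_csv("training_progress_scores.csv")
print(df[["global_step", "train_loss", "eval_loss", "acc"]])
```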
Top GitHub Comments
I think what you are looking for is evaluation during training. There, you periodically evaluate the model on eval/validation data while training on the train data. Your evaluation loss will generally decrease with training until the model starts overfitting, at which point the eval loss will start increasing. It’s generally a good idea to stop training when the evaluation loss stops improving.
Look for the evaluate_during_training_* configuration options in the docs. You might also want to look into early_stopping to automatically end training when evaluation loss stops improving; see the sketch below.
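For illustration, a minimal sketch of such a setup, assuming the library in question is simpletransformers; train_df/eval_df are placeholder DataFrames, and early stopping is switched to eval_loss here, unlike the acc metric in the config above:

```python
import pandas as pd
from simpletransformers.classification import ClassificationModel

# Placeholder data with the "text" and "labels" columns the library expects.
train_df = pd.DataFrame([["example text", 1], ["another example", 0]], columns=["text", "labels"])
eval_df = pd.DataFrame([["held-out text", 1]], columns=["text", "labels"])

model_args = {
    "evaluate_during_training": True,        # evaluate on eval_df while training
    "evaluate_during_training_steps": 500,   # every 500 global steps
    "use_early_stopping": True,              # stop once eval_loss stops improving
    "early_stopping_metric": "eval_loss",
    "early_stopping_metric_minimize": True,
    "early_stopping_patience": 5,
    "early_stopping_delta": 0.01,
}

model = ClassificationModel("bert", "bert-base-uncased", args=model_args)
model.train_model(train_df, eval_df=eval_df)
```

With evaluate_during_training enabled, the training_progress_scores.csv written during training contains one row per evaluation, as in the file above.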
Honestly, this is more a question for Stack Overflow or other helpful sites on the web. More data generally helps a model generalize better. Normally, you iterate over your whole dataset multiple times (epochs) to let your model converge. What you describe sounds much more like a good way to step into a local minimum (optimum) than a way to really solve the problem.