Support early_stopping with custom validation_set
Describe the workflow you want to enable
Today in SGDClassifier, the early_stopping parameter holds out a random fraction of the training data as a validation set. It would be useful to support a custom validation set chosen by the user.
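For reference, this is the current behaviour being described (a minimal sketch; it assumes X_train and y_train are already defined, as in the example below, and uses the existing validation_fraction and n_iter_no_change parameters):

```python
from sklearn.linear_model import SGDClassifier

# Current behaviour: when early_stopping=True, a random validation_fraction
# of the training data is held out internally for the stopping criterion.
clf = SGDClassifier(early_stopping=True, validation_fraction=0.1, n_iter_no_change=5)
clf.fit(X_train, y_train)  # the user cannot choose which samples are held out
```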
Describe your proposed solution
For example:
clf = SGDClassifier(early_stopping=True)
clf.fit(X_train, y_train, eval_set=(X_val, y_val))
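A rough workaround that is already possible today is to drive the stopping loop by hand with partial_fit, scoring a user-chosen validation set after each epoch. A minimal sketch follows; the tolerance and patience values are arbitrary stand-ins for tol and n_iter_no_change, not an existing API:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

clf = SGDClassifier(random_state=0)
classes = np.unique(y_train)

best_score, best_coef, best_intercept, n_bad = -np.inf, None, None, 0
for epoch in range(100):
    clf.partial_fit(X_train, y_train, classes=classes)
    score = clf.score(X_val, y_val)     # metric on the user-chosen validation set
    if score > best_score + 1e-4:       # small improvement tolerance (assumption)
        best_score, n_bad = score, 0
        best_coef, best_intercept = clf.coef_.copy(), clf.intercept_.copy()
    else:
        n_bad += 1
    if n_bad >= 5:                      # patience, analogous to n_iter_no_change
        break

# Restore the best snapshot seen on the validation set.
clf.coef_, clf.intercept_ = best_coef, best_intercept
```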
Issue Analytics
- Created: 3 years ago
- Reactions: 2
- Comments: 14 (11 by maintainers)
Note that auto-splitting from the training set inside the final classifier/regressor is problematic when this estimator is wrapped in a rebalancing meta-estimator to tackle target-imbalance problems: rebalancing should happen only on the training data, while early stopping, model selection and evaluation should only use metrics computed on data that follows the original class balance.
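To make that failure mode concrete, here is a minimal sketch assuming the imbalanced-learn package (imblearn, an assumption not mentioned above) provides the rebalancing step:

```python
from imblearn.pipeline import Pipeline
from imblearn.over_sampling import RandomOverSampler
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

X, y = make_classification(n_samples=2000, weights=[0.95, 0.05], random_state=0)

model = Pipeline([
    ("rebalance", RandomOverSampler(random_state=0)),
    # validation_fraction is split off *after* oversampling, so early stopping
    # is scored on data that no longer follows the original class balance.
    ("clf", SGDClassifier(early_stopping=True, validation_fraction=0.1)),
])
model.fit(X, y)
```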
I am not sure an auto-magical API would work for this. Making it possible to pass a manually prepared validation set might be the sanest way to deal with this situation.
I guess we could think of an API where a pipeline’s fit accepts a validation_set, as well as many other estimators and all other meta-estimators, and handle that properly. But that’s quite a large project to pull off.
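As a rough illustration of why this is non-trivial, a pipeline-level validation_set would have to route the validation data through the already-fitted preprocessing steps before the final estimator sees it. A minimal sketch of those mechanics, reusing the X_train/X_val split from the earlier snippet and StandardScaler as a stand-in preprocessing step:

```python
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import SGDClassifier

# Preprocessing must be fitted on the training data only...
scaler = StandardScaler().fit(X_train)
Xt_train = scaler.transform(X_train)
# ...and then applied, already fitted, to the user-supplied validation set.
Xt_val = scaler.transform(X_val)

# The final estimator would then consume (Xt_val, y_val) for early stopping,
# e.g. via the manual partial_fit loop sketched earlier in this thread; today
# this plumbing has to be written by hand for every pipeline and meta-estimator.
clf = SGDClassifier(random_state=0)
clf.fit(Xt_train, y_train)
```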