validation_data parameter to fit with preprocessing
It's pretty common in Keras to pass `validation_data` to `fit` to monitor how the loss behaves out of sample at each epoch during training.
I noticed that scikeras offers two solutions:
- pass `validation_split` in the initialization
- pass `fit__validation_data` in the initialization
Since I need to preprocess X and y before fitting, I cannot use `validation_split` because that would cause data leakage, so I should opt for solution 2; but that makes the model stateful, since `fit__validation_data` is attached to the model instance.
Do you have any suggestions for this problem?
Many thanks
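
For concreteness, a minimal sketch of the second option as described above might look like the following; the scaler, toy model, and synthetic data are illustrative assumptions, not part of the original report.

```python
# Sketch of the situation described in the question: preprocessing must be
# fit on the training split only, so validation_split cannot be used, while
# fit__validation_data has to be supplied when the estimator is constructed.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from tensorflow import keras
from scikeras.wrappers import KerasRegressor


def build_model():
    model = keras.Sequential([
        keras.layers.Input(shape=(5,)),
        keras.layers.Dense(32, activation="relu"),
        keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")
    return model


X, y = np.random.rand(200, 5), np.random.rand(200)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

# Fit the preprocessing on the training split only to avoid leakage.
scaler = StandardScaler().fit(X_train)
X_train_t, X_val_t = scaler.transform(X_train), scaler.transform(X_val)

# Option 2 from the issue: route validation_data to keras' fit() through the
# fit__ prefix. It works, but the hold-out set is now stored on the estimator
# instance, which is the statefulness being objected to.
reg = KerasRegressor(
    model=build_model,
    epochs=10,
    verbose=0,
    fit__validation_data=(X_val_t, y_val),
)
reg.fit(X_train_t, y_train)
```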

I am already using callbacks for other purposes and I am aware I could have used callbacks for this as well.
However, it requires a lot more effort, and the main drawback is that, to save the validation losses, I would need to create a new instance attribute on `self.model`, which is definitely not good practice.

As I said, as long as I can use `fit__validation_data`, even if undocumented, I am OK and the issue can be closed, because I would subclass `KerasRegressor` anyway, since I want to modify the signature of `fit` to accept `**fit_params`. The reason I opened the issue is that I think scikeras would benefit from this change.

Keeping track of the loss on the hold-out set versus the training set is one of the most important aspects of any serious ML workflow involving neural networks, as is avoiding data leakage, which is very often disregarded. For this reason, passing `validation_data` to `fit` should feel natural and be very easy to do.

Unfortunately, I believe that neither proposed solution adds any advantage over simply changing the signature of `fit` to accept `**fit_params`. I can only see disadvantages in using either callbacks or for loops + `partial_fit`.

That being said, whatever you decide, thank you for considering my thoughts on this. I believe this is an excellent package!
Many thanks Gio
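
A rough sketch of the subclassing idea mentioned above might look like the code below. The way the routed `fit__validation_data` attribute is set and then removed is an assumption about scikeras' parameter routing, and may need adjusting for a given scikeras version.

```python
from scikeras.wrappers import KerasRegressor


class ValidationKerasRegressor(KerasRegressor):
    """Sketch: a KerasRegressor whose fit() accepts validation_data per call."""

    def fit(self, X, y, validation_data=None, sample_weight=None):
        if validation_data is not None:
            # Hand the hold-out set to keras' Model.fit() via scikeras'
            # routed "fit__" parameters, for this call only.
            self.fit__validation_data = validation_data
        try:
            return super().fit(X, y, sample_weight=sample_weight)
        finally:
            # Drop the reference afterwards so the estimator does not
            # stay stateful between calls.
            if validation_data is not None:
                del self.fit__validation_data
```

With such a subclass, the call would look like `reg.fit(X_train_t, y_train, validation_data=(X_val_t, y_val))`, keeping the hold-out split out of the constructor arguments.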
Hi @gioxc88, sorry to bother you again.
We are discussing implementing a feature that might help with your use case. This is not a replacement for `**kwargs`, merely another option for when those are not possible (grid search, cross-validation, etc.). Would you be able to take a look at this example implementation and/or the `DatasetTransformer` section in these docs? Thank you!