longevity of models
Training models can take considerable amounts of time and energy, but with every sklearn upgrade one may currently lose the ability to use a model trained with an older release.
While this is mentioned in the docs, I still find it upsetting, as I can either pin sklearn to some old version or re-train all models with an up-to-date sklearn. Re-training may also make it necessary to keep large amounts of training data around and ship them to all machines, potentially together with the parameters found by grid search on previous versions, to at least save the grid-search time.
For example, I have a GradientBoostingClassifier model from 0.18.1 that now causes this UserWarning:
Trying to unpickle estimator GradientBoostingClassifier from version 0.18.1 when using version 0.19.1. This might lead to breaking code or invalid results. Use at your own risk.
When trying to actually use the model the result is:
AttributeError: 'GradientBoostingClassifier' object has no attribute 'n_features_'
My guess is that this is mainly a consequence of using pickle on the model object (which is currently the recommended way to serialize models): every rename of a model attribute in the source causes a whole lot of trouble when loading a model object that pre-dates the change, because the methods won’t find the attribute anymore.
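To make the failure mode concrete, here is a minimal sketch using only the standard library. `ToyEstimator` and the old attribute name `n_features` are hypothetical stand-ins chosen for illustration, not sklearn code. The point it shows is that pickle restores the stored attribute dictionary as-is, so loading succeeds and the error only surfaces once current code reads an attribute the old state never had.

```python
import pickle


class ToyEstimator:
    """Toy stand-in for an estimator; this is the *current* code."""

    def fit(self, X):
        self.n_features_ = len(X[0])
        return self

    def check(self, X):
        # Current code reads the attribute under its current name.
        return len(X[0]) == self.n_features_


# Simulate a pickle written by an older release that stored the same
# value under a different (hypothetical) attribute name.
old = ToyEstimator()
old.__dict__ = {"n_features": 2}
payload = pickle.dumps(old)

# Unpickling succeeds: only the saved attribute dict is restored,
# and neither __init__ nor fit is run again.
restored = pickle.loads(payload)

try:
    restored.check([[1.0, 2.0]])
    error = None
except AttributeError as exc:
    error = str(exc)

print(error)
```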
At the same time, I think that the underlying idea of an estimator changes rather rarely.
Would there be any interest in providing high-level save and load functionality that decouples the run-time model mechanics a bit more from its serialization and focuses more on longevity?
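As a rough sketch of what such decoupling could look like, the example below writes only the learned quantities to JSON under a schema version that is independent of the library version. Everything here is an assumption made for illustration: `save_model`, `load_model`, `FORMAT_VERSION` and `ToyLinearModel` are hypothetical names, not part of sklearn.

```python
import json

# Hypothetical schema version, independent of the library version.
FORMAT_VERSION = 1


class ToyLinearModel:
    """Toy model whose learned state is a coefficient list and an intercept."""

    def __init__(self, coef, intercept):
        self.coef = list(coef)
        self.intercept = float(intercept)

    def predict(self, X):
        return [
            sum(c * x for c, x in zip(self.coef, row)) + self.intercept
            for row in X
        ]


def save_model(model):
    """Serialize only the learned quantities, under stable schema keys."""
    return json.dumps({
        "format_version": FORMAT_VERSION,
        "estimator": "ToyLinearModel",
        "coef": model.coef,
        "intercept": model.intercept,
    })


def load_model(text):
    """Rebuild the model through its public constructor."""
    doc = json.loads(text)
    if doc["format_version"] != FORMAT_VERSION:
        raise ValueError(f"unsupported format_version: {doc['format_version']}")
    if doc["estimator"] != "ToyLinearModel":
        raise ValueError(f"unknown estimator: {doc['estimator']}")
    return ToyLinearModel(doc["coef"], doc["intercept"])


model = ToyLinearModel([2.0, -1.0], 0.5)
restored = load_model(save_model(model))
```

Because the file format names the learned quantities rather than the object's internal attributes, an attribute rename in the code would only require touching `save_model` and `load_model`, not invalidating stored models.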
Issue Analytics
- Created 6 years ago
- Reactions:11
- Comments:11 (11 by maintainers)
Yes, it would be good to have this noted in the FAQ.
A few points:
__setstate__ to help load old models in new versions, or otherwise just announce the explicit lack of compatibility. But it would be hard to test and give assurances for all parameter combinations or all training data.
We have updated the docs since with an Interoperable formats section, and there is a “Model export for production” section in Related projects. With that in mind, I agree to close.
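For reference, the `__setstate__` idea raised in the points above could look roughly like the following. This is a sketch under stated assumptions, not sklearn's implementation: `MigratingEstimator` and the old attribute name `n_features` are hypothetical. Pickle calls `__setstate__` with the stored attribute dict on load, which gives the class a hook to translate old names before any method reads them.

```python
import pickle


class MigratingEstimator:
    """Toy estimator that upgrades state pickled by an older layout."""

    def __init__(self, n_features):
        self.n_features_ = n_features

    def __setstate__(self, state):
        # Translate the hypothetical old attribute name to the current one.
        if "n_features" in state and "n_features_" not in state:
            state = dict(state)
            state["n_features_"] = state.pop("n_features")
        self.__dict__.update(state)


# Simulate a pickle written by an older release (hypothetical old layout).
legacy = MigratingEstimator.__new__(MigratingEstimator)
legacy.__dict__ = {"n_features": 4}
payload = pickle.dumps(legacy)

# On load, pickle hands the stored dict to __setstate__, which migrates it.
restored = pickle.loads(payload)
print(restored.n_features_)
```

As the comment above notes, a hook like this only covers the renames someone remembered to handle, so it does not by itself give assurances across all parameter combinations.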