longevity of models

Training models might take considerable amounts of time and energy, but with every upgrade of sklearn one currently might lose the ability to use a model trained with an older release version.

While this is mentioned in the docs, I still find it upsetting, as I can either pin sklearn to some old version or re-train all models and use an up-to-date sklearn. Re-training might also make it necessary to keep large amounts of training data around and ship it to all machines, potentially together with the parameters found by grid search on previous versions, to at least save the grid-search time.

For example, I have a GradientBoostingClassifier model from 0.18.1 that now causes this UserWarning:

Trying to unpickle estimator GradientBoostingClassifier from version 0.18.1 when using version 0.19.1. This might lead to breaking code or invalid results. Use at your own risk.

When trying to actually use the model the result is:

AttributeError: 'GradientBoostingClassifier' object has no attribute 'n_features_'

My guess is that this is mainly an issue of using pickle on the model object (which is currently the recommended way to serialize models): every rename of a model attribute in the source will cause a whole lot of trouble when loading a model object that pre-dates the change, as the methods won’t find the attribute anymore.
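The failure mode described above can be reproduced without scikit-learn. The following is a minimal sketch using only the standard library; `ToyEstimator` and the old attribute name `n_features` are hypothetical stand-ins chosen for illustration, not taken from the scikit-learn source.

```python
import pickle


class ToyEstimator:
    """Hypothetical stand-in for the *new* release: methods read `n_features_`."""

    def fit(self, X):
        self.n_features_ = len(X[0])
        return self

    def predict(self, X):
        # The new code only knows the new attribute name.
        if len(X[0]) != self.n_features_:
            raise ValueError("wrong number of features")
        return [0 for _ in X]


# Simulate an object fitted under an *old* release that stored the same
# information under a different (here: invented) attribute name.
old_model = ToyEstimator()
old_model.__dict__["n_features"] = 2
payload = pickle.dumps(old_model)

# Unpickling succeeds: pickle restores the instance __dict__ as it was saved
# and does not check it against what the current class expects.
restored = pickle.loads(payload)

try:
    restored.predict([[1.0, 2.0]])
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

print(error_message)
```

The load itself raises nothing; the error only appears once a method touches the renamed attribute, which matches the behaviour reported above.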

At the same time, I think that the underlying idea of an estimator changes rather rarely.

Would there be any interest in providing high-level save and load functionality that decouples the run-time model mechanics a bit more from its serialization and focuses more on longevity?
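To make the question concrete, here is a rough sketch of what such a decoupled save/load could look like for a very simple model. Everything in it is an assumption made for illustration: `ToyLinearModel`, `dump_model`, `load_model`, the JSON layout and `FORMAT_VERSION` are hypothetical and not part of scikit-learn.

```python
import json

FORMAT_VERSION = 1  # hypothetical version of the file layout, not of the library


class ToyLinearModel:
    """Hypothetical one-feature least-squares model."""

    def __init__(self, fit_intercept=True):
        self.fit_intercept = fit_intercept  # constructor parameter

    def fit(self, xs, ys):
        if self.fit_intercept:
            mean_x = sum(xs) / len(xs)
            mean_y = sum(ys) / len(ys)
        else:
            mean_x = mean_y = 0.0
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        sxx = sum((x - mean_x) ** 2 for x in xs)
        self.coef_ = sxy / sxx                        # estimated attribute
        self.intercept_ = mean_y - self.coef_ * mean_x  # estimated attribute
        return self

    def predict(self, xs):
        return [self.coef_ * x + self.intercept_ for x in xs]


def dump_model(model):
    """Write the model using file keys that are independent of attribute names."""
    return json.dumps({
        "format_version": FORMAT_VERSION,
        "params": {"fit_intercept": model.fit_intercept},
        "fitted": {"coef": model.coef_, "intercept": model.intercept_},
    })


def load_model(text):
    """Rebuild a model from the file layout, whatever the attributes are called now."""
    doc = json.loads(text)
    if doc["format_version"] != FORMAT_VERSION:
        raise ValueError("unsupported format version: %r" % doc["format_version"])
    model = ToyLinearModel(**doc["params"])
    model.coef_ = doc["fitted"]["coef"]
    model.intercept_ = doc["fitted"]["intercept"]
    return model


model = ToyLinearModel().fit([0.0, 1.0, 2.0], [1.0, 3.0, 5.0])
restored = load_model(dump_model(model))
print(restored.predict([3.0]))
```

Because the file keys (`coef`, `intercept`) are mapped to attributes in one place, a later rename in the class only requires touching `dump_model` and `load_model`. This works here because the fitted state is two numbers; estimators whose state is a tree ensemble or a nested pipeline would need considerably more machinery.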

Top GitHub Comments

jnothman commented, Dec 15, 2017

Yes, it would be good to have this noted in the FAQ.

A few points:

Everyone wants this, but no one has volunteered a way to make it maintainable.
Any serialization/deserialization engine will need to be as flexible and insecure as pickle to cover all cases. So it is not worth the engineering.
If we had designed estimators and models as separate classes, it would have helped a lot with this issue. But we did not, and we may have usability benefits as a result. Either way, that ship has sailed and we’re not going to change it.
Deserialising to fit another model: This should usually work unless there are API changes (deprecations) and what is pickled is a clone of the estimator. However, users really must record the version of scikit-learn they built a model/pickle with in order to recover all they need.
Deserialising to predict:
1. PMML is good if your model is simple enough to be converted. Encouraging custom estimators sort of contradicts requiring the use of PMML.
2. In general one may need both constructor parameters and estimated attributes to predict, so just saving estimated attributes is not sufficient.
3. We could build a testing framework in which we checked whether a model fitted at the last release could be used to predict with the same results at master. We could consider implementing __setstate__ to help load old models in new versions, or otherwise just announce the explicit lack of compatibility. But it would be hard to test and give assurances for all parameter combinations or all training data.
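Two of the points above, recording the library version with the pickle and using __setstate__ to load old models, can be sketched together with the standard library alone. This is only an illustration under stated assumptions: `ToyEstimator`, `LIBRARY_VERSION`, the `_library_version` key and the `RENAMED_ATTRIBUTES` table are hypothetical names, and the sketch does not address the testing problem raised in point 3.

```python
import pickle
import warnings

# Hypothetical library version; reassigned below to simulate an upgrade.
LIBRARY_VERSION = "1.0"

# Hypothetical table of attribute renames, old name -> new name.
RENAMED_ATTRIBUTES = {"n_features": "n_features_"}


class ToyEstimator:
    def fit(self, X):
        self.n_features_ = len(X[0])
        return self

    def __getstate__(self):
        # Stamp every pickle with the version that wrote it.
        state = dict(self.__dict__)
        state["_library_version"] = LIBRARY_VERSION
        return state

    def __setstate__(self, state):
        state = dict(state)
        pickled_version = state.pop("_library_version", "unknown")
        if pickled_version != LIBRARY_VERSION:
            warnings.warn(
                "model pickled with version %s, loading with version %s"
                % (pickled_version, LIBRARY_VERSION)
            )
        # Migrate attributes that were renamed since the pickle was written.
        for old_name, new_name in RENAMED_ATTRIBUTES.items():
            if old_name in state and new_name not in state:
                state[new_name] = state.pop(old_name)
        self.__dict__.update(state)


# Simulate a model written by the old release, using the old attribute name.
legacy = ToyEstimator()
legacy.__dict__["n_features"] = 3
payload = pickle.dumps(legacy)

# Simulate upgrading the library, then load the old pickle.
LIBRARY_VERSION = "2.0"
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    restored = pickle.loads(payload)

print(restored.n_features_, len(caught))
```

The rename table has to be maintained by hand for every estimator and every release, which is the maintainability cost the comment points to.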

thomasjpfan commented, Jan 6, 2022

We have since updated the docs with an “Interoperable formats” section, and there is a “Model export for production” section in Related projects. With that in mind, I agree to close.
