longevity of models

See original GitHub issue

Training a model can take a considerable amount of time and energy, yet with every sklearn upgrade one currently risks losing the ability to use a model trained with an older release.

While this is mentioned in the docs, I still find it upsetting: I can either pin sklearn to some old version, or re-train all models and use an up-to-date sklearn. Re-training may also make it necessary to keep large amounts of training data around and ship them to all machines, potentially together with the parameters found by grid search on previous versions, to at least save the grid-search time.

For example, I have a GradientBoostingClassifier model from 0.18.1 that now triggers this UserWarning:

Trying to unpickle estimator GradientBoostingClassifier from version 0.18.1 when using version 0.19.1. This might lead to breaking code or invalid results. Use at your own risk.

When trying to actually use the model the result is:

AttributeError: 'GradientBoostingClassifier' object has no attribute 'n_features_'

My guess is that this mainly comes from using pickle on the model object (which is currently the recommended way to serialize models): every rename of a model attribute in the source causes trouble when loading a model object that pre-dates the change, because the methods no longer find the attribute.
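To make the failure mode concrete, here is a minimal sketch using only the standard library. ModelV1 and ModelV2 are hypothetical stand-ins for two releases of the same estimator, not scikit-learn code. By default pickle restores an object by allocating it without calling __init__ and then restoring the saved __dict__; the sketch reproduces that step by hand so that both "releases" can live in one script.

```python
import pickle


class ModelV1:
    """Hypothetical 'old release' of an estimator (not scikit-learn code)."""

    def fit(self, X):
        self.n_features = len(X[0])  # old attribute name
        return self


class ModelV2:
    """Hypothetical 'new release': the attribute was renamed to n_features_."""

    def fit(self, X):
        self.n_features_ = len(X[0])
        return self

    def predict(self, X):
        # The new code only knows the new attribute name.
        if len(X[0]) != self.n_features_:
            raise ValueError("wrong number of features")
        return [0] * len(X)


# Save the fitted state under the old release.
saved_state = pickle.dumps(ModelV1().fit([[1.0, 2.0]]).__dict__)

# Restore it under the new release the way pickle does by default:
# allocate the instance without calling __init__, then copy the saved __dict__.
restored = ModelV2.__new__(ModelV2)
restored.__dict__.update(pickle.loads(saved_state))

try:
    restored.predict([[3.0, 4.0]])
    error_message = None
except AttributeError as exc:
    # The restored object still carries n_features, not n_features_.
    error_message = str(exc)

print(error_message)
```

Loading succeeds without complaint; the error only appears once a method touches the renamed attribute, which matches the behaviour described above.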

At the same time, I think that the underlying idea of an estimator changes rather rarely.

Would there be any interest in providing high-level save and load functionality that decouples the run-time model mechanics a bit more from its serialization and focuses more on longevity?
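As a rough sketch of what such decoupled save and load functions could look like, the example below writes constructor parameters and fitted values under stable key names in a JSON document, so the file format does not depend on attribute names in the source. Everything here is an assumption made for illustration: ToyCentroidClassifier, dump_model, load_model and FORMAT_VERSION are hypothetical and not part of scikit-learn.

```python
import json

FORMAT_VERSION = 1  # hypothetical version of the on-disk format, not of the library


class ToyCentroidClassifier:
    """Hypothetical toy estimator used only to illustrate the idea."""

    def __init__(self, metric="l1"):
        self.metric = metric

    def fit(self, X, y):
        # One centroid (column-wise mean) per class label.
        self.centroids_ = {}
        for label in sorted(set(y)):
            rows = [x for x, t in zip(X, y) if t == label]
            self.centroids_[str(label)] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def _distance(self, a, b):
        if self.metric == "l1":
            return sum(abs(p - q) for p, q in zip(a, b))
        return sum((p - q) ** 2 for p, q in zip(a, b))

    def predict(self, X):
        return [
            min(self.centroids_, key=lambda k: self._distance(x, self.centroids_[k]))
            for x in X
        ]


def dump_model(model):
    """Serialize to JSON using key names that are independent of attribute names."""
    return json.dumps({
        "format_version": FORMAT_VERSION,
        "estimator": type(model).__name__,
        "params": {"metric": model.metric},
        "fitted": {"centroids": model.centroids_},
    })


def load_model(text):
    """Rebuild the estimator from its parameters, then attach the fitted values."""
    doc = json.loads(text)
    if doc["format_version"] != FORMAT_VERSION:
        raise ValueError(f"unsupported format version: {doc['format_version']}")
    model = ToyCentroidClassifier(**doc["params"])
    model.centroids_ = doc["fitted"]["centroids"]
    return model


X = [[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1]]
y = ["a", "a", "b", "b"]
queries = [[0.1, 0.0], [1.0, 0.9]]

model = ToyCentroidClassifier(metric="l1").fit(X, y)
restored = load_model(dump_model(model))
print(restored.predict(queries))
```

If an attribute were later renamed in the class, only dump_model and load_model would need to change; files already written would stay readable. The cost is that every estimator needs its own mapping, which someone has to write and maintain.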

Issue Analytics

  • State: closed
  • Created 6 years ago
  • Reactions:11
  • Comments:11 (11 by maintainers)

Top GitHub Comments

1 reaction
jnothman commented, Dec 15, 2017

Yes, it would be good to have this noted in the FAQ.

A few points:

  1. Everyone wants this, but no one has volunteered a way to make it maintainable.
  2. Any serialization/deserialization engine will need to be as flexible and insecure as pickle to cover all cases. So it is not worth the engineering.
  3. If we had designed estimators and models as separate classes, it would have helped a lot with this issue. But we did not, and we may have gained usability benefits as a result. Either way, that ship has sailed and we’re not going to change it.
  4. Deserialising to fit another model: This should usually work unless there are API changes (deprecations) and what is pickled is a clone of the estimator. However, users really must record the version of scikit-learn they built a model/pickle with in order to recover all they need.
  5. Deserialising to predict:
    1. PMML is good if your model is simple enough to be converted. Encouraging custom estimators sort of contradicts requiring the use of PMML.
    2. In general one may need both constructor parameters and estimated attributes to predict, so just saving estimated attributes is not sufficient.
    3. We could build a testing framework in which we checked whether a model fitted at the last release could be used to predict with the same results at master. We could consider implementing __setstate__ to help load old models in new versions, or otherwise just announce the explicit lack of compatibility. But it would be hard to test and give assurances for all parameter combinations or all training data.
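The ideas of recording the version with the pickle and of using __setstate__ to load old models can be sketched with a toy class. This is only an illustration under stated assumptions: ToyEstimator, LIBRARY_VERSION, the _RENAMED table and the saved_with_ attribute are hypothetical and do not describe how scikit-learn handles state. pickle calls __setstate__ with the saved state when a class defines it; the sketch calls it directly so that it does not need a real pickle file from an old release.

```python
import pickle

LIBRARY_VERSION = "2.0"  # hypothetical version of the library doing the loading


class ToyEstimator:
    """Hypothetical estimator; the attribute names are made up for illustration."""

    # old attribute name -> current attribute name
    _RENAMED = {"n_features": "n_features_"}

    def fit(self, X):
        self.n_features_ = len(X[0])
        return self

    def __getstate__(self):
        # Record the writing version next to the attributes.
        return dict(self.__dict__, _saved_with=LIBRARY_VERSION)

    def __setstate__(self, state):
        state = dict(state)  # do not mutate the caller's dict
        self.saved_with_ = state.pop("_saved_with", "unknown")
        # Migrate attributes that were renamed since the state was written.
        for old, new in self._RENAMED.items():
            if old in state and new not in state:
                state[new] = state.pop(old)
        self.__dict__.update(state)


# State as a hypothetical 1.0 release would have written it.
old_state = pickle.loads(pickle.dumps({"n_features": 3, "_saved_with": "1.0"}))

restored = ToyEstimator.__new__(ToyEstimator)
restored.__setstate__(old_state)
print(restored.n_features_, restored.saved_with_)
```

A rename table like this only covers renames. A change in what an attribute means, or in how it is used at predict time, would still need dedicated migration code and tests, which is the maintenance burden described in the points above.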
0 reactions
thomasjpfan commented, Jan 6, 2022

We have since updated the docs with an “Interoperable formats” section, and there is a “Model export for production” section in Related projects. With that in mind, I agree to close.
