RFC introduce methods to get and set estimators' state

Right now `clone` uses `{get, set}_params` to replicate an unfitted estimator. These methods are designed to return estimators’ hyperparameters. At the moment, we have no way of getting the state of a fitted estimator in a non-pickle format.
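To make the gap concrete, here is a minimal sketch using `LinearRegression` on toy data chosen only for illustration. It shows that `get_params` returns hyperparameters only, and that `clone` therefore produces an unfitted copy without learned attributes such as `coef_`.

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LinearRegression

# Toy data, chosen only for illustration: y = 2 * x + 1.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])

fitted = LinearRegression().fit(X, y)

# get_params() exposes constructor hyperparameters, not learned attributes.
params = fitted.get_params()

# clone() rebuilds the estimator from those hyperparameters only.
copy = clone(fitted)

has_coef_fitted = hasattr(fitted, "coef_")  # learned during fit
has_coef_copy = hasattr(copy, "coef_")      # absent: the clone is unfitted
print(sorted(params))
print(has_coef_fitted, has_coef_copy)
```

The clone keeps `fit_intercept` and the other constructor arguments, but everything learned in `fit` is gone, which is why these methods cannot be used to persist a trained model.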

Pickle files are by design able to run arbitrary code, and therefore one should ideally only load a pickle file from a trusted source. This makes sharing and moving scikit-learn based estimators hard, which also introduces security issues when deploying ML models in production.
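A small, harmless sketch of why loading an untrusted pickle is risky: an object can define `__reduce__` so that unpickling calls any callable with any arguments. The `Payload` class below is hypothetical and only evaluates an arithmetic expression, but the same mechanism could run anything.

```python
import pickle


class Payload:
    """Hypothetical object whose pickle triggers a call on load."""

    def __reduce__(self):
        # pickle stores "call eval('6 * 7')" instead of the object's data.
        return (eval, ("6 * 7",))


blob = pickle.dumps(Payload())

# Loading runs the stored call; no Payload instance is ever rebuilt.
result = pickle.loads(blob)
print(result)  # 42
```

Nothing in `pickle.loads` distinguishes this from a legitimate model file, so the only real defense is to load pickles from trusted sources.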

Another issue with pickle files is that we kinda force people to use the same versions of the libraries they used to train the model and dump the pickle. This prevents people from being able to update their base Docker images when they’re deploying a model which was trained a while ago, and I’m not sure if we have good ways of letting them update their pickle files for a new version.

My proposal is to introduce `{get, set}_state` methods at the `BaseEstimator` level to be able to persist and set the state of models in a more portable, secure, and backward compatible way. We can probably even just do JSON.
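As a rough illustration of the idea, and not scikit-learn's API, here is a pure-Python sketch of what such a pair of methods could look like on a toy estimator. The `Centerer` class, the layout of the state dictionary, and the convention of treating attributes ending in an underscore as fitted state are all assumptions made for this example.

```python
import json


class Centerer:
    """Toy estimator (hypothetical): subtracts column means, then adds `shift`."""

    def __init__(self, shift=0.0):
        self.shift = shift

    def get_params(self):
        return {"shift": self.shift}

    def fit(self, X):
        n_rows, n_cols = len(X), len(X[0])
        self.mean_ = [sum(row[j] for row in X) / n_rows for j in range(n_cols)]
        return self

    def transform(self, X):
        return [[v - m + self.shift for v, m in zip(row, self.mean_)] for row in X]

    def get_state(self):
        # Assumed convention: attributes ending in "_" are learned during fit.
        fitted = {k: v for k, v in vars(self).items() if k.endswith("_")}
        return {"params": self.get_params(), "fitted": fitted}

    def set_state(self, state):
        for name, value in state["params"].items():
            setattr(self, name, value)
        for name, value in state["fitted"].items():
            setattr(self, name, value)
        return self


X = [[1.0, 10.0], [3.0, 30.0]]
est = Centerer(shift=0.5).fit(X)

# Round-trip through plain JSON text: loading it only builds data, never calls code.
text = json.dumps(est.get_state())
restored = Centerer().set_state(json.loads(text))

same = restored.transform(X) == est.transform(X)
print(text)
print(same)
```

A real implementation would also need to handle NumPy arrays, nested estimators, and some version information in the state, none of which this sketch attempts.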

cc @scikit-learn/core-devs @koaning

Issue Analytics

Created: 2 years ago
Reactions: 4
Comments: 17 (15 by maintainers)

Top GitHub Comments

2 reactions

NicolasHug commented, Mar 11, 2022

I share @thomasjpfan's concerns about support for backward-compatibility.

Also, (and I’m not good at asking subtle questions): is HuggingFace planning on building a scikit-learn model hub?

1 reaction

ogrisel commented, Mar 14, 2022

Another use case for sklearn-to-sklearn persistence without vulnerability to arbitrary code injection à la pickle would be to make it possible to host a public model auditing service. You would upload a trained scikit-learn pipeline and be able to run any Python-based auditing tools, built on either scikit-learn’s own inspection tools or third-party scikit-learn compatible tools such as SHAP, FACET, ELI5, interpretml, fairlearn and so on.
