RFC introduce methods to get and set estimators' state

Right now clone uses {get, set}_params to replicate an unfit estimator. These methods are designed to return estimators’ hyperparameters. At the moment, we have no way of getting the state of a fitted estimator in a non-pickle format.
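A minimal sketch of that gap, assuming scikit-learn is installed; the toy data and variable names are illustrative only. `get_params` returns the hyperparameters, and `clone` builds a new unfitted estimator from them, so the learned attributes are not carried over.

```python
from sklearn.base import clone
from sklearn.linear_model import LogisticRegression

# Tiny illustrative dataset (not from the issue).
X = [[0.0], [1.0], [2.0], [3.0]]
y = [0, 0, 1, 1]

fitted = LogisticRegression(C=0.5).fit(X, y)

# Hyperparameters only: constructor arguments such as C.
params = fitted.get_params()

# clone() rebuilds the estimator from get_params(); fitted state is dropped.
copy = clone(fitted)

has_coef_fitted = hasattr(fitted, "coef_")  # learned attribute exists
has_coef_copy = hasattr(copy, "coef_")      # learned attribute is absent
```

The fitted attributes (those ending in an underscore, such as `coef_`) are exactly the part that currently has no non-pickle export.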

Pickle files are by design able to run arbitrary code, and therefore one should ideally only load a pickle file from a trusted source. This makes sharing and moving scikit-learn based estimators hard, which also introduces security issues when deploying ML models in production.
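To make the risk concrete, here is a deliberately harmless sketch using only the standard library. The class name `Payload` is made up for illustration. An object's `__reduce__` method can tell pickle to call any importable callable at load time; here that callable is `print`, but nothing stops it from being something destructive.

```python
import pickle


class Payload:
    """Illustrative class: controls what pickle executes when loading."""

    def __reduce__(self):
        # pickle stores "call print(<message>)" instead of the object's data.
        return (print, ("this ran during unpickling",))


data = pickle.dumps(Payload())

# Loading runs the stored call; the result is print's return value,
# not a Payload instance.
result = pickle.loads(data)
```

The person calling `pickle.loads` never sees the `Payload` class and has no chance to inspect what will run, which is why the file's source has to be trusted.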

Another issue with pickle files is that we kinda force people to use the same versions of the libraries they used to train the model and dump the pickle. This prevents people from being able to update their base docker images when they’re deploying a model which was trained a while ago, and I’m not sure if we have good ways of letting them update their pickle files for a new version.

My proposal is to introduce {get, set}_state methods on the BaseEstimator level to be able to persist and set the state of models in a more portable, secure, and backward compatible way. We can probably even just do JSON.
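One way the idea could look, as a rough sketch rather than an actual scikit-learn API: `get_state`, `set_state`, the `format_version` field, and the toy `MeanRegressor` class are all hypothetical. The state is a plain dictionary of hyperparameters plus fitted attributes, so it can round-trip through JSON without executing any code on load.

```python
import json

STATE_FORMAT_VERSION = 1  # hypothetical version tag for the state layout


class MeanRegressor:
    """Toy estimator: predicts the (shrunken) mean of the training targets."""

    def __init__(self, shrinkage=1.0):
        self.shrinkage = shrinkage

    def get_params(self):
        return {"shrinkage": self.shrinkage}

    def fit(self, X, y):
        self.mean_ = self.shrinkage * sum(y) / len(y)
        self.n_samples_ = len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]

    def get_state(self):
        # Hypothetical method: fitted attributes follow the trailing-underscore
        # convention, so collect those alongside the hyperparameters.
        fitted = {k: v for k, v in vars(self).items() if k.endswith("_")}
        return {
            "format_version": STATE_FORMAT_VERSION,
            "class": type(self).__name__,
            "params": self.get_params(),
            "fitted": fitted,
        }

    def set_state(self, state):
        # Hypothetical method: validate before touching the instance.
        if state["format_version"] != STATE_FORMAT_VERSION:
            raise ValueError("unsupported state format version")
        if state["class"] != type(self).__name__:
            raise ValueError("state belongs to a different estimator class")
        for name, value in state["params"].items():
            if name not in self.get_params():
                raise ValueError(f"unknown parameter: {name}")
            setattr(self, name, value)
        for name, value in state["fitted"].items():
            if not name.endswith("_"):
                raise ValueError(f"not a fitted attribute: {name}")
            setattr(self, name, value)
        return self


model = MeanRegressor(shrinkage=0.5).fit([[1], [2], [3]], [2.0, 4.0, 6.0])
payload = json.dumps(model.get_state())          # plain text, safe to inspect
restored = MeanRegressor().set_state(json.loads(payload))
```

An explicit version field is one possible hook for the upgrade problem described above: a newer library release could detect an old state layout and migrate it instead of failing on load. Real estimators hold arrays and nested estimators, which would need their own encoding; this sketch sidesteps that by using only numbers.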

cc @scikit-learn/core-devs @koaning

Issue Analytics

  • State: open
  • Created: 2 years ago
  • Reactions: 4
  • Comments: 17 (15 by maintainers)

Top GitHub Comments

NicolasHug commented, Mar 11, 2022 (2 reactions)

I share @thomasjpfan’s concerns about support for backward-compatibility.

Also, (and I’m not good at asking subtle questions): is HuggingFace planning on building a scikit-learn model hub?

ogrisel commented, Mar 14, 2022 (1 reaction)

Another use case for sklearn-to-sklearn persistence that is not vulnerable to arbitrary code injection à la pickle would be hosting a public model auditing service: you would upload a trained scikit-learn pipeline and run any Python-based auditing tools on it, whether scikit-learn’s own inspection tools or third-party scikit-learn compatible tools such as SHAP, FACET, ELI5, interpretml, fairlearn and so on.
