API: split "model" parameters from "config" parameters
We can think of our estimators’ parameters as two different types:
- model hyper-parameters, which influence the end result, accuracy, complexity, etc., e.g. SVM’s `nu` or the number of trees in a random forest.
- config parameters, which control other aspects of an estimator, such as verbosity, `copy_X`, `return_xyz`, etc.
We could split these two types of parameters and have model hyperparams as `__init__` args and the rest as `set_params` or new `configure` args. This would also allow us to introduce new configuration parameters to all estimators w/o having to add another arg to `__init__`. An example would be the routing parameter required for the sample props implementation, or potentially controlling the log level at the estimator level.
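To make the idea concrete, here is a minimal sketch of an estimator where only model hyper-parameters are `__init__` args and config parameters are set through a separate `configure` method. The class name, the `_config_params` attribute, and `configure` are all hypothetical, not existing scikit-learn API:

```python
class ForestLike:
    """Hypothetical estimator illustrating the proposed split."""

    # names of config parameters, kept out of __init__
    _config_params = {"verbose", "n_jobs"}

    def __init__(self, n_estimators=100, max_depth=None):
        # only model hyper-parameters are __init__ args
        self.n_estimators = n_estimators
        self.max_depth = max_depth
        # config parameters get defaults and are settable later
        self.verbose = 0
        self.n_jobs = None

    def configure(self, **config):
        # set config parameters without touching model hyper-parameters
        for name, value in config.items():
            if name not in self._config_params:
                raise ValueError(f"{name!r} is not a config parameter")
            setattr(self, name, value)
        return self


est = ForestLike(n_estimators=50).configure(n_jobs=2, verbose=1)
```

With this layout, adding a new config parameter (e.g. a routing flag) only means extending `_config_params` and its default, with no change to any `__init__` signature.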
My proposal would be to have a `config_params` estimator tag which includes the names of those parameters, and have `clone`, `get_params`, and `set_params` understand them, and potentially introduce a new `configure` method to set those parameters.
This would also mean partially relaxing the "set only `__init__` params in `__init__`" requirement, since `__init__` would be allowed to set those extra params.
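A rough sketch of how a `config_params` tag and a tag-aware `get_params` could interact, assuming a `group` argument for selecting parameter groups (both the tag name and `group` are assumptions from the proposal, not existing API):

```python
class Estimator:
    """Hypothetical estimator carrying a config_params tag."""

    def _get_tags(self):
        # the proposed tag lists which parameters are config parameters
        return {"config_params": ["copy_X", "verbose"]}

    def __init__(self, alpha=1.0, copy_X=True, verbose=0):
        self.alpha = alpha      # model hyper-parameter
        self.copy_X = copy_X    # config parameter
        self.verbose = verbose  # config parameter

    def get_params(self, group="all"):
        # a get_params that understands the tag could report groups separately
        params = {"alpha": self.alpha, "copy_X": self.copy_X,
                  "verbose": self.verbose}
        config_names = set(self._get_tags()["config_params"])
        if group == "model":
            return {k: v for k, v in params.items() if k not in config_names}
        if group == "config":
            return {k: v for k, v in params.items() if k in config_names}
        return params
```

`clone` could then be defined to copy the model group while resetting or preserving the config group, whichever semantics the project settles on.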
Also related to: https://github.com/scikit-learn/scikit-learn/pull/17441#discussion_r435447166
Issue Analytics
- State:
- Created 3 years ago
- Comments: 16 (16 by maintainers)
Top GitHub Comments
Thank you for opening this issue. I can share a bit of our experience in the D3M program where we are working on AutoML tools, including use of sklearn for AutoML. I think distinguishing model and config parameters is useful, but as already noted in other comments, it is hard to draw a line and some parameters are both. To address all that, we allow parameters to belong to more than one category. Moreover, we in fact have multiple categories of parameters:
(`n_jobs`, for example). We have a few more, but they are specific to our needs (like a metafeature parameter, which controls which metafeatures we compute while running the program).
And then each parameter has a list of categories provided by the author of the model. And I think this is an important point/realization here: it is OK if things are hard to define. This just tells us what the author of the model thinks is the main category or categories this parameter will be used as. It does not have to be perfect. The user can ignore it, or determine their own interpretation. But it is also useful to capture the author’s intention/assumption.
Which leads to my last point: personally, I would keep all of them inside `__init__`, but use other means to mark them with this metadata. This way it is not too critical if the categorization is wrong; you do not have to change the API. It is just metadata. If it is wrong, you can fix it in the next version. But moving those parameters to another method call makes this impossible. So I am all for having this extra metadata, even if imperfect. But make it optional. Maybe it can just be rendered in the documentation as tags next to the parameter name.
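The "keep everything in `__init__`, mark with metadata" approach could look like the sketch below, where a parameter may belong to more than one category and the metadata is purely advisory. The `_param_categories` attribute and the helper are illustrative assumptions:

```python
class SVCLike:
    """Hypothetical estimator with author-provided parameter categories."""

    # advisory metadata only; a parameter may belong to several categories
    _param_categories = {
        "nu": ["model"],
        "verbose": ["config"],
        "shrinking": ["model", "config"],  # hard-to-classify: both
    }

    def __init__(self, nu=0.5, shrinking=True, verbose=False):
        # all parameters stay in __init__, so the public API is unchanged
        self.nu = nu
        self.shrinking = shrinking
        self.verbose = verbose


def params_in_category(est, category):
    # look up which parameters the author tagged with a given category
    return [p for p, cats in est._param_categories.items() if category in cats]
```

Because the categories live in metadata rather than in method signatures, fixing a miscategorized parameter in a later release is a documentation change, not an API break.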
This is a documentation issue, not an API issue. Now that we are using kwargs, I think it would be good to change numpydoc so that a project can have custom “* Parameters” sections. The “Parameters” section made sense when everything was positional.
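A custom parameters section might render like the docstring below. Note that numpydoc does not support arbitrary "* Parameters" sections out of the box; this is exactly the change the comment proposes, and the section name is made up for illustration:

```python
def fit_example(X, y=None, *, nu=0.5, verbose=False):
    """Fit something, with parameters documented in separate sections.

    Parameters
    ----------
    nu : float, default=0.5
        Model hyper-parameter influencing the end result.

    Config Parameters
    -----------------
    verbose : bool, default=False
        Controls the verbosity; does not affect the fitted model.
    """
    return None
```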