
API split "model" parameters from "config" params.

See original GitHub issue

We can think of our estimators’ parameters as two different types:

  • model hyper-parameters which influence the end result, accuracy, complexity, etc., e.g. SVM’s nu or the number of trees in a random forest.
  • config parameters which control other aspects of an estimator such as verbosity, copy_X, return_xyz, etc.
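
As a purely illustrative example, here is one way RandomForestClassifier's parameters might fall into these two buckets (this categorization is my own, for illustration, not an official one):

```python
# Hypothetical split of RandomForestClassifier's parameters into the
# two buckets described above (illustrative categorization only).
model_hyperparams = {
    "n_estimators": 100,  # influences the end result / accuracy
    "max_depth": None,    # influences model complexity
    "criterion": "gini",  # influences the fitted model
}
config_params = {
    "verbose": 0,         # controls logging only
    "n_jobs": None,       # controls parallelism, not predictions
    "oob_score": False,   # controls whether an extra attribute is computed
}
```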

We could split these two types of parameters and have model hyper-parameters as __init__ args and the rest as args to set_params or to a new configure method. This would also allow us to introduce new configuration parameters to all estimators w/o having to add another arg to __init__. An example would be the routing parameter required for the sample props implementation, or potentially controlling the log level at the estimator level.

My proposal would be to have a config_params estimator tag which includes the names of those parameters, and have clone, get_params and set_params understand them, and potentially introduce a new configure method to set those parameters.

This would also mean partially relaxing the "set only __init__ params in __init__" requirement, since __init__ would be allowed to set those extra params.
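
A minimal sketch of what the tag-plus-configure idea could look like, assuming the (private) _more_tags/_get_tags mechanism scikit-learn used at the time; the config_params tag and the configure method are hypothetical and not implemented anywhere:

```python
from sklearn.base import BaseEstimator


class MyEstimator(BaseEstimator):
    """Sketch of an estimator with a hypothetical ``config_params`` tag."""

    def __init__(self, alpha=1.0, verbose=0, copy_X=True):
        self.alpha = alpha      # model hyper-parameter
        self.verbose = verbose  # config parameter
        self.copy_X = copy_X    # config parameter

    # Hypothetical tag declaring which parameters are config params;
    # clone/get_params/set_params would need to learn about it.
    def _more_tags(self):
        return {"config_params": ["verbose", "copy_X"]}

    # Hypothetical configure method: sets only declared config params.
    def configure(self, **params):
        allowed = set(self._get_tags()["config_params"])
        for name, value in params.items():
            if name not in allowed:
                raise ValueError(
                    "%r is not a config parameter of %s"
                    % (name, type(self).__name__)
                )
            setattr(self, name, value)
        return self
```

Usage would then read est = MyEstimator(alpha=0.5).configure(verbose=1), keeping hyper-parameters and configuration visually separate at construction time.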

cc: @scikit-learn/core-devs

also related to: https://github.com/scikit-learn/scikit-learn/pull/17441#discussion_r435447166

Issue Analytics

  • State: open
  • Created: 3 years ago
  • Comments: 16 (16 by maintainers)

Top GitHub Comments

3 reactions
mitar commented, Jul 2, 2020

Thank you for opening this issue. I can share a bit of our experience in the D3M program, where we are working on AutoML tools, including the use of sklearn for AutoML. I think distinguishing model and config parameters is useful, but as already noted in other comments, it is hard to draw a line and some parameters are both. To address that, we allow parameters to belong to more than one category. Moreover, we in fact have multiple categories of parameters:

  • Control parameters (what you call config here), i.e., things which influence the logic of the program. For example, “drop column with index X”; you do not want to tune that.
  • Tuning parameters (ones which influence the score, but not the logic).
  • Resource-use parameters (n_jobs, for example).

We have a few more, but they are specific to our needs (like the metafeature parameter, which controls which metafeatures we compute while running the program).

Each parameter then has a list of categories provided by the author of the model. And I think this is the important point/realization here: it is OK if things are hard to define. The list just tells you which category or categories the author of the model thinks the parameter will mainly be used as. It does not have to be perfect. Users can ignore it, or determine their own interpretation. But it is useful to capture the author’s intention/assumption.

Which leads to my last point: personally, I would keep all of them inside __init__, but use other means to mark them with this metadata. That way it is not too critical if the categorization is wrong, because you do not have to change the API; it is just metadata, and if it is wrong you can fix it in the next version. Moving those parameters to another method call makes this impossible.

So I am all for having this extra metadata, even if imperfect. But make it optional. Maybe it can just be rendered in the documentation as tags next to the parameter name.
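
A sketch of the metadata-only approach mitar describes, keeping every parameter in __init__ and attaching advisory categories; the _param_categories attribute, the category names, and the helper function are all made up for illustration:

```python
class MyEstimator:
    # Hypothetical metadata mapping each __init__ parameter to one or
    # more categories; purely advisory, it never changes behaviour.
    _param_categories = {
        "n_estimators": {"tuning"},
        "drop_index": {"control"},   # e.g. "drop column with index X"
        "n_jobs": {"resource"},
    }

    def __init__(self, n_estimators=100, drop_index=None, n_jobs=None):
        self.n_estimators = n_estimators
        self.drop_index = drop_index
        self.n_jobs = n_jobs


def params_in_category(estimator, category):
    """Illustrative helper: parameters the author tagged with ``category``."""
    return sorted(
        name
        for name, cats in estimator._param_categories.items()
        if category in cats
    )

# e.g. params_in_category(MyEstimator(), "tuning") -> ["n_estimators"]
```

Because the metadata lives alongside the API instead of shaping it, an AutoML tool can consume it while ordinary users ignore it entirely.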

1 reaction
jnothman commented, Jun 14, 2020

Distinguishing the parameters that may impact a model’s predictions from those that can’t can be valuable to users who don’t really know where to start.

This is a documentation issue, not an API issue. Now that we are using kwargs, I think it would be good to change numpydoc so that a project can have custom “* Parameters” sections. The “Parameters” section made sense when everything was positional.
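
For instance, a docstring with a custom parameters section might look like the sketch below; the “Config Parameters” heading is hypothetical, and numpydoc would need the change jnothman describes before it could render such a section:

```python
class MyEstimator:
    """Illustrative docstring splitting the two kinds of parameters.

    Parameters
    ----------
    alpha : float, default=1.0
        Regularization strength; changes the fitted model.

    Config Parameters
    -----------------
    copy_X : bool, default=True
        Whether to copy the input data; no effect on predictions.
    verbose : int, default=0
        Verbosity level; no effect on predictions.
    """
```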
