Feature: RecursiveDict.compress() to shorten paths to steps and their hyperparams
Is your feature request related to a problem? Please describe.
Hyperparameter names are too long in nested steps.
Describe the solution you’d like
A way to compress the names to make them shorter. More specifically, I think an automated algorithm could be built that works for all existing ML pipelines. It would do something like:
all_hps = pipeline.get_hyperparams()
all_hps_shortened = all_hps.compress()
pprint(all_hps_shortened)
Then we’d see something like this in the pprint:
{
    "*__MetaStep__*__SKLearnWrapper_LinearRegression__C": 1000,
    "*__SomeStep__hyperparam3": value,
    "*__SKLearnWrapper_BoostedTrees__count": 10
}
That is, the unique paths to some steps are compressed using the star ("*") operator, which means “one or more steps in between”. The compression would be lossless, in the sense that the original names could ALWAYS be recovered given the original pipeline’s tree structure.
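A minimal sketch of what the “*” semantics could mean, assuming flat keys joined with "__" as in the example above. The `matches` helper and all key names are illustrative, not Neuraxle’s actual API:

```python
import re

def matches(pattern: str, full_name: str) -> bool:
    """True if a "*"-compressed name matches a full "__"-separated path.

    "*" stands for one or more intermediate steps, so it must consume at
    least one character plus its trailing "__" separator.
    """
    regex = re.escape(pattern).replace(r"\*", ".+")
    return re.fullmatch(regex, full_name) is not None

full_keys = [
    "Pipeline__MetaStep__SKLearnWrapper_LinearRegression__C",
    "Pipeline__SomeStep__hyperparam3",
    "Pipeline__SKLearnWrapper_BoostedTrees__count",
]

# Lossless compression means each compressed name matches exactly one
# full key in the original pipeline's key set:
hits = [k for k in full_keys if matches("*__SomeStep__hyperparam3", k)]
```

Note that because “*” means one or more steps, the bare suffix `"SomeStep__hyperparam3"` would not match the pattern `"*__SomeStep__hyperparam3"`; at least one parent step must precede it.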
Describe alternatives you’ve considered
Using custom, ad-hoc ways to remove words and compress them. That seems good, but it doesn’t seem to generalize to all pipelines that could exist.
Additional context
Hyperparameter names were also said to be too long in #478.
Additional idea
For hyperparameters, given that in the future every model may need to name its expected hyperparams, it may be possible to use a hyperparam’s name alone, directly, when no other step has a hyperparam with the same name. If another step does use the same name, the “*” compression could go up in the tree to find the first non-common parent name, or something similar.
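One possible reading of this idea, sketched under the same "__"-joined flat-key assumption as above: keep the bare hyperparam name when it is unique pipeline-wide, and on a collision climb the path until a `"*__<suffix>"` becomes unambiguous. The function name and keys are hypothetical:

```python
def shorten_hyperparam_names(flat_hps: dict) -> dict:
    """Bare name when unique; otherwise the shortest unambiguous
    "*__<suffix>" obtained by walking up the "__"-separated path."""
    shortened = {}
    for key in flat_hps:
        parts = key.split("__")
        # Try suffixes from shortest (the bare hyperparam name) to longest.
        for i in range(len(parts) - 1, -1, -1):
            suffix = "__".join(parts[i:])
            clashes = any(
                k != key and (k == suffix or k.endswith("__" + suffix))
                for k in flat_hps
            )
            if not clashes:
                # Bare name if it is just the last segment, else wildcard it.
                name = suffix if i == len(parts) - 1 else "*__" + suffix
                shortened[name] = flat_hps[key]
                break
    return shortened

flat = {
    "Pipeline__LinearRegression__C": 1.0,
    "Pipeline__Ridge__C": 0.5,
    "Pipeline__Tree__depth": 3,
}
```

Here the two `C` hyperparams collide, so they are disambiguated by their first non-common parent (`LinearRegression` vs `Ridge`), while `depth` stays bare.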
More ideas are needed to be sure we do this the right way.
Issue Analytics
- State:
- Created 2 years ago
- Comments: 6 (4 by maintainers)
Would be interesting to have this as well:
CompressedHyperparameterSamples.restore() -> HyperparameterSamples
This was completed using the “use_wildcards” argument, as in:
RecursiveDict.to_flat_dict(use_wildcards=True)
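A minimal sketch of what such a `restore()` could do: invert the wildcard compression by matching each "*"-compressed name back against the full key set obtained from the original pipeline’s tree structure. The standalone `restore` function below is illustrative and not Neuraxle’s actual `CompressedHyperparameterSamples` API:

```python
import re

def restore(compressed: dict, full_keys: list) -> dict:
    """Map each "*"-compressed name back to its unique full key.

    Raises if a compressed name matches zero or several full keys,
    since the compression is supposed to be lossless.
    """
    restored = {}
    for pattern, value in compressed.items():
        # "*" consumes one or more steps plus the trailing "__" separator.
        regex = re.escape(pattern).replace(r"\*", ".+")
        hits = [k for k in full_keys if re.fullmatch(regex, k)]
        if len(hits) != 1:
            raise ValueError(f"{pattern!r} is ambiguous or unmatched: {hits}")
        restored[hits[0]] = value
    return restored
```

The uniqueness check is what makes the round trip safe: a compressed dict produced against one pipeline can only be restored against a key set where every wildcard still resolves to exactly one path.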