
Feature: RecursiveDict.compress() to shorten paths to steps and their hyperparams

Is your feature request related to a problem? Please describe.
Hyperparameter names are too long in nested steps.

Describe the solution you’d like
A way to compress the names to make them shorter. More specifically, I think an automated algorithm that works for all existing ML pipelines could be built. It would do something like this:

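from pprint import pprint
# `pipeline` is an existing pipeline; compress() is the proposed new method.
all_hps = pipeline.get_hyperparams()
all_hps_shortened = all_hps.compress()
pprint(all_hps_shortened)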
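To see where the long names come from, here is a minimal sketch that flattens nested per-step hyperparameters into one key per hyperparameter. It assumes plain nested Python dicts as a stand-in for a pipeline's hyperparameters and the `__` separator used in the example output; `flatten` is a hypothetical helper, not part of any library's API.

```python
from typing import Any, Dict


def flatten(nested: Dict[str, Any], prefix: str = "", sep: str = "__") -> Dict[str, Any]:
    """Join every step name on the path to a hyperparameter into a single flat key.

    Hypothetical helper: plain dicts stand in for the pipeline's hyperparameter tree.
    """
    flat: Dict[str, Any] = {}
    for name, value in nested.items():
        key = f"{prefix}{sep}{name}" if prefix else name
        if isinstance(value, dict):
            # A nested dict is a sub-step: recurse, carrying the path built so far.
            flat.update(flatten(value, key, sep))
        else:
            # A leaf is an actual hyperparameter value.
            flat[key] = value
    return flat


nested = {
    "Pipeline": {
        "MetaStep": {"Inner": {"SKLearnWrapper_LinearRegression": {"C": 1000}}},
        "SKLearnWrapper_BoostedTrees": {"count": 10},
    }
}
print(flatten(nested))
# {'Pipeline__MetaStep__Inner__SKLearnWrapper_LinearRegression__C': 1000, 'Pipeline__SKLearnWrapper_BoostedTrees__count': 10}
```

Every level of nesting adds one more step name to the key, which is why deeply nested steps end up with very long hyperparameter names.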

The pprint output would then look something like this:

{
    "*__MetaStep__*__SKLearnWrapper_LinearRegression__C": 1000,
    "*__SomeStep__hyperparam3": value,
    "*__SKLearnWrapper_BoostedTrees__count": 10
}

That is, the unique paths to some steps were compressed using the star (*) operator, which means “one or more steps in between”. The compression would nonetheless be lossless: the original names could always be recovered given the original pipeline’s tree structure.
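The “one or more steps in between” rule and the lossless-recovery claim can be sketched as follows, assuming the pipeline's tree structure is available as the list of full `__`-joined paths and that no step name itself contains `__`. `matches` and `restore_key` are hypothetical names; the sketch recovers a key only when exactly one full path fits the pattern, which is what lossless compression requires.

```python
from typing import List, Sequence


def matches(pattern: Sequence[str], path: Sequence[str]) -> bool:
    """Return True if `path` fits `pattern`, where a "*" segment stands for one or more steps."""
    if not pattern:
        return not path
    head, rest = pattern[0], pattern[1:]
    if head == "*":
        # "*" must consume at least one step; try every possible span.
        return any(matches(rest, path[i:]) for i in range(1, len(path) + 1))
    return bool(path) and path[0] == head and matches(rest, path[1:])


def restore_key(compressed: str, all_paths: List[str], sep: str = "__") -> str:
    """Recover the full path behind a compressed key, given every path in the pipeline tree."""
    pattern = compressed.split(sep)
    found = [p for p in all_paths if matches(pattern, p.split(sep))]
    if len(found) != 1:
        raise ValueError(f"{compressed!r} matches {len(found)} paths; expected exactly 1")
    return found[0]


all_paths = [
    "Pipeline__MetaStep__Inner__SKLearnWrapper_LinearRegression__C",
    "Pipeline__SomeStep__hyperparam3",
    "Pipeline__SKLearnWrapper_BoostedTrees__count",
]
print(restore_key("*__MetaStep__*__SKLearnWrapper_LinearRegression__C", all_paths))
# Pipeline__MetaStep__Inner__SKLearnWrapper_LinearRegression__C
```

The `ValueError` branch marks where a compression would stop being lossless: if two full paths fit the same pattern, the original name cannot be recovered from the tree alone.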

Describe alternatives you’ve considered
Using custom ways to flush words and compress them. That could work, but it doesn’t seem to generalize to every pipeline that could exist.

Additional context
Hyperparameter names were said to be too long as well in #478.

Additional idea
Since every model may eventually need to declare the names of its expected hyperparameters, it may be possible to use a hyperparameter's name alone whenever no other step has a hyperparameter with the same name. If another step uses the same hyperparameter name, the “*” compression could walk up the tree to find the first parent name that is not shared, or something similar.
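The suffix-based idea above could look roughly like the sketch below: each hyperparameter keeps only its own name when that name is unique, and otherwise walks up the tree, adding one parent at a time, until the remaining suffix identifies a single path. This is a simplification under stated assumptions: only a leading `*` is produced (never an inner one, as in the `*__MetaStep__*__SKLearnWrapper_LinearRegression__C` example), and a `*` is kept even in front of a bare name, so that a shortened `*__C` can never be confused with a top-level key named `C`. `compress_keys` is a hypothetical helper operating on a flat `__`-joined dict, and the values are made up.

```python
from typing import Any, Dict, List, Tuple

SEP = "__"


def compress_keys(flat: Dict[str, Any]) -> Dict[str, Any]:
    """Replace each full path with "*__" + its shortest suffix that no other path shares.

    Hypothetical helper; paths with no unique proper suffix are kept whole.
    """
    paths: List[Tuple[str, ...]] = [tuple(key.split(SEP)) for key in flat]
    compressed: Dict[str, Any] = {}
    for path, value in zip(paths, flat.values()):
        new_key = SEP.join(path)  # fallback: the full, uncompressed path
        # k = 1 is the hyperparameter name alone; each larger k adds one parent step.
        for k in range(1, len(path)):
            suffix = path[-k:]
            if sum(1 for other in paths if other[-k:] == suffix) == 1:
                new_key = "*" + SEP + SEP.join(suffix)
                break
        compressed[new_key] = value
    return compressed


flat = {
    "Pipeline__MetaStep__Inner__SKLearnWrapper_LinearRegression__C": 1000,
    "Pipeline__SomeStep__hyperparam3": 0.5,
    "Pipeline__SKLearnWrapper_BoostedTrees__count": 10,
    "Pipeline__SKLearnWrapper_RandomForest__count": 5,
}
print(compress_keys(flat))
```

Here both `count` hyperparameters keep their parent step names, while `C` and `hyperparam3` shrink to `*__C` and `*__hyperparam3`. Because every shortened key ends with a suffix that exactly one full path has, and the dropped prefix is always at least one step long, a matcher that treats `*` as “one or more steps” can map each key back to its original path. The pairwise counting is quadratic in the number of hyperparameters, which is acceptable for a sketch.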

More ideas are needed to make sure we do this the right way.

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 6 (4 by maintainers)

Top GitHub Comments

1 reaction
guillaume-chevalier commented, Jun 26, 2021

It would be interesting to have this as well: CompressedHyperparameterSamples.restore() -> HyperparameterSamples

0 reactions
guillaume-chevalier commented, Jun 15, 2022

Completed using the “use_wildcards” argument, as in RecursiveDict.to_flat_dict(use_wildcards=True).
