[Question] How to get the number of runs executed in SMAC for each pipeline?

I can use runcount_limit to limit the number of runs in SMAC for each pipeline.

from autosklearn.classification import AutoSklearnClassifier

automl = AutoSklearnClassifier(
    smac_scenario_args={'runcount_limit': 1000},
)

Is it possible to get the number of executed runs in SMAC for each pipeline?

Or am I misunderstanding the meaning of runcount_limit?

Any comments are highly appreciated.

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 8 (4 by maintainers)

Top GitHub Comments

eddiebergman commented, May 11, 2022

These initial configurations are just meta-learned configurations that we use to provide some initial points for SMAC to evaluate, so that it gets some information for its surrogate model. This places no constraint on the actual running of autosklearn; it is just a set of initial configurations to try.

For your second question, part b): no, there is not, but that would be a useful feature and I have added it as issue #1470.

For the second question, part a): you can use the dataframe provided by leaderboard() to extract all the information you need. However, it would be good to have a handier solution for this!
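As a rough sketch of the leaderboard() suggestion above: the helper below counts the rows of the full leaderboard of an already fitted estimator. The function name is hypothetical, and the `detailed` and `ensemble_only` arguments are written from my reading of the auto-sklearn documentation, so please check them against your installed version. As far as I understand, the leaderboard lists only runs that finished successfully, so the count can be lower than runcount_limit.

```python
def count_evaluated_pipelines(automl):
    """Count the rows of the full leaderboard of a fitted auto-sklearn estimator.

    Assumption: `automl` exposes leaderboard(detailed=..., ensemble_only=...)
    and the returned dataframe has one row per listed pipeline.
    """
    # ensemble_only=False asks for every listed pipeline, not only the ones
    # that ended up in the final ensemble; detailed=True adds extra columns.
    board = automl.leaderboard(detailed=True, ensemble_only=False)
    return len(board)


# Usage, after automl.fit(X_train, y_train):
#     print(count_evaluated_pipelines(automl))
```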

For any follow-up questions, please create a new issue 😃

eddiebergman commented, May 10, 2022

Hi @jmren168,

So the main categories are holdout and cv, each with an iterative flavour that limits the available configurations to a subset of algorithms, namely those supporting an iterative method of fitting. Unless you have specified something else, the default is "holdout"; check out resampling-strategy. I think #428 was a misunderstanding, we do not have that feature. The way to run a single pipeline more than once is to use a "cv" resampling strategy.

  1. In the case of a non-iterative flavour of holdout resampling strategy, the default, each pipeline gets evaluated once by SMAC.

  2. In the case of a non-iterative flavour of cv resampling strategy, each pipeline will be evaluated on a number of folds, as is done with normal cross-validation. The pipeline will be trained on different folds and the resulting models will be combined with something like sklearn's VotingClassifier or VotingRegressor. By default it is evaluated with resampling_strategy_arguments = { "folds": 5 }.

  3. In the case of an "*iterative*" flavour of resampling-strategy, each pipeline can get evaluated more than once by SMAC. That depends on SMAC and its scheduling, for which @mfeurer might be able to give a better answer.

Considering you haven’t mentioned the “iterative” or “cv” resampling strategy, I will assume you are in case 1 and runcount_limit means how many pipelines are evaluated, each being evaluated once.
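To make the three cases above a bit more concrete, here is a minimal sketch that builds the keyword arguments for each resampling strategy. The helper name is hypothetical. The strategy strings "holdout-iterative-fit" and "cv-iterative-fit" come from my recollection of the auto-sklearn documentation rather than from this thread, so treat them as assumptions.

```python
def build_automl_kwargs(resampling_strategy="holdout", folds=5, runcount_limit=1000):
    """Build keyword arguments for AutoSklearnClassifier (hypothetical helper)."""
    kwargs = {
        "resampling_strategy": resampling_strategy,
        # Cap on the number of SMAC runs, as in the original question.
        "smac_scenario_args": {"runcount_limit": runcount_limit},
    }
    # Only the cv flavours take a fold count; holdout uses a single split.
    if resampling_strategy.startswith("cv"):
        kwargs["resampling_strategy_arguments"] = {"folds": folds}
    return kwargs


# Case 1 (default): each pipeline is evaluated once.
holdout_kwargs = build_automl_kwargs("holdout")
# Case 2: each pipeline is trained and evaluated on 5 folds.
cv_kwargs = build_automl_kwargs("cv", folds=5)
# Case 3: iterative flavour (assumed strategy name).
iterative_kwargs = build_automl_kwargs("holdout-iterative-fit")

# Usage, with auto-sklearn installed:
#     from autosklearn.classification import AutoSklearnClassifier
#     automl = AutoSklearnClassifier(**cv_kwargs)
```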

The performance_over_time_ data is related to the final ensemble built by autosklearn. This ensemble building is interleaved with pipeline evaluation: eval a pipeline, build an ensemble, eval another pipeline, build an ensemble, eval a pipeline, and so on. The runcount_limit and the number of ensembles built are not in one-to-one correspondence. This explains the difference between the Number of target algorithm runs and the single best optimization score.
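If you only need the "Number of target algorithm runs" figure, one option is to read it from the text summary. The sketch below assumes the summary comes from sprint_statistics() on a fitted estimator and that each count sits on its own "Number of ...: N" line. The parser name is hypothetical and the sample text is made up for illustration, not real output.

```python
import re


def parse_run_counts(stats_text):
    """Extract every 'Number of ...: N' line into a dict of label -> int."""
    counts = {}
    for line in stats_text.splitlines():
        match = re.match(r"\s*(Number of .+?):\s*(\d+)\s*$", line)
        if match:
            counts[match.group(1)] = int(match.group(2))
    return counts


# Made-up sample in the assumed layout (not real output).
sample = """auto-sklearn results:
  Dataset name: demo
  Metric: accuracy
  Best validation score: 0.950000
  Number of target algorithm runs: 25
  Number of successful target algorithm runs: 20
  Number of crashed target algorithm runs: 2
"""

run_counts = parse_run_counts(sample)

# Usage, after automl.fit(X_train, y_train):
#     run_counts = parse_run_counts(automl.sprint_statistics())
```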

You are also correct that num_run is essentially a config_id. This is perhaps something we should rename to better reflect what it is.

I hope this helped answer some questions 😃

Best, Eddie
