[Question] How to get the number of runs executed in SMAC for each pipeline?
I can use `runcount_limit` to limit the number of runs in SMAC for each pipeline:
```python
automl = AutoSklearnClassifier(
    smac_scenario_args={"runcount_limit": 1000},
)
```
Is it possible to get the number of executed runs in SMAC for each pipeline?
Or am I misunderstanding the meaning of runcount_limit?
Any comments are highly appreciated.
These initial configurations are just meta-learned configurations that we use to provide some initial points SMAC should evaluate, to get some information for its surrogate model. This places no constraint on the actual running of auto-sklearn, though; it is just a set of initial configurations to try.
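As a concrete illustration (a minimal sketch; `initial_configurations_via_metalearning` is the constructor argument that controls this):

```python
from autosklearn.classification import AutoSklearnClassifier

# Number of meta-learned configurations handed to SMAC as warm-start
# points; setting it to 0 makes SMAC start purely from its own sampling.
automl = AutoSklearnClassifier(initial_configurations_via_metalearning=25)
```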
For your second question, part b): no, there is not, but that would be a useful feature and I have added it as issue #1470.
For the second question, part a): you can use the dataframe provided by `leaderboard()` to extract all the information you need (see the sketch below). However, it would be good to have a handier solution for this! For any follow-up questions, please create a new issue 😃
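For illustration, a minimal sketch of that approach, assuming a fitted classifier named `automl` (the `ensemble_only` and `detailed` flags are as in recent auto-sklearn versions):

```python
# leaderboard() returns a pandas DataFrame with one row per model;
# ensemble_only=False includes every evaluated pipeline, not only the
# ones that made it into the final ensemble.
board = automl.leaderboard(ensemble_only=False, detailed=True)
print("Pipelines evaluated by SMAC:", len(board))
print(board.head())
```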
Hi @jmren168,
So the main categories are holdout and cv, each having an iterative flavour, which limits the available configurations to a subset of algorithms: those supporting an iterative method of fitting. Unless you have specified something else, the default is `"holdout"`; check out `resampling_strategy`. I think #428 was a misunderstanding; we do not have that feature. The way to run a single pipeline more than once is to use a `"cv"` resampling strategy.

In the case of a non-iterative flavour of the holdout resampling strategy (the default), each pipeline gets evaluated once by SMAC.
In the case of a non-iterative flavour of the cv resampling strategy, each pipeline will be evaluated on a certain number of folds, as in normal cross-validation. The pipeline will be trained on the different folds and the resulting pipelines accumulated together with something like sklearn's `VotingClassifier` or `VotingRegressor`. By default it is evaluated with `resampling_strategy_arguments={"folds": 5}`.

In the case of an *"iterative"* flavour of `resampling_strategy`, each pipeline can get evaluated more than once by SMAC. That is dependent upon SMAC and its scheduling, for which @mfeurer might be able to give a better answer.

Considering you haven't mentioned the "iterative" or "cv" resampling strategy, I will assume you are in case 1, where `runcount_limit` means how many pipelines are evaluated, each being evaluated once. (The three cases are sketched in code below.)
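To make the three cases concrete, a minimal sketch of how each strategy is selected (constructor arguments as documented by auto-sklearn; treat the exact values as version-dependent):

```python
from autosklearn.classification import AutoSklearnClassifier

# Case 1 (default): plain holdout. Each pipeline is evaluated exactly once,
# so runcount_limit caps the number of distinct pipelines tried.
case1 = AutoSklearnClassifier(
    resampling_strategy="holdout",
    smac_scenario_args={"runcount_limit": 1000},
)

# Case 2: non-iterative cv. Each configuration is fit on every fold and the
# per-fold models are combined, Voting-style, into one pipeline.
case2 = AutoSklearnClassifier(
    resampling_strategy="cv",
    resampling_strategy_arguments={"folds": 5},
)

# Case 3: an iterative flavour, where SMAC may revisit a pipeline and train
# it for further iterations.
case3 = AutoSklearnClassifier(resampling_strategy="holdout-iterative-fit")
```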
The `performance_over_time_` data is related to the final ensemble built by autosklearn. This ensemble building is interleaved with pipeline evaluation: evaluate a pipeline, build an ensemble, evaluate another pipeline, build an ensemble, and so on. The `runcount_limit` and the number of ensembles built are therefore not in one-to-one correspondence, which explains the difference between the `Number of target algorithm runs` and the `single best optimization score`.
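A quick way to see both quantities, assuming a fitted estimator named `automl` (a sketch; the exact columns of `performance_over_time_` can differ between versions):

```python
# sprint_statistics() reports, among other things, the
# "Number of target algorithm runs" line mentioned above.
print(automl.sprint_statistics())

# performance_over_time_ is a pandas DataFrame of scores over wall-clock
# time, reflecting the interleaved ensemble builds rather than one row
# per SMAC run.
perf = automl.performance_over_time_
print(len(perf), "score snapshots recorded")
```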
You are also correct that `num_run` is essentially a `config_id`; this is perhaps something we should change to be more reflective of what it is.

I hope this helped answer some questions 😃
Best, Eddie