AutoMLSearch: calling search twice on same instance doesn't work
Not sure if this is intended behavior, but when I call automl.search(X, y) a second time on the same automl object, the second search runs the baseline round and then quits, because the iteration counter starts at 6 (i.e., 1 + the number of iterations from the first search).
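A minimal sketch of the report, assuming EvalML's API at the time of this issue (search(X, y); newer EvalML versions pass the data to the AutoMLSearch constructor instead). The toy dataset and the max_iterations budget are illustrative, not taken from the issue:

```python
from sklearn.datasets import make_classification
from evalml.automl import AutoMLSearch

X, y = make_classification(n_samples=100, random_state=0)  # toy data

# max_iterations=5 is an illustrative budget
automl = AutoMLSearch(problem_type="binary", max_iterations=5)

automl.search(X, y)  # first call: baseline round + pipelines, 5 iterations
automl.search(X, y)  # reported bug: re-runs the baseline, then quits
                     # immediately, because the iteration counter resumes
                     # at 6 (1 + the iterations from the first search)
```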
Issue Analytics: Created 3 years ago · Comments: 5 (4 by maintainers)
Top GitHub Comments
@angela97lin @freddyaboulton thanks for filing, and great discussion. I agree!

As you both alluded to, we've been talking about this bottom-up, i.e. "given our current API, what should the behavior be?", but we should also consider it top-down, i.e. "what do users want to do with automl search?" If we decide we want to support behavior like pausing and resuming searches, we should consider building a different API for that ( #1047 ) before we invest time in updating AutoMLSearch further.

With that in mind, the options for what should happen when we call search again on an AutoMLSearch instance after the first call (not necessarily mutually exclusive) include letting the results of the first search get overwritten. I agree that option 0 (current behavior) is buggy and we should change it. For the time being I like option 2 or 3 the most. I think we should go for option 3 for now, and if that feels too complicated to build, we can fall back to option 2. Long-term I actually like option 4 (continuing) the most, but I think we should punt on that for now.
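For illustration, a hypothetical sketch of what option 4 ("continuing") might look like from the user's side, reusing the snippet above. Neither this behavior nor a mutable max_iterations attribute exists in EvalML; both are invented for the example:

```python
automl = AutoMLSearch(problem_type="binary", max_iterations=5)
automl.search(X, y)         # iterations 1-5, including the baseline

automl.max_iterations = 10  # hypothetical: raise the budget in place
automl.search(X, y)         # would resume at iteration 6 instead of
                            # re-running the baseline and quitting
```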
Great point @angela97lin! I think our current AutoMLSearch design lends itself to the "one object per search" paradigm you mention. My reasoning is that many of the configuration parameters the user specifies when creating the search are specific to the problem/dataset at hand (problem_type, allowed_pipelines, objective). It's possible that the values of these parameters are reasonable for one X, y but not for another X, y of the same problem type. Moreover, the rankings table will be misleading if we let the user call search on separate datasets, since the cv scores will not be directly comparable.

So if we want to follow what we have so far, I think we should do the "no-op" solution: make sure no pipelines are scored on subsequent calls to search once the stopping criteria have been met. We should only recalculate the baselines if we change our design to allow reusing the same search object.
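A minimal sketch of the "no-op" guard described in this comment, using a toy stand-in class; the attribute and method names are assumptions for illustration, not EvalML internals:

```python
import logging

logger = logging.getLogger(__name__)

class AutoMLSearchSketch:
    """Toy stand-in illustrating the no-op guard; not EvalML's internals."""

    def __init__(self, max_iterations=5):
        self.max_iterations = max_iterations
        self._iterations_run = 0  # counts the baseline as iteration 1

    def search(self, X, y):
        # No-op guard: if a previous call already met the stopping
        # criterion, skip everything, including the baseline round.
        if self._iterations_run >= self.max_iterations:
            logger.info("Stopping criteria already met; nothing to do. "
                        "Create a new instance to run another search.")
            return
        while self._iterations_run < self.max_iterations:
            # ... train and score the next pipeline here ...
            self._iterations_run += 1
```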
Happy to talk about whether we should refactor/redesign AutoMLSearch to allow calling search multiple times. I don't have a strong opinion yet on what would be most useful for our end users!