
AutoMLSearch: calling search twice on same instance doesn't work

See original GitHub issue

Not sure if this is intended behavior, but when I call automl.search(X, y) twice on the same automl object, the second search runs the baseline round and then quits, because the iteration counter starts at 6 (i.e. 1 + the number of iterations from the first search):

[screenshot: second search run quitting after the baseline round]
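The behavior described above can be modeled with a small self-contained sketch. This is not evalml's actual implementation; it is a hypothetical class whose iteration counter persists across calls to search(), which reproduces the reported symptom: the second call re-runs only the baseline and then stops.

```python
# Toy model of the reported bug: the iteration counter is instance state,
# so a second search() call starts where the first one left off.
class ToyAutoMLSearch:
    def __init__(self, max_iterations=5):
        self.max_iterations = max_iterations
        self._iteration = 0          # persists across search() calls
        self.results = []

    def search(self, X, y):
        self.results.append("baseline")   # baseline round always re-runs
        while self._iteration < self.max_iterations:
            self.results.append(f"pipeline_{self._iteration}")
            self._iteration += 1

automl = ToyAutoMLSearch(max_iterations=5)
automl.search(None, None)   # first call: baseline + 5 pipelines
automl.search(None, None)   # second call: baseline only, then quits
```

After the second call, `automl.results` ends with a lone extra "baseline" entry, mirroring the screenshot's truncated second run.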

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 5 (4 by maintainers)

Top GitHub Comments

1 reaction
dsherry commented, Dec 16, 2020

@angela97lin @freddyaboulton thanks for filing and great discussion. I agree!

As you both alluded to, we’ve been talking about this bottom-up, i.e. “given our current API what should the behavior be?”, but we should also consider this top-down, i.e. “what are things users want to do with automl search?” I think that if we decide we want to support behavior like pausing and resuming searches, we should consider building a different API for that ( #1047 ) before we invest time into updating AutoMLSearch further.

With that in mind, options for what should happen when we call search again on an AutoMLSearch instance after the first call (not necessarily mutually exclusive):

  1. Current buggy behavior
  2. Error: “running search more than once on an AutoMLSearch instance is not allowed”
  3. No-op: nothing happens.
  4. Rerun the entire search from scratch. All state created during previous calls to search gets overwritten.
  5. “Continue” or “resume” the search from where it left off

I agree that option 1 (the current behavior) is buggy and we should change it. For the time being I like option 2 or 3 the most. I think we should go for option 3 for now, and if that feels too complicated to build, we can fall back to option 2. Long-term I actually like option 5 (resuming) the most, but I think we should punt on that for now.
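Options 2 (error) and 3 (no-op) could both be implemented with the same guard on repeated calls. The following is a minimal sketch using a hypothetical class and parameter names, not evalml's actual API:

```python
# Hedged sketch of options 2 and 3: guard a second search() call on the
# same instance. mode="error" raises; mode="noop" silently does nothing.
class SearchGuard:
    def __init__(self, mode="noop"):
        self.mode = mode
        self._searched = False
        self.calls = 0            # stand-in for the real search loop

    def search(self, X, y):
        if self._searched:
            if self.mode == "error":
                raise RuntimeError(
                    "running search more than once on an AutoMLSearch "
                    "instance is not allowed")
            return                # option 3: no-op, state untouched
        self._searched = True
        self.calls += 1
```

Either way, the key design point is that a repeated call never re-runs the baseline or advances the iteration counter, which is what makes the current behavior confusing.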

1 reaction
freddyaboulton commented, Dec 14, 2020

Great point @angela97lin! I think our current AutoMLSearch design lends itself to the “one object per search” paradigm you mention. My reasoning is that many of the configuration parameters the user specifies when creating the search are specific to the problem/dataset at hand (problem_type, allowed_pipelines, objective). It’s possible that the values for these parameters are reasonable for one X, y but not for another X, y of the same problem type. Moreover, the rankings table will be misleading if we let the user call search on separate datasets, since the CV scores will not be directly comparable.

So I think if we want to follow what we have so far, we should do the “no-op” solution: make sure no pipelines are scored on subsequent calls to search once the stopping criteria have been met. I think we should only recalculate the baselines if we change our design to allow reusing the same search object.

Happy to talk about whether we should refactor/redesign AutoMLSearch to allow calling search multiple times. I don’t have a strong opinion yet of what would be most useful for our end users!


Top Results From Across the Web

PermutationFeatureImportance not working with AutoML API
Example code following Multiclassification PFI Implementation from ML.Net Documentation, using pipeline extracted from AutoML bestRun Model:.

AutoML: Automatic Machine Learning - H2O.ai Documentation
In both the R and Python API, AutoML uses the same data-related arguments, ... H2O algorithms will be used if the search stopping...

AutoML: Automatic Machine Learning — H2O 3.22.0.2 ... - AWS
In both the R and Python API, AutoML uses the same data-related ... grid search and the training of individual models within the...

Troubleshooting online endpoints deployment and scoring
Learn how to troubleshoot some common deployment and scoring errors with online endpoints.

Auto-Sklearn for Automated Machine Learning in Python
Auto-Sklearn is an open-source library for performing AutoML in Python. It makes use of the popular Scikit-Learn machine learning library ...
