
AutoMLSearch: calling search twice on same instance doesn't work

See original GitHub issue

Not sure if this is intended behavior, but when I call automl.search(X, y) twice on the same automl object, the second search runs the baseline round and then quits, because the iteration counter starts at 6 (i.e. 1 + the number of iterations from the first search):

[screenshot: second search run quitting after the baseline round]
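The behavior described above can be modeled with a small self-contained sketch. This is not evalml's actual implementation; it is a hypothetical class whose iteration counter persists across calls to search(), which reproduces the reported symptom: the second call re-runs only the baseline and then stops.

```python
# Toy model of the reported bug: the iteration counter is instance state,
# so a second search() call starts where the first one left off.
class ToyAutoMLSearch:
    def __init__(self, max_iterations=5):
        self.max_iterations = max_iterations
        self._iteration = 0          # persists across search() calls
        self.results = []

    def search(self, X, y):
        self.results.append("baseline")   # baseline round always re-runs
        while self._iteration < self.max_iterations:
            self.results.append(f"pipeline_{self._iteration}")
            self._iteration += 1

automl = ToyAutoMLSearch(max_iterations=5)
automl.search(None, None)   # first call: baseline + 5 pipelines
automl.search(None, None)   # second call: baseline only, then quits
```

After the second call, `automl.results` ends with a lone extra "baseline" entry, mirroring the screenshot's truncated second run.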

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 5 (4 by maintainers)

Top GitHub Comments

1 reaction
dsherry commented, Dec 16, 2020

@angela97lin @freddyaboulton thanks for filing and great discussion. I agree!

As you both alluded to, we’ve been talking about this bottom-up, i.e. “given our current API what should the behavior be?”, but we should also consider this top-down, i.e. “what are things users want to do with automl search?” I think that if we decide we want to support behavior like pausing and resuming searches, we should consider building a different API for that ( #1047 ) before we invest time into updating AutoMLSearch further.

With that in mind, options for what should happen when we call search again on an AutoMLSearch instance after the first call (not necessarily mutually exclusive):

  1. Current buggy behavior
  2. Error: “running search more than once on an AutoMLSearch instance is not allowed”
  3. No-op: nothing happens.
  4. Rerun the entire search from scratch. All state created during previous calls to search gets overwritten.
  5. “Continue” or “resume” the search from where it left off

I agree that option 1 (the current behavior) is buggy and we should change it. For the time being I like option 2 or 3 the most. I think we should go for option 3 for now, and if that feels too complicated to build, we can fall back to option 2. Long-term I actually like option 5 (resuming) the most, but I think we should punt on that for now.
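Options 2 (error) and 3 (no-op) could both be implemented with the same guard on repeated calls. The following is a minimal sketch using a hypothetical class and parameter names, not evalml's actual API:

```python
# Hedged sketch of options 2 and 3: guard a second search() call on the
# same instance. mode="error" raises; mode="noop" silently does nothing.
class SearchGuard:
    def __init__(self, mode="noop"):
        self.mode = mode
        self._searched = False
        self.calls = 0            # stand-in for the real search loop

    def search(self, X, y):
        if self._searched:
            if self.mode == "error":
                raise RuntimeError(
                    "running search more than once on an AutoMLSearch "
                    "instance is not allowed")
            return                # option 3: no-op, state untouched
        self._searched = True
        self.calls += 1
```

Either way, the key design point is that a repeated call never re-runs the baseline or advances the iteration counter, which is what makes the current behavior confusing.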

1 reaction
freddyaboulton commented, Dec 14, 2020

Great point @angela97lin! I think our current AutoMLSearch design lends itself to the “one object per search” paradigm you mention. My reasoning is that many of the configuration parameters the user specifies when creating the search are specific to the problem/dataset at hand (problem_type, allowed_pipelines, objective). It’s possible that the values for these parameters are reasonable for one X, y but not for another X, y of the same problem type. Moreover, the rankings table will be misleading if we let the user call search on separate datasets, since the CV scores will not be directly comparable.

So I think if we want to follow what we have so far, we should do the “no-op” solution: make sure no pipelines are scored on subsequent calls to search once the stopping criteria have been met. I think we should only recalculate the baselines if we change our design to allow reusing the same search object.

Happy to talk about whether we should refactor/redesign AutoMLSearch to allow calling search multiple times. I don’t have a strong opinion yet of what would be most useful for our end users!


Top Results From Across the Web

PermutationFeatureImportance not working with AutoML API
Example code following Multiclassification PFI Implementation from ML.Net Documentation, using pipeline extracted from AutoML bestRun Model:.

AutoML: Automatic Machine Learning - H2O.ai Documentation
In both the R and Python API, AutoML uses the same data-related arguments, ... H2O algorithms will be used if the search stopping...

AutoML: Automatic Machine Learning — H2O 3.22.0.2 ... - AWS
In both the R and Python API, AutoML uses the same data-related ... grid search and the training of individual models within the...

Troubleshooting online endpoints deployment and scoring
Learn how to troubleshoot some common deployment and scoring errors with online endpoints.

Auto-Sklearn for Automated Machine Learning in Python
Auto-Sklearn is an open-source library for performing AutoML in Python. It makes use of the popular Scikit-Learn machine learning library ...
