Crash under specific input example

Describe the bug

I was trying to create a minimal working example for an issue we have on real data (KDDCup). Along the way I found this (different) error raised when producing predictions.

I’m fine with a "won't fix", but I figured I would share it so you can check whether it points to a more serious underlying issue.

To Reproduce

Installed from development branch.

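import numpy as np
from autosklearn.experimental.askl2 import AutoSklearn2Classifier

# 150 samples with 4 random features
x = np.random.random(size=(150, 4))
# 75 samples of class 1, 74 of class 2, and a single sample of class 3
y = np.asarray([1]*75 + [2]*74 + [3])

aml = AutoSklearn2Classifier(time_left_for_this_task=60)
aml.fit(x, y)
predictions = aml.predict(x)  # fails with the ValueError in the stacktrace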

The single sample for class 3 seems to be crucial: I tried other configurations, but none of them produced the error.
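One reason a single-sample class is awkward is that any train/validation split can place that sample on only one side. The sketch below illustrates this with a plain random holdout in NumPy. The holdout is an assumption made for illustration; the issue does not show which resampling strategy auto-sklearn used in this run.

```python
import numpy as np

# Same label vector as in the reproduction script.
y = np.asarray([1] * 75 + [2] * 74 + [3])

# Class 3 has exactly one sample.
classes, counts = np.unique(y, return_counts=True)

# Hypothetical holdout split (not auto-sklearn's actual resampling).
rng = np.random.default_rng(0)
perm = rng.permutation(len(y))
train_idx, valid_idx = perm[:100], perm[100:]

train_classes = set(np.unique(y[train_idx]).tolist())
valid_classes = set(np.unique(y[valid_idx]).tolist())

# The lone class-3 sample lands in exactly one of the two subsets, so the
# other subset only ever contains classes 1 and 2.
print("train:", sorted(train_classes), "valid:", sorted(valid_classes))
```

Whether this is what actually triggers the crash is not established in the issue; it only shows that some subset of the data necessarily sees two classes instead of three.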

Expected behavior

Predictions to be produced.

Actual behavior, stacktrace or logfile

(venv) root@486c0ae472af:/bench# python mwe.py
/bench/frameworks/autosklearn/venv/lib/python3.7/site-packages/smac/intensification/parallel_scheduling.py:152: UserWarning: SuccessiveHalving is intended to be used with more than 1 worker but num_workers=1
  num_workers
[WARNING] [2021-07-27 15:07:04,115:Client-EnsembleBuilder] No models better than random - using Dummy loss!Number of models besides current dummy model: 1. Number of dummy models: 1
Traceback (most recent call last):
  File "mwe.py", line 9, in <module>
    predictions = aml.predict(x)
  File "/bench/frameworks/autosklearn/lib/auto-sklearn/autosklearn/estimators.py", line 695, in predict
    return super().predict(X, batch_size=batch_size, n_jobs=n_jobs)
  File "/bench/frameworks/autosklearn/lib/auto-sklearn/autosklearn/estimators.py", line 494, in predict
    return self.automl_.predict(X, batch_size=batch_size, n_jobs=n_jobs)
  File "/bench/frameworks/autosklearn/lib/auto-sklearn/autosklearn/automl.py", line 1703, in predict
    n_jobs=n_jobs)
  File "/bench/frameworks/autosklearn/lib/auto-sklearn/autosklearn/automl.py", line 1230, in predict
    for identifier in self.ensemble_.get_selected_model_identifiers()
  File "/bench/frameworks/autosklearn/venv/lib/python3.7/site-packages/joblib/parallel.py", line 1041, in __call__
    if self.dispatch_one_batch(iterator):
  File "/bench/frameworks/autosklearn/venv/lib/python3.7/site-packages/joblib/parallel.py", line 859, in dispatch_one_batch
    self._dispatch(tasks)
  File "/bench/frameworks/autosklearn/venv/lib/python3.7/site-packages/joblib/parallel.py", line 777, in _dispatch
    job = self._backend.apply_async(batch, callback=cb)
  File "/bench/frameworks/autosklearn/venv/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 208, in apply_async
    result = ImmediateResult(func)
  File "/bench/frameworks/autosklearn/venv/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 572, in __init__
    self.results = batch()
  File "/bench/frameworks/autosklearn/venv/lib/python3.7/site-packages/joblib/parallel.py", line 263, in __call__
    for func, args, kwargs in self.items]
  File "/bench/frameworks/autosklearn/venv/lib/python3.7/site-packages/joblib/parallel.py", line 263, in <listcomp>
    for func, args, kwargs in self.items]
  File "/bench/frameworks/autosklearn/lib/auto-sklearn/autosklearn/automl.py", line 96, in _model_predict
    prediction = model.predict_proba(X_)
  File "/bench/frameworks/autosklearn/venv/lib/python3.7/site-packages/sklearn/ensemble/_voting.py", line 329, in _predict_proba
    avg = np.average(self._collect_probas(X), axis=0,
  File "/bench/frameworks/autosklearn/venv/lib/python3.7/site-packages/sklearn/ensemble/_voting.py", line 324, in _collect_probas
    return np.asarray([clf.predict_proba(X) for clf in self.estimators_])
ValueError: could not broadcast input array from shape (150,3) into shape (150,)
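The last frame fails inside np.asarray, which is consistent with the ensemble members returning probability arrays of different widths. A minimal NumPy-only sketch of that failure mode follows. The two arrays are hypothetical stand-ins for predict_proba outputs, and the assumption that one member only knows two classes is not confirmed by the issue.

```python
import numpy as np

# Hypothetical predict_proba outputs (assumption: one ensemble member was
# fitted on all three classes, another on only two of them).
proba_three_classes = np.full((150, 3), 1 / 3)
proba_two_classes = np.full((150, 2), 1 / 2)


def stack_probas(probas):
    """Mimic the np.asarray([...]) call seen in the traceback."""
    return np.asarray(probas)


# Matching shapes stack into (n_estimators, n_samples, n_classes).
stacked = stack_probas([proba_three_classes, proba_three_classes])
print(stacked.shape)

# Mismatched shapes cannot form a regular array, so NumPy raises ValueError.
# The exact message depends on the NumPy version.
try:
    stack_probas([proba_three_classes, proba_two_classes])
    error = None
except ValueError as exc:
    error = exc
print(type(error).__name__)
```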

Environment and installation:

Please give details about your installation:

OS: Debian 10 in docker hosted by Windows 10
virtual environment
Python version: 3.7.11
Auto-sklearn version: development (11afae22b8c9a6309d2b6fcf7cfb9a947711cd1e)

Issue Analytics

Created 2 years ago
Comments: 7 (7 by maintainers)

Top GitHub Comments

eddiebergman commented, Sep 3, 2021

I meant that np.asarray and np.array should behave identically here. As far as I know, np.asarray is just a thin wrapper around np.array that avoids copying input that is already an array.
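A quick check of that point, as a small sketch with made-up values: for list input the two calls build equal arrays, and the visible difference only shows up when the input is already an ndarray.

```python
import numpy as np

# Illustrative values only.
data = [[0.2, 0.8], [0.6, 0.4]]

# For a plain list, both calls build a new array with the same contents.
from_array = np.array(data)
from_asarray = np.asarray(data)
same_values = np.array_equal(from_array, from_asarray)

# For an existing ndarray, np.array copies by default while np.asarray
# returns the input object itself.
existing = np.array(data)
copied = np.array(existing)
not_copied = np.asarray(existing)

print(same_values, copied is existing, not_copied is existing)
```

So swapping one call for the other would not be expected to change how mismatched shapes are handled.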

As for how the different shapes come about, I imagine it’s something specific to askl2, but it’s just a gut feeling; I would have to investigate it properly. If the pipeline is different from the normal AutoML class, then I might guess it’s related to the issue fixed in #1218, but that is only a guess. No point guessing until it’s looked into.

mfeurer commented, Sep 14, 2021

Fixed via #1218 and #1245. @eddiebergman could you please open a new issue if you find that scikit-learn warning again?