[BUG] ColumnEnsembleClassifier fails fitting KNN classifiers
Describe the bug
I am trying to classify a multivariate dataset using a ColumnEnsembleClassifier, with the same classifier for all dimensions. Fitting fails when I use KNN classifiers, but works when I use TimeSeriesForestClassifier.
I tried KNN with different metrics ('msm', 'dtw', and 'euclidean' imported from scipy), and all of them fail. I tried it with the AtrialFibrillation and BasicMotions datasets, and it fails for both.
To Reproduce
from sktime.utils.data_io import load_from_arff_to_dataframe as load_arff
X_train, y_train = load_arff('path_to_dataset_TRAIN.arff')
from sktime.classification.compose import ColumnEnsembleClassifier
from sktime.classification.distance_based._time_series_neighbors import KNeighborsTimeSeriesClassifier
from sktime.classification.compose import TimeSeriesForestClassifier
########## This one works fine ##########
clf = ColumnEnsembleClassifier(estimators=[
    ('TSF_0', TimeSeriesForestClassifier(verbose=0, n_jobs=-1), [0]),
    ('TSF_1', TimeSeriesForestClassifier(verbose=0, n_jobs=-1), [1]),
])
clf.fit(X_train, y_train)
# Output (repr of the fitted estimator returned by fit):
# ColumnEnsembleClassifier(estimators=[
#     ('TSF_0', TimeSeriesForestClassifier(n_jobs=-1), [0]),
#     ('TSF_1', TimeSeriesForestClassifier(n_jobs=-1), [1])
# ])
########## This one fails ##########
clf = ColumnEnsembleClassifier(estimators=[
    ('1NN-MSM_0', KNeighborsTimeSeriesClassifier(metric='msm'), [0]),
    ('1NN-MSM_1', KNeighborsTimeSeriesClassifier(metric='msm'), [1]),
])
clf.fit(X_train, y_train)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
~\Desktop\OVGU\DKE Subjects\Master Thesis\Code\run.py in <module>
----> 1 clf.fit(X_train, y_train)
~\.virtualenvs\Code-OH4xsw-D\lib\site-packages\sktime\classification\compose\_column_ensemble.py in fit(self, X, y)
157 for name, estimator, column in self._iter(replace_strings=True):
158 estimator = clone(estimator)
--> 159 estimator.fit(_get_column(X, column), transformed_y)
160 estimators_.append((name, estimator, column))
161
~\.virtualenvs\Code-OH4xsw-D\lib\site-packages\sktime\classification\distance_based\_time_series_neighbors.py in fit(self, X, y)
243 check_array.__code__ = _check_array_ts.__code__
244
--> 245 fx = self._fit(X)
246
247 if hasattr(check_array, "__wrapped__"):
~\.virtualenvs\Code-OH4xsw-D\lib\site-packages\sklearn\neighbors\_base.py in _fit(self, X, y)
362 if not isinstance(X, (KDTree, BallTree, NeighborsBase)):
363 X, y = self._validate_data(X, y, accept_sparse="csr",
--> 364 multi_output=True)
365
366 if is_classifier(self):
~\.virtualenvs\Code-OH4xsw-D\lib\site-packages\sklearn\base.py in _validate_data(self, X, y, reset, validate_separately, **check_params)
413 if self._get_tags()['requires_y']:
414 raise ValueError(
--> 415 f"This {self.__class__.__name__} estimator "
416 f"requires y to be passed, but the target y is None."
417 )
ValueError: This KNeighborsTimeSeriesClassifier estimator requires y to be passed, but the target y is None.
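The bottom frames of the traceback show sktime's KNeighborsTimeSeriesClassifier.fit calling self._fit(X) at line 245, while sklearn's _fit in the installed sklearn 0.24.0 has the signature _fit(self, X, y) and, through the requires_y tag check in _validate_data, raises when a classifier receives y=None. The traceback therefore suggests the error comes from the KNN classifier's own fit rather than from ColumnEnsembleClassifier, which only passes y through. The sketch below is a minimal pure-Python mock of that call pattern, not the real code: MockNeighborsBase, MockKNNTimeSeriesClassifier, its pass_y switch and column_ensemble_fit are all hypothetical stand-ins.

```python
# Hypothetical mock of the call pattern in the traceback above.
# These classes are NOT the real sklearn or sktime implementations.


class MockNeighborsBase:
    """Stand-in for sklearn's NeighborsBase as it appears in the traceback."""

    requires_y = True  # mimics the 'requires_y' tag checked in _validate_data

    def _fit(self, X, y=None):
        # Like the check in the traceback: refuse to fit without a target.
        if self.requires_y and y is None:
            raise ValueError(
                f"This {self.__class__.__name__} estimator requires y to be "
                "passed, but the target y is None."
            )
        self._X, self._y = X, y
        return self


class MockKNNTimeSeriesClassifier(MockNeighborsBase):
    """Stand-in for sktime's KNeighborsTimeSeriesClassifier.fit."""

    def __init__(self, pass_y=False):
        # Hypothetical switch: False = original call, True = patched call.
        self.pass_y = pass_y

    def fit(self, X, y):
        if self.pass_y:
            return self._fit(X, y)  # patched: fx = self._fit(X, y)
        return self._fit(X)         # original line 245: fx = self._fit(X)


def column_ensemble_fit(estimators, X, y):
    """Simplified version of the loop shown in ColumnEnsembleClassifier.fit."""
    fitted = []
    for name, make_estimator, columns in estimators:
        estimator = make_estimator()  # plays the role of clone(estimator)
        # Keep only the selected dimensions of every case; y is passed unchanged.
        X_cols = [[case[c] for c in columns] for case in X]
        estimator.fit(X_cols, y)
        fitted.append((name, estimator, columns))
    return fitted


# Two cases, each with two dimensions of two time points.
X = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]]
y = ["a", "b"]

try:
    column_ensemble_fit([("knn_0", MockKNNTimeSeriesClassifier, [0])], X, y)
    unpatched_error = None
except ValueError as err:
    unpatched_error = str(err)  # same wording as the traceback's message

patched = column_ensemble_fit(
    [
        ("knn_0", lambda: MockKNNTimeSeriesClassifier(pass_y=True), [0]),
        ("knn_1", lambda: MockKNNTimeSeriesClassifier(pass_y=True), [1]),
    ],
    X,
    y,
)
print(unpatched_error)
# This MockKNNTimeSeriesClassifier estimator requires y to be passed, but the target y is None.
```

In the mock, the ensemble loop forwards y correctly every time; only the unpatched fit drops it before reaching _fit, which matches the user's observation that ElasticEnsemble and RandomizedSearchCV over the same KNN classifier fail too.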
Expected behavior
ColumnEnsembleClassifier should fit the training data.
Versions
System:
python: 3.7.4 (tags/v3.7.4:e09359112e, Jul 8 2019, 20:34:20) [MSC v.1916 64 bit (AMD64)]
executable: C: …\…\.virtualenvs\Code-OH4xsw-D\Scripts\python.exe
machine: Windows-10-10.0.18362-SP0

Python dependencies:
pip: 20.3.1
setuptools: 50.3.2
sklearn: 0.24.0
sktime: 0.5.0
statsmodels: 0.12.1
numpy: 1.19.4
scipy: 1.5.4
Cython: 0.29.21
pandas: 1.1.5
matplotlib: 3.3.3
joblib: 1.0.0
numba: 0.52.0
pmdarima: None
tsfresh: 0.17.0
Unfortunately it doesn't, but instead of
I now use
and the warning doesn't show anymore. Also, ElasticEnsemble now works fine.

I found a very dirty solution that makes the code run, yet it still shows a warning when I use RandomizedSearchCV with KNN using the msm distance, a ColumnEnsembleClassifier of KNN classifiers using the msm distance, or ElasticEnsemble.
The dirty solution: I changed line 245 in the file sktime\classification\distance_based\_time_series_neighbors.py
from: fx = self._fit(X)
to: fx = self._fit(X, y)
The warning I get now is:
site-packages\sktime\classification\distance_based\_time_series_neighbors.py:245: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples,), for example using ravel(). fx = self._fit(X,y)
But the code runs through!
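The warning text says that the y reaching _fit at line 245 is a column vector of shape (n_samples, 1), and it suggests ravel(). A minimal numpy sketch, assuming that shape is the only remaining problem: the hypothetical helper as_1d_target flattens a single-column target and leaves other shapes alone, so the patched line would read fx = self._fit(X, as_1d_target(y)). The helper name is illustrative and not part of sktime or sklearn.

```python
import numpy as np


def as_1d_target(y):
    """Hypothetical helper: return y with shape (n_samples,) when it is a column vector.

    A target of shape (n_samples, 1), which is what the DataConversionWarning
    reports, is flattened with ravel(). Any other shape (for example a genuine
    multi-output target with several columns) is returned unchanged.
    """
    y = np.asarray(y)
    if y.ndim == 2 and y.shape[1] == 1:
        return y.ravel()
    return y


# A target reshaped into a column vector, as the warning describes.
y_col = np.array(["a", "b", "a"]).reshape((-1, 1))
print(y_col.shape)                # (3, 1)
print(as_1d_target(y_col).shape)  # (3,)
```

Only the single-column case is flattened, so a true multi-output y would still reach _fit in its original 2-D form.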