AutoMLSearch: data splitting fails with IndexError

Problem

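Calling AutoMLSearch.search raises an IndexError ("positional indexers are out-of-bounds"). The traceback shows it is raised while the data is being split for cross-validation, before the baseline pipeline is scored.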

Reproducer:

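import pandas as pd
from evalml.automl import AutoMLSearch

# Load the dataset and separate the target column from the features.
X = pd.read_csv('data.csv')
y = X.pop('TARGET')
automl = AutoMLSearch(problem_type='binary', objective='f1')
automl.search(X, y)  # raises IndexError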

Data:

https://alteryx0-my.sharepoint.com/:u:/g/personal/ethan_tu_alteryx_com/EcgNVHHRUW1Gi9wcWXcOc3oBp_VDRhksoHopj5r8UWF1VA?e=vCCmhs

Stack Trace:

IndexError                                Traceback (most recent call last)
~/.pyenv/versions/feature_selection/lib/python3.8/site-packages/pandas/core/indexing.py in _get_list_axis(self, key, axis)
   2110         try:
-> 2111             return self.obj._take_with_is_copy(key, axis=axis)
   2112         except IndexError:

~/.pyenv/versions/feature_selection/lib/python3.8/site-packages/pandas/core/generic.py in _take_with_is_copy(self, indices, axis, **kwargs)
   3408         """
-> 3409         result = self.take(indices=indices, axis=axis, **kwargs)
   3410         # Maybe set copy if we didn't actually change the index.

~/.pyenv/versions/feature_selection/lib/python3.8/site-packages/pandas/core/generic.py in take(self, indices, axis, is_copy, **kwargs)
   3393 
-> 3394         new_data = self._data.take(
   3395             indices, axis=self._get_block_manager_axis(axis), verify=True

~/.pyenv/versions/feature_selection/lib/python3.8/site-packages/pandas/core/internals/managers.py in take(self, indexer, axis, verify, convert)
   1385         if convert:
-> 1386             indexer = maybe_convert_indices(indexer, n)
   1387 

~/.pyenv/versions/feature_selection/lib/python3.8/site-packages/pandas/core/indexers.py in maybe_convert_indices(indices, n)
    212     if mask.any():
--> 213         raise IndexError("indices are out-of-bounds")
    214     return indices

IndexError: indices are out-of-bounds

During handling of the above exception, another exception occurred:

IndexError                                Traceback (most recent call last)
<ipython-input-18-4575b463e46a> in <module>
----> 1 automl.search(X_train, y_train)
      2 pipeline = automl.best_pipeline
      3 pipeline.fit(X_train, y_train)
      4 
      5 scores = evalml.pipelines.calculate_permutation_importance(pipeline,

~/.pyenv/versions/feature_selection/lib/python3.8/site-packages/evalml/automl/automl_search.py in search(self, X, y, data_checks, feature_types, raise_errors, show_iteration_plot)
    365 
    366         start = time.time()
--> 367         self._add_baseline_pipelines(X, y, pbar, raise_errors=raise_errors)
    368 
    369         current_batch_pipelines = []

~/.pyenv/versions/feature_selection/lib/python3.8/site-packages/evalml/automl/automl_search.py in _add_baseline_pipelines(self, X, y, pbar, raise_errors)
    469         pbar.set_description_str(desc=desc, refresh=True)
    470 
--> 471         baseline_results = self._compute_cv_scores(baseline, X, y, raise_errors=raise_errors, pbar=pbar)
    472         self._add_result(trained_pipeline=baseline,
    473                          parameters=baseline.parameters,

~/.pyenv/versions/feature_selection/lib/python3.8/site-packages/evalml/automl/automl_search.py in _compute_cv_scores(self, pipeline, X, y, raise_errors, pbar)
    486         for train, test in self.data_split.split(X, y):
    487             if isinstance(X, pd.DataFrame):
--> 488                 X_train, X_test = X.iloc[train], X.iloc[test]
    489             else:
    490                 X_train, X_test = X[train], X[test]

~/.pyenv/versions/feature_selection/lib/python3.8/site-packages/pandas/core/indexing.py in __getitem__(self, key)
   1766 
   1767             maybe_callable = com.apply_if_callable(key, self.obj)
-> 1768             return self._getitem_axis(maybe_callable, axis=axis)
   1769 
   1770     def _is_scalar_access(self, key: Tuple):

~/.pyenv/versions/feature_selection/lib/python3.8/site-packages/pandas/core/indexing.py in _getitem_axis(self, key, axis)
   2127         # a list of integers
   2128         elif is_list_like_indexer(key):
-> 2129             return self._get_list_axis(key, axis=axis)
   2130 
   2131         # a single integer

~/.pyenv/versions/feature_selection/lib/python3.8/site-packages/pandas/core/indexing.py in _get_list_axis(self, key, axis)
   2112         except IndexError:
   2113             # re-raise with different error message
-> 2114             raise IndexError("positional indexers are out-of-bounds")
   2115 
   2116     def _getitem_axis(self, key, axis: int):

IndexError: positional indexers are out-of-bounds
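
The last evalml frame shows X.iloc[train] rejecting the indices yielded by self.data_split.split(X, y). The report does not state a root cause, so the following is only a diagnostic sketch: it checks whether a set of split indices would be accepted by .iloc for a frame of a given length, and reproduces the same pandas error on toy data. The helper name find_out_of_bounds and the toy frame are hypothetical and are not part of evalml or pandas. Using index labels as positions is just one assumed way to end up with such indices.

```python
import numpy as np
import pandas as pd


def find_out_of_bounds(indices, n_rows):
    """Return the entries of `indices` that .iloc would reject for n_rows rows."""
    indices = np.asarray(indices)
    # .iloc accepts integer positions in [-n_rows, n_rows); anything else fails.
    mask = (indices >= n_rows) | (indices < -n_rows)
    return indices[mask]


# Toy frame (hypothetical data) whose index labels differ from row positions.
X = pd.DataFrame({"a": range(5)}, index=[10, 11, 12, 13, 14])

good = np.array([0, 2, 4])     # valid row positions
bad = np.array([10, 12, 14])   # index labels mistakenly used as positions

print(find_out_of_bounds(good, len(X)))  # empty array: safe for .iloc
print(find_out_of_bounds(bad, len(X)))   # every entry is out of bounds

# Indexing with the bad positions triggers the IndexError seen in the traceback.
try:
    X.iloc[bad]
    raised = False
except IndexError as err:
    raised = True
    print(err)
```

Running the same check on each (train, test) pair from the splitter in use, against len(X), would show which fold, if any, yields positions outside the frame.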