AutoMLSearch: data splitting fails with IndexError

See original GitHub issue

Problem

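Calling AutoMLSearch.search raises an IndexError ("positional indexers are out-of-bounds") while the data is being split for cross-validation.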

Reproducer:

Code

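import pandas as pd
from evalml.automl import AutoMLSearch

# Load the dataset linked under "Data" and separate the target column.
X = pd.read_csv('data.csv')
y = X.pop('TARGET')
automl = AutoMLSearch(problem_type='binary', objective='f1')
automl.search(X, y)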

Data:

https://alteryx0-my.sharepoint.com/:u:/g/personal/ethan_tu_alteryx_com/EcgNVHHRUW1Gi9wcWXcOc3oBp_VDRhksoHopj5r8UWF1VA?e=vCCmhs

Stack Trace:


IndexError                                Traceback (most recent call last)
~/.pyenv/versions/feature_selection/lib/python3.8/site-packages/pandas/core/indexing.py in _get_list_axis(self, key, axis)
   2110         try:
-> 2111             return self.obj._take_with_is_copy(key, axis=axis)
   2112         except IndexError:

~/.pyenv/versions/feature_selection/lib/python3.8/site-packages/pandas/core/generic.py in _take_with_is_copy(self, indices, axis, **kwargs)
   3408         """
-> 3409         result = self.take(indices=indices, axis=axis, **kwargs)
   3410         # Maybe set copy if we didn't actually change the index.

~/.pyenv/versions/feature_selection/lib/python3.8/site-packages/pandas/core/generic.py in take(self, indices, axis, is_copy, **kwargs)
   3393 
-> 3394         new_data = self._data.take(
   3395             indices, axis=self._get_block_manager_axis(axis), verify=True

~/.pyenv/versions/feature_selection/lib/python3.8/site-packages/pandas/core/internals/managers.py in take(self, indexer, axis, verify, convert)
   1385         if convert:
-> 1386             indexer = maybe_convert_indices(indexer, n)
   1387 

~/.pyenv/versions/feature_selection/lib/python3.8/site-packages/pandas/core/indexers.py in maybe_convert_indices(indices, n)
    212     if mask.any():
--> 213         raise IndexError("indices are out-of-bounds")
    214     return indices

IndexError: indices are out-of-bounds

During handling of the above exception, another exception occurred:

IndexError                                Traceback (most recent call last)
<ipython-input-18-4575b463e46a> in <module>
----> 1 automl.search(X_train, y_train)
      2 pipeline = automl.best_pipeline
      3 pipeline.fit(X_train, y_train)
      4 
      5 scores = evalml.pipelines.calculate_permutation_importance(pipeline,

~/.pyenv/versions/feature_selection/lib/python3.8/site-packages/evalml/automl/automl_search.py in search(self, X, y, data_checks, feature_types, raise_errors, show_iteration_plot)
    365 
    366         start = time.time()
--> 367         self._add_baseline_pipelines(X, y, pbar, raise_errors=raise_errors)
    368 
    369         current_batch_pipelines = []

~/.pyenv/versions/feature_selection/lib/python3.8/site-packages/evalml/automl/automl_search.py in _add_baseline_pipelines(self, X, y, pbar, raise_errors)
    469         pbar.set_description_str(desc=desc, refresh=True)
    470 
--> 471         baseline_results = self._compute_cv_scores(baseline, X, y, raise_errors=raise_errors, pbar=pbar)
    472         self._add_result(trained_pipeline=baseline,
    473                          parameters=baseline.parameters,

~/.pyenv/versions/feature_selection/lib/python3.8/site-packages/evalml/automl/automl_search.py in _compute_cv_scores(self, pipeline, X, y, raise_errors, pbar)
    486         for train, test in self.data_split.split(X, y):
    487             if isinstance(X, pd.DataFrame):
--> 488                 X_train, X_test = X.iloc[train], X.iloc[test]
    489             else:
    490                 X_train, X_test = X[train], X[test]

~/.pyenv/versions/feature_selection/lib/python3.8/site-packages/pandas/core/indexing.py in __getitem__(self, key)
   1766 
   1767             maybe_callable = com.apply_if_callable(key, self.obj)
-> 1768             return self._getitem_axis(maybe_callable, axis=axis)
   1769 
   1770     def _is_scalar_access(self, key: Tuple):

~/.pyenv/versions/feature_selection/lib/python3.8/site-packages/pandas/core/indexing.py in _getitem_axis(self, key, axis)
   2127         # a list of integers
   2128         elif is_list_like_indexer(key):
-> 2129             return self._get_list_axis(key, axis=axis)
   2130 
   2131         # a single integer

~/.pyenv/versions/feature_selection/lib/python3.8/site-packages/pandas/core/indexing.py in _get_list_axis(self, key, axis)
   2112         except IndexError:
   2113             # re-raise with different error message
-> 2114             raise IndexError("positional indexers are out-of-bounds")
   2115 
   2116     def _getitem_axis(self, key, axis: int):

IndexError: positional indexers are out-of-bounds
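
The final exception comes from pandas rejecting positions that fall outside the frame's row range. The following is a minimal diagnostic sketch, not part of evalml or pandas, and it does not reproduce the bug itself. The helper name out_of_bounds_positions is hypothetical, and it assumes the usual positional rule that a position p is valid when -n_rows <= p < n_rows. It only shows how split indices could be checked against the number of rows before indexing.

```python
from typing import Iterable, List


def out_of_bounds_positions(positions: Iterable[int], n_rows: int) -> List[int]:
    """Return the positions that positional indexing on n_rows rows would reject.

    Assumed rule: a position is valid when -n_rows <= position < n_rows
    (negative values count from the end).
    """
    return [int(p) for p in positions if not -n_rows <= int(p) < n_rows]


# Toy data: a 5-row table and a split that was computed for 8 rows.
n_rows = 5
train_positions = [0, 2, 4, 6, 7]
bad = out_of_bounds_positions(train_positions, n_rows)
print(bad)  # [6, 7]
```

In the failing loop, one could call this with each train and test array together with len(X). A non-empty result would suggest that the splitter produced indices for a different number of rows than X actually has.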

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 8 (3 by maintainers)

Top GitHub Comments

dsherry commented, Sep 9, 2020 (1 reaction)

Thanks @christopherbunn. Yes, let’s move this into the custom index epic. I think the fix for that should fix both this and #1126.

freddyaboulton commented, Nov 5, 2020 (0 reactions)

Filed a comment about the long conversion time in Woodwork 79!
