AutoMLSearch: data splitting fails with IndexError
Problem
An IndexError is raised when search is called.
Reproducer:
Code
import pandas as pd
from evalml.automl import AutoMLSearch

X = pd.read_csv('data.csv')
y = X.pop('TARGET')
automl = AutoMLSearch(problem_type='binary', objective='f1')
automl.search(X, y)
Data:
Stack Trace:
IndexError Traceback (most recent call last)
~/.pyenv/versions/feature_selection/lib/python3.8/site-packages/pandas/core/indexing.py in _get_list_axis(self, key, axis)
2110 try:
-> 2111 return self.obj._take_with_is_copy(key, axis=axis)
2112 except IndexError:
~/.pyenv/versions/feature_selection/lib/python3.8/site-packages/pandas/core/generic.py in _take_with_is_copy(self, indices, axis, **kwargs)
3408 """
-> 3409 result = self.take(indices=indices, axis=axis, **kwargs)
3410 # Maybe set copy if we didn't actually change the index.
~/.pyenv/versions/feature_selection/lib/python3.8/site-packages/pandas/core/generic.py in take(self, indices, axis, is_copy, **kwargs)
3393
-> 3394 new_data = self._data.take(
3395 indices, axis=self._get_block_manager_axis(axis), verify=True
~/.pyenv/versions/feature_selection/lib/python3.8/site-packages/pandas/core/internals/managers.py in take(self, indexer, axis, verify, convert)
1385 if convert:
-> 1386 indexer = maybe_convert_indices(indexer, n)
1387
~/.pyenv/versions/feature_selection/lib/python3.8/site-packages/pandas/core/indexers.py in maybe_convert_indices(indices, n)
212 if mask.any():
--> 213 raise IndexError("indices are out-of-bounds")
214 return indices
IndexError: indices are out-of-bounds
During handling of the above exception, another exception occurred:
IndexError Traceback (most recent call last)
<ipython-input-18-4575b463e46a> in <module>
----> 1 automl.search(X_train, y_train)
2 pipeline = automl.best_pipeline
3 pipeline.fit(X_train, y_train)
4
5 scores = evalml.pipelines.calculate_permutation_importance(pipeline,
~/.pyenv/versions/feature_selection/lib/python3.8/site-packages/evalml/automl/automl_search.py in search(self, X, y, data_checks, feature_types, raise_errors, show_iteration_plot)
365
366 start = time.time()
--> 367 self._add_baseline_pipelines(X, y, pbar, raise_errors=raise_errors)
368
369 current_batch_pipelines = []
~/.pyenv/versions/feature_selection/lib/python3.8/site-packages/evalml/automl/automl_search.py in _add_baseline_pipelines(self, X, y, pbar, raise_errors)
469 pbar.set_description_str(desc=desc, refresh=True)
470
--> 471 baseline_results = self._compute_cv_scores(baseline, X, y, raise_errors=raise_errors, pbar=pbar)
472 self._add_result(trained_pipeline=baseline,
473 parameters=baseline.parameters,
~/.pyenv/versions/feature_selection/lib/python3.8/site-packages/evalml/automl/automl_search.py in _compute_cv_scores(self, pipeline, X, y, raise_errors, pbar)
486 for train, test in self.data_split.split(X, y):
487 if isinstance(X, pd.DataFrame):
--> 488 X_train, X_test = X.iloc[train], X.iloc[test]
489 else:
490 X_train, X_test = X[train], X[test]
~/.pyenv/versions/feature_selection/lib/python3.8/site-packages/pandas/core/indexing.py in __getitem__(self, key)
1766
1767 maybe_callable = com.apply_if_callable(key, self.obj)
-> 1768 return self._getitem_axis(maybe_callable, axis=axis)
1769
1770 def _is_scalar_access(self, key: Tuple):
~/.pyenv/versions/feature_selection/lib/python3.8/site-packages/pandas/core/indexing.py in _getitem_axis(self, key, axis)
2127 # a list of integers
2128 elif is_list_like_indexer(key):
-> 2129 return self._get_list_axis(key, axis=axis)
2130
2131 # a single integer
~/.pyenv/versions/feature_selection/lib/python3.8/site-packages/pandas/core/indexing.py in _get_list_axis(self, key, axis)
2112 except IndexError:
2113 # re-raise with different error message
-> 2114 raise IndexError("positional indexers are out-of-bounds")
2115
2116 def _getitem_axis(self, key, axis: int):
IndexError: positional indexers are out-of-bounds
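The last frames of the trace can be reproduced in isolation. The sketch below is only an illustration of the error itself, not a diagnosis of the root cause in this issue: the DataFrame, its index labels, and the helper `take_rows` are hypothetical and do not come from the reporter's data or from EvalML. It shows that `X.iloc[...]` selects by position and raises this IndexError whenever a requested position is not smaller than `len(X)`.

```python
import pandas as pd

# Hypothetical 5-row frame; the index labels are arbitrary and not positions.
X = pd.DataFrame({"a": range(5)}, index=[10, 11, 12, 13, 14])


def take_rows(frame, positions):
    """Select rows by integer position, like X.iloc[train] in the trace."""
    return frame.iloc[positions]


# Positions inside the range 0..len(X)-1 succeed, whatever the labels are.
ok = take_rows(X, [0, 4])

# A position >= len(X) fails with the IndexError shown at the end of the trace.
try:
    take_rows(X, [0, 5])
    error_message = None
except IndexError as exc:
    error_message = str(exc)

print(ok.index.tolist())
print(error_message)
```

When debugging a case like this one, comparing the largest position produced by the splitter against `len(X)` is a quick way to tell whether the split was computed on data of a different length than the data being indexed.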
Issue Analytics
- State:
- Created 3 years ago
- Comments:8 (3 by maintainers)
Top GitHub Comments
Thanks @christopherbunn. Yes, let’s move this into the custom index epic. I think the fix for that should fix both this and #1126.
Filed a comment about the long conversion time in Woodwork 79!