[QST] Updating indices creation for split

I am interested in using the GridSearchCV machinery from scikit-learn with non-NumPy libraries like cuDF or dask.array. While exploring this, I think functions like _iter_test_indices and _iter_test_masks could be amended slightly to return slices instead of NumPy arrays.
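A small illustration of why slices help. A plain Python list stands in here for a container that supports basic slicing but not NumPy-style integer-array indexing; the list and toy values are assumptions for illustration only, and cuDF and dask.array each have their own indexing rules.

```python
import numpy as np

# Hypothetical toy data: the same five rows as a NumPy array and as a plain list.
X_np = np.arange(5) * 10
X_list = [0, 10, 20, 30, 40]

test = slice(1, 3)          # a contiguous test fold expressed as a slice
test_idx = np.arange(1, 3)  # the same fold expressed as an integer index array

# A slice works with any container that implements basic __getitem__ slicing.
print(X_np[test].tolist(), X_list[test])  # [10, 20] [10, 20]

# Integer-array ("fancy") indexing only works if the container supports it.
print(X_np[test_idx].tolist())  # [10, 20]
try:
    X_list[test_idx]
except TypeError as exc:
    print("list rejects index arrays:", exc)
```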

For example in _iter_test_indices: https://github.com/scikit-learn/scikit-learn/blob/ec35ed226ca104e841238f2fac24269c2f9f2730/sklearn/model_selection/_split.py#L423-L436

If shuffling is enabled, most of the existing code could be moved inside the shuffle conditional. When there is no shuffle, we can simply yield the appropriate slice object: slice(0, 71)…
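A minimal sketch of the proposal, assuming KFold's fold-size rule (the first `n_samples % n_splits` folds get one extra sample). `iter_test_slices` is a hypothetical standalone function, not scikit-learn's `_iter_test_indices`, and the shuffle branch uses `RandomState.permutation` for brevity.

```python
import numpy as np

def iter_test_slices(n_samples, n_splits, shuffle=False, random_state=None):
    """Hypothetical KFold-style test-fold generator (not scikit-learn's API).

    Without shuffling every test fold is a contiguous block, so it can be
    yielded as a plain slice; with shuffling an index array is still needed.
    """
    # Assumed KFold rule: the first n_samples % n_splits folds get one extra sample.
    base, extra = divmod(n_samples, n_splits)
    fold_sizes = [base + 1 if i < extra else base for i in range(n_splits)]

    if shuffle:
        # Only the shuffle path allocates an array of indices.
        indices = np.random.RandomState(random_state).permutation(n_samples)

    start = 0
    for size in fold_sizes:
        stop = start + size
        if shuffle:
            yield indices[start:stop]  # integer index array
        else:
            yield slice(start, stop)   # slice object, no array allocation
        start = stop
```

For example, `list(iter_test_slices(10, 3))` gives `[slice(0, 4), slice(4, 7), slice(7, 10)]`.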

I wanted to check here before starting the work: does anyone have advice before I begin?

Issue Analytics: created 4 years ago; 18 comments (15 by maintainers).

Top GitHub Comments

amueller commented, Aug 21, 2019

We kinda only need to distinguish it from an integer or None, right? We could check that a single common method exists. If we then try to use some obscure method and it's not implemented, that's a reasonable error. If it has a uniform method, it's much more likely to be something that implements sampling than an integer, I think 😉
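One reading of this suggestion, sketched under the assumption that the "single common method" is `uniform`, which both `np.random.RandomState` and `np.random.Generator` provide. `as_random_state` is a hypothetical helper, not scikit-learn's `check_random_state`; unlike the real function, which returns NumPy's global RandomState singleton for `None`, this sketch creates a fresh one.

```python
import numbers

import numpy as np

def as_random_state(seed):
    """Hypothetical duck-typed variant of sklearn.utils.check_random_state.

    None and integers become a NumPy RandomState. Any other object exposing a
    callable ``uniform`` attribute is assumed to be a random-state-like
    generator (e.g. from another array library) and is returned unchanged.
    """
    if seed is None or isinstance(seed, numbers.Integral):
        return np.random.RandomState(seed)
    # Assumption from the comment above: one common sampling method is enough.
    if callable(getattr(seed, "uniform", None)):
        return seed
    raise ValueError(f"{seed!r} cannot be used to seed a random state")
```

If a caller later relies on a more obscure method the object lacks, the resulting `AttributeError` is the "reasonable error" the comment anticipates.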

ogrisel commented, May 13, 2022

It would be great if someone wanted to try to prototype Array API spec support in scikit-learn Cross-Validation tools in a draft PR on top of the #22554 PR.