Pipeline predict should preserve the data index
Currently, calling pipeline.predict(X) will return a series with an index with values from 0 to X.shape[0] - 1. This can cause bugs (#1634) and can be confusing to users if they pass in data with a custom index.
Should we change this behavior?
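A minimal sketch of the behavior in question, using plain pandas and NumPy rather than the project's actual pipeline code. The names `_toy_model`, `predict_reset_index`, and `predict_keep_index` are hypothetical and exist only for illustration.

```python
import numpy as np
import pandas as pd


def _toy_model(values: np.ndarray) -> np.ndarray:
    # Hypothetical stand-in for an estimator: label each row by the sign of its row sum.
    return (values.sum(axis=1) > 0).astype(int)


def predict_reset_index(X: pd.DataFrame) -> pd.Series:
    # Mimics the behavior described above: predictions get a fresh 0..n-1 index.
    return pd.Series(_toy_model(X.to_numpy()))


def predict_keep_index(X: pd.DataFrame) -> pd.Series:
    # Proposed behavior: carry the input index over to the predictions.
    return pd.Series(_toy_model(X.to_numpy()), index=X.index)


# Input data with a custom (non-default) index.
X = pd.DataFrame({"a": [1.0, -2.0, 3.0], "b": [0.5, 0.5, -4.0]}, index=[10, 20, 30])
y = pd.Series([1, 0, 0], index=X.index)

reset_preds = predict_reset_index(X)
kept_preds = predict_keep_index(X)

# pandas aligns on index labels, so the reset index shares no labels with y
# and every entry of the difference is NaN.
misaligned = reset_preds - y
# With the index preserved, the subtraction lines up row by row.
aligned = kept_preds - y
```

In this sketch the values in `reset_preds` are correct, but any label-aligned operation against `y` silently produces NaN, which is one way a reset index can turn into a bug downstream.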
Issue Analytics
- State:
- Created 3 years ago
- Reactions:1
- Comments:7 (4 by maintainers)
Top GitHub Comments
I can pick this up! @freddyaboulton Do you think it’s still worth discussing first, or should we just go for it? Are there reasons why we wouldn’t want to do this?
Go for it @angela97lin! I don’t foresee any trouble in doing this (beyond some breaking pd.testing.assert_series_equal checks that rely on the current index behavior), but I think unit tests + performance tests should give us a better idea.
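To illustrate the kind of test breakage mentioned here, a small sketch with made-up values (not taken from the project's test suite): `pd.testing.assert_series_equal` compares the index as well as the values, so an expected series built with the default index stops matching once predictions carry the input index.

```python
import pandas as pd

# Expected result written against the old behavior (default 0..n-1 index).
expected_old = pd.Series([1, 0, 0])
# Hypothetical predictions after the change: same values, input index preserved.
preds_new = pd.Series([1, 0, 0], index=[10, 20, 30])

try:
    pd.testing.assert_series_equal(preds_new, expected_old)
    index_mismatch_detected = False
except AssertionError:
    # The values match, but the index labels differ, so the assertion fails.
    index_mismatch_detected = True

# Updating the expected series to use the input index makes the check pass again.
expected_new = pd.Series([1, 0, 0], index=[10, 20, 30])
pd.testing.assert_series_equal(preds_new, expected_new)
```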