ColumnTransformer behavior for negative column indexes
Description
The behavior of ColumnTransformer when negative integers are passed as column indexes is not clear.
Steps/Code to Reproduce
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
# Two random numeric columns followed by one categorical column (index 2, i.e. -1)
X = np.random.randn(2, 2)
X_categories = np.array([[1], [2]])
X = np.concatenate([X, X_categories], axis=1)
print('---- With negative index ----')
ohe = OneHotEncoder(categories='auto')
# One-hot encode the last column, selected by its negative index; pass the rest through
tf_1 = ColumnTransformer([('ohe', ohe, [-1])], remainder='passthrough')
print(tf_1.fit_transform(X))
print('---- With positive index ----')
# Same column, selected by its positive index
tf_2 = ColumnTransformer([('ohe', ohe, [2])], remainder='passthrough')
print(tf_2.fit_transform(X))
Expected Results
The first transformer, tf_1, should either raise an error or give the same result as the second transformer, tf_2.
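The expectation rests on NumPy's own indexing convention: in an array with three columns, index -1 and index 2 refer to the same column. A quick check in plain NumPy (no scikit-learn involved; the values are illustrative stand-ins for the random data in the reproduction):

```python
import numpy as np

# A 2 x 3 array shaped like X in the reproduction: two numeric columns
# followed by one categorical column (values are illustrative only).
X = np.array([[0.1, -0.5, 1.0],
              [-1.3, 2.3, 2.0]])

# NumPy resolves a negative column index by counting from the end,
# so [-1] and [2] select the same column.
last_by_negative = X[:, [-1]]
last_by_positive = X[:, [2]]

print(np.array_equal(last_by_negative, last_by_positive))  # True
```

Since ColumnTransformer accepts NumPy-style integer indexing for its column selection, a user would reasonably expect it to resolve -1 the same way.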
Actual Results
---- With negative index ----
[[ 1. 0. 0.10600662 -0.46707426 1. ]
[ 0. 1. -1.33177629 2.29186299 2. ]]
---- With positive index ----
[[ 1. 0. 0.10600662 -0.46707426]
[ 0. 1. -1.33177629 2.29186299]]
Issue Analytics
- Created 5 years ago
- Comments: 6 (5 by maintainers)
Top GitHub Comments
It is the validation of the remainder that is going wrong. The set operation used here to compute remaining_idx does not work with negative indices: https://github.com/scikit-learn/scikit-learn/blob/354c8c3bc3e36c69021713da66e7fa2f6cb07756/sklearn/compose/_column_transformer.py#L298-L304
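A minimal pure-Python sketch of why a set-based remainder breaks. It assumes the remainder is computed as "every position in range(n_columns) not among the selected indices", which is the gist of the linked lines; remaining_columns is a hypothetical name, not the scikit-learn function.

```python
def remaining_columns(n_columns, selected):
    """Hypothetical sketch: columns not claimed by any transformer,
    computed as a set difference on the raw (unnormalized) indices."""
    return sorted(set(range(n_columns)) - set(selected))


n_columns = 3  # two numeric columns + one categorical column

# Positive index: column 2 is removed from the remainder as expected.
print(remaining_columns(n_columns, [2]))   # [0, 1]

# Negative index: -1 never equals any element of range(3), so column 2
# stays in the remainder and is passed through a second time.
print(remaining_columns(n_columns, [-1]))  # [0, 1, 2]
```

This matches the widths in Actual Results: the one-hot encoder emits 2 columns (categories 1 and 2), so tf_2 returns 2 + 2 = 4 columns while tf_1 returns 2 + 3 = 5, the last of which is the raw categorical column.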
Maybe we should convert the negative indices to positive ones in _get_column_indices?

I think we should allow negative indices, if only because we are supporting various other numpy indexing syntaxes and users would expect it. Current behaviour doesn't look so good!
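The suggested fix, converting negative indices to positive ones before the remainder is computed, could look roughly like the sketch below. normalize_column_indices and remaining_columns_normalized are hypothetical helpers written for illustration, not scikit-learn code; they follow NumPy's convention that a negative index i refers to position n_columns + i, and reject anything out of range.

```python
def normalize_column_indices(indices, n_columns):
    """Hypothetical helper: map negative column indices to their
    positive equivalents, raising on out-of-range values."""
    normalized = []
    for idx in indices:
        if not -n_columns <= idx < n_columns:
            raise IndexError(
                f"column index {idx} is out of range for {n_columns} columns")
        # A negative index counts from the end, as in NumPy.
        normalized.append(idx + n_columns if idx < 0 else idx)
    return normalized


def remaining_columns_normalized(n_columns, selected):
    """Remainder computed only after normalization, so -1 and 2 agree."""
    claimed = set(normalize_column_indices(selected, n_columns))
    return [i for i in range(n_columns) if i not in claimed]


print(normalize_column_indices([-1], 3))      # [2]
print(remaining_columns_normalized(3, [-1]))  # [0, 1]
print(remaining_columns_normalized(3, [2]))   # [0, 1]
```

Normalizing at a single point means every consumer of the indices, including the remainder calculation, sees the same positive positions, while an out-of-range value such as -4 on three columns raises instead of being silently ignored.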