CategoricalEncoder in preprocessing or feature_extraction module?
In the merged PR, the CategoricalEncoder was put in the preprocessing module. However, it might make more sense to put it in feature_extraction, since it deals with extracting numerical features from categorical data (e.g. DictVectorizer is also in that module).
I think the main reason it is in preprocessing is that OneHotEncoder is historically there.
But since it’s not yet released, we can still change this if we want to.
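To make the overlap between the two modules concrete, here is a minimal sketch that one-hot encodes the same toy data with OneHotEncoder (from preprocessing) and DictVectorizer (from feature_extraction). The `color` column and its values are made up for illustration. OneHotEncoder stands in for CategoricalEncoder here, on the assumption that a scikit-learn version is installed in which OneHotEncoder accepts string categories (0.20 or later).

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data: one categorical column, three samples.
rows = [["red"], ["green"], ["red"]]
dicts = [{"color": value} for (value,) in rows]

# preprocessing: takes a 2D array-like of category values.
ohe = OneHotEncoder()
ohe_out = ohe.fit_transform(rows).toarray()

# feature_extraction: takes a list of feature-name -> value mappings.
dv = DictVectorizer(sparse=False)
dv_out = dv.fit_transform(dicts)

# Both order their output columns by sorted category: green, then red.
print(ohe.categories_[0].tolist())
print(sorted(dv.vocabulary_, key=dv.vocabulary_.get))
print((ohe_out == dv_out).all())
```

The two transformers differ mainly in the input format they expect, which is part of why the module boundary is a judgment call.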
Issue Analytics
- State:
- Created 6 years ago
- Comments: 6 (6 by maintainers)
Top GitHub Comments
I think this can be closed, regardless of whether there is clear consensus.
Let me join your killing spree…
It is of course not exact by date, but you can always look at https://github.com/scikit-learn/scikit-learn/pulse. Anyhow, the balance for this week is -40 issues 😃