Add DateTimeTransformer?
Describe the workflow you want to enable
Do we want to add a transformer for pandas datetimes?
We haven’t really added much based on pandas yet but this would be a pretty natural thing to add.
You could argue that you can do something like FunctionTransformer(lambda X: X.dt.dayofweek)
or similar for the other features (year, hour, minute, month…) but the problem with that is that you don’t get feature names, which is terrible for interpretation.
Featurizing datetimes is super common (the last ~10 datasets I worked on had it) and I think it’s a workflow we should make easier.
Describe your proposed solution
Implement a DateTimeTransformer that takes in maybe just a single column and a list of features to derive, like dayofweek, dayofyear, etc., but which creates meaningful feature names. Taking a single column would work well with ColumnTransformer. It is a bit different from other transformers but quite similar to CountVectorizer, so if it takes a single column maybe it should be called DateTimeVectorizer.
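To make the proposal concrete, here is a rough sketch of what such a transformer could look like. It is not an existing scikit-learn class: the name DateTimeVectorizer, the `features` parameter, and the `<column>_<feature>` naming scheme are all assumptions made for illustration. It relies only on the pandas `.dt` accessor and the usual `fit` / `transform` / `get_feature_names_out` convention.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin


class DateTimeVectorizer(TransformerMixin, BaseEstimator):
    """Hypothetical sketch: derive calendar features from one datetime column."""

    def __init__(self, features=("year", "month", "dayofweek")):
        # Each entry must be an attribute of the pandas .dt accessor.
        self.features = features

    @staticmethod
    def _as_series(X):
        # Accept a Series or a single-column DataFrame (assumed input types).
        if isinstance(X, pd.DataFrame):
            if X.shape[1] != 1:
                raise ValueError("Expected exactly one datetime column.")
            X = X.iloc[:, 0]
        return pd.to_datetime(pd.Series(X))

    def fit(self, X, y=None):
        col = self._as_series(X)
        # Remember the column name so output names stay interpretable.
        self.input_name_ = col.name if col.name is not None else "x0"
        return self

    def transform(self, X):
        col = self._as_series(X)
        columns = [getattr(col.dt, f).to_numpy() for f in self.features]
        return np.column_stack(columns)

    def get_feature_names_out(self, input_features=None):
        names = [f"{self.input_name_}_{f}" for f in self.features]
        return np.asarray(names, dtype=object)


df = pd.DataFrame(
    {"when": pd.to_datetime(["2021-03-01 10:30", "2021-12-25 08:00"])}
)
vec = DateTimeVectorizer(features=["year", "month", "dayofweek"])
derived = vec.fit_transform(df[["when"]])
names = vec.get_feature_names_out()
```

Because it accepts either a Series or a one-column DataFrame, a transformer like this could be applied to a single column through ColumnTransformer, which is the usage the proposal has in mind.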
Describe alternatives you’ve considered, if relevant
An alternative would be to improve attaching feature names to FunctionTransformer, but this would still require some non-trivial code for datetimes, and they are just very, very common.
Top GitHub Comments
I’d be happy to have an example to start with; I think it will be somewhat non-trivial code, even with #21569.
I agree, I think we should err on the side of making simple things simple here 😃
The question after having the example is still whether it’s in scope to include. If it was a small piece of code it’d be fine to have people copy & paste as a pattern, but I don’t think it’ll be like that.
While feature names with FuncTransformer certainly help, the still open question is: Do we want to add such date time related transformers/estimators in scikit-learn? Options:
- DateTimeTransformer
- FuncTransformer in an example
Concerning a possible DateTimeTransformer, for me, this is also a question about which input data we support.