[FEATURE] Contribution: DateTime Periodicity Encoder
I’ve implemented a `DateTimePeriodicityEncoder`. It is a scikit-learn encoder for datetime features that uses sine and cosine transformations to capture periodicity in datetimes. This type of transformation lets an algorithm learn that hour 23 is close to hour 0, that minute 59 is close to minute 0, and so on.
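To make the periodicity argument concrete, here is a minimal plain-Python sketch that is independent of the proposed encoder. It maps hour-of-day onto the unit circle with a period of 24 and compares Euclidean distances between the encoded points. The helper names `encode_cyclic` and `circle_distance` are hypothetical.

```python
import math
from typing import Tuple


def encode_cyclic(value: float, period: float) -> Tuple[float, float]:
    """Map a periodic value onto the unit circle as a (sin, cos) pair."""
    angle = 2 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


def circle_distance(a: float, b: float, period: float) -> float:
    """Euclidean distance between two values after cyclic encoding."""
    (s1, c1), (s2, c2) = encode_cyclic(a, period), encode_cyclic(b, period)
    return math.hypot(s1 - s2, c1 - c2)


# On the raw scale |23 - 0| = 23, so 23:00 looks far from 00:00.
# After encoding, 23:00 -> 00:00 is exactly as far as 00:00 -> 01:00.
print(round(circle_distance(23, 0, 24), 3))  # 0.261
print(round(circle_distance(0, 1, 24), 3))   # 0.261
print(round(circle_distance(0, 12, 24), 3))  # 2.0 (opposite side of the circle)
```

The same holds for minutes with a period of 60: minute 59 ends up next to minute 0.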
It can be used to capture different “aspects” of a datetime (e.g. minute-of-hour, hour-of-day, day-of-week, day-of-month), as follows:
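```python
# DateTimePeriodicityEncoder is the class proposed in this issue.
# Aspects can be passed as a list or, for a single aspect, as a string.
dpe1 = DateTimePeriodicityEncoder(aspects=["second", "minute", "hour", "weekday", "day", "month"])
dpe2 = DateTimePeriodicityEncoder(["hour", "weekday"])
dpe3 = DateTimePeriodicityEncoder("weekday")
```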
For each of the aspects, it returns two new columns containing the respective sine and cosine transformations.
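To illustrate what “two new columns per aspect” looks like, here is a minimal sketch assuming pandas datetimes. It is not the encoder from this issue: the helper `periodicity_features`, the column names and the period assigned to each aspect are assumptions. In particular, day-of-month is treated as having a fixed period of 31, even though months have 28 to 31 days.

```python
import numpy as np
import pandas as pd

# Assumed (pandas .dt attribute, period) for each aspect.
PERIODS = {
    "second": ("second", 60),
    "minute": ("minute", 60),
    "hour": ("hour", 24),
    "weekday": ("weekday", 7),
    "day": ("day", 31),  # simplification: fixed 31-day period
    "month": ("month", 12),
}


def periodicity_features(timestamps, aspects):
    """Hypothetical helper: one sine and one cosine column per aspect."""
    if isinstance(aspects, str):  # allow a single aspect, like dpe3 above
        aspects = [aspects]
    dt = pd.to_datetime(pd.Series(timestamps)).dt
    columns = {}
    for aspect in aspects:
        attr, period = PERIODS[aspect]
        angle = 2 * np.pi * getattr(dt, attr) / period
        columns[f"{aspect}_sin"] = np.sin(angle)
        columns[f"{aspect}_cos"] = np.cos(angle)
    return pd.DataFrame(columns)


ts = pd.to_datetime(["2021-01-04 23:00", "2021-01-05 00:00", "2021-01-05 12:00"])
print(periodicity_features(ts, ["hour", "weekday"]).round(3))
```

With two aspects the result has four columns; passing a single string such as `"weekday"` yields two.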
I have written unit tests, and it passes scikit-learn’s `check_estimator` (with some tags).
@MBrouns asked me to create an issue and tag you, @koaning, to see if this could be a useful contribution for scikit-lego. If so, I can submit a pull request.
Top GitHub Comments
Yes, definitely! I chatted with Matthijs about this a week ago. It’s high up on my to-do list. To be continued.
Ad 1. Yes, indeed! You can pass a list as well. `FeatureUnion` could be an alternative, but because these transformations all pertain to the same column of datetimes, I think it makes more sense to extract them in one go instead of repeating the step manually for each aspect. This way, you could also grid search over different combinations of aspects without changing your preprocessing pipeline. If you disagree, I can rewrite it to only allow a single aspect.
Ad 2. We can definitely change `aspects` to `features`!
Ad 3. Yeah, that is definitely another way of doing it. I thought the scikit-learn convention was to only use trailing-underscore attributes in the transform step, to ensure that attributes have not changed since the first fit, but I see how copying them into a differently named attribute seems a bit redundant. I can change that as per your suggestion!
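A minimal sketch of how these conventions fit together, using a hypothetical `PeriodicityEncoderSketch` with the renamed `features` parameter (it is not the implementation from this issue, and it only covers the hour and weekday aspects). In scikit-learn, `__init__` should store hyperparameters unchanged, because `get_params`, `set_params` and `clone` rely on that. State derived during `fit` gets a trailing underscore, which is also what `check_is_fitted` looks for. The same contract is what lets `GridSearchCV` try different feature combinations inside one pipeline, as argued in Ad 1. The data below is synthetic.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.utils.validation import check_is_fitted

# Assumed (pandas .dt attribute, period) per feature.
PERIODS = {"hour": ("hour", 24), "weekday": ("weekday", 7)}


class PeriodicityEncoderSketch(TransformerMixin, BaseEstimator):
    """Hypothetical sin/cos encoder for a single column of datetimes."""

    def __init__(self, features=("hour",)):
        # Store the hyperparameter verbatim: no validation or copying here,
        # so get_params/set_params/clone (used by GridSearchCV) round-trip it.
        self.features = features

    def fit(self, X, y=None):
        features = [self.features] if isinstance(self.features, str) else list(self.features)
        unknown = set(features) - set(PERIODS)
        if unknown:
            raise ValueError(f"unknown features: {sorted(unknown)}")
        # State derived during fit gets a trailing underscore; it also marks
        # the estimator as fitted for check_is_fitted.
        self.features_ = features
        return self

    def transform(self, X):
        check_is_fitted(self)
        # Accept a Series, a one-column DataFrame or a 1-D array of datetimes.
        dt = pd.to_datetime(pd.Series(np.asarray(X).ravel())).dt
        columns = []
        for name in self.features_:
            attr, period = PERIODS[name]
            angle = 2 * np.pi * getattr(dt, attr).to_numpy() / period
            columns += [np.sin(angle), np.cos(angle)]
        return np.column_stack(columns)


# Synthetic data: two weeks of hourly timestamps, target driven by hour only.
X = pd.DataFrame({"timestamp": pd.date_range("2021-01-04", periods=24 * 14, freq=pd.offsets.Hour())})
y = np.sin(2 * np.pi * X["timestamp"].dt.hour / 24).to_numpy()

pipe = Pipeline([("encode", PeriodicityEncoderSketch()), ("model", LinearRegression())])
grid = {"encode__features": [["hour"], ["weekday"], ["hour", "weekday"]]}
search = GridSearchCV(pipe, grid, cv=3).fit(X, y)
print(search.best_params_)
```

Reading `self.features` directly in `transform` would also be acceptable; copying into `features_` mainly pays off when `fit` normalises or validates the parameter, as the string-to-list conversion does here.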
Ad 4. I hope this image makes it clearer. Forgive my poor handwriting 😃. @MBrouns suggested this to me to also allow for higher-frequency effects within the total period of each aspect. That, or maybe I horribly misunderstood what he meant, haha.
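Assuming the suggestion in Ad 4 is to add Fourier harmonics, i.e. `sin(2πk·t/P)` and `cos(2πk·t/P)` for `k = 1, ..., K`, where `P` is the period of an aspect, a minimal sketch could look like this. The function `fourier_terms`, its `n_harmonics` parameter and the synthetic two-peak target are all hypothetical. The comparison only shows that the first harmonic alone cannot represent a 12-hour rhythm within a 24-hour period, while adding the second harmonic can.

```python
import numpy as np


def fourier_terms(values, period, n_harmonics):
    """Return sin/cos pairs for harmonics k = 1..n_harmonics of a periodic value.

    Harmonic k completes k full cycles per period, so on hour-of-day
    k = 2 captures a 12-hour rhythm, k = 3 an 8-hour rhythm, and so on.
    """
    values = np.asarray(values, dtype=float)
    columns = []
    for k in range(1, n_harmonics + 1):
        angle = 2 * np.pi * k * values / period
        columns += [np.sin(angle), np.cos(angle)]
    return np.column_stack(columns)


def fit_r2(features, y):
    """R² of an ordinary least-squares fit with an intercept."""
    design = np.column_stack([np.ones(len(y)), features])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    centered = y - y.mean()
    return 1 - (resid @ resid) / (centered @ centered)


hours = np.arange(24)
print(fourier_terms(hours, period=24, n_harmonics=3).shape)  # (24, 6)

# Synthetic "two rush hours" pattern: peaks at 00:00 and 12:00.
target = np.cos(2 * np.pi * 2 * hours / 24)
print(fit_r2(fourier_terms(hours, 24, 1), target))  # approximately 0
print(fit_r2(fourier_terms(hours, 24, 2), target))  # approximately 1
```

Each extra harmonic adds two columns per aspect, so `K` trades flexibility against the number of features.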