
[FEATURE] Contribution: DateTime Periodicity Encoder


I’ve implemented a DateTimePeriodicityEncoder. It is a scikit-learn encoder for datetime features that uses sine and cosine transformations to capture periodicity in datetimes. This type of transformation ensures that an algorithm can learn that hour 23 is close to hour 0, that minute 59 is close to minute 0, and so on.
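The core idea can be sketched in a few lines of NumPy, assuming the usual mapping of a value x with period P to (sin(2πx/P), cos(2πx/P)). The helper name encode_cyclical is hypothetical and is not part of the proposed encoder's API.

```python
import numpy as np


def encode_cyclical(value, period):
    """Map a periodic value onto the unit circle as (sin, cos)."""
    angle = 2 * np.pi * value / period
    return np.array([np.sin(angle), np.cos(angle)])


# Raw hours: 23 and 0 differ by 23, even though they are one hour apart.
raw_gap = abs(23 - 0)

# Encoded hours: distances now reflect position on the 24-hour cycle.
d_23_0 = np.linalg.norm(encode_cyclical(23, 24) - encode_cyclical(0, 24))
d_12_0 = np.linalg.norm(encode_cyclical(12, 24) - encode_cyclical(0, 24))

# Same idea for minutes: 59 and 0 end up next to each other.
d_m59_m0 = np.linalg.norm(encode_cyclical(59, 60) - encode_cyclical(0, 60))

print(raw_gap, round(d_23_0, 3), round(d_12_0, 3), round(d_m59_m0, 3))
```

Because every value lands on the unit circle, the wrap-around pair (23, 0) is exactly as close as any other pair of adjacent hours, while opposite hours such as 12 and 0 are the farthest apart.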

It can be used to capture different “aspects” of a datetime (e.g. minute-in-hour, hour-of-day, day-of-week, day-of-month) as follows:

dpe1 = DateTimePeriodicityEncoder(aspects=["second", "minute", "hour", "weekday", "day", "month"])
dpe2 = DateTimePeriodicityEncoder(["hour", "weekday"])
dpe3 = DateTimePeriodicityEncoder("weekday")

For each of the aspects, it returns two new columns containing the respective sine and cosine transformations.
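To make the mechanics concrete, here is a minimal sketch of such an encoder, assuming a single datetime column as input. It is not the implementation from the issue: the class name PeriodicEncoderSketch, the PERIODS table, and its period values (for example 31 for day of month, which ignores varying month lengths) are assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin

# Hypothetical period table: aspect -> (period, offset). The offset shifts the
# 1-based calendar fields (day, month) so they start at 0. Using 31 for "day"
# is a simplification; real month lengths vary.
PERIODS = {
    "second": (60, 0),
    "minute": (60, 0),
    "hour": (24, 0),
    "weekday": (7, 0),  # pandas convention: Monday=0 ... Sunday=6
    "day": (31, 1),
    "month": (12, 1),
}


class PeriodicEncoderSketch(BaseEstimator, TransformerMixin):
    """Sine/cosine encoder for one datetime column (illustrative sketch only)."""

    def __init__(self, aspects="hour"):
        self.aspects = aspects

    def fit(self, X, y=None):
        # Accept a single aspect name or a list of names.
        aspects = [self.aspects] if isinstance(self.aspects, str) else list(self.aspects)
        unknown = [a for a in aspects if a not in PERIODS]
        if unknown:
            raise ValueError(f"Unknown aspects: {unknown}")
        self.aspects_ = aspects
        return self

    def transform(self, X):
        # Flatten the single datetime column into a pandas Series.
        dt = pd.to_datetime(pd.Series(np.asarray(X).ravel()))
        columns = []
        for aspect in self.aspects_:
            period, offset = PERIODS[aspect]
            values = getattr(dt.dt, aspect).to_numpy(dtype=float) - offset
            angle = 2 * np.pi * values / period
            columns.extend([np.sin(angle), np.cos(angle)])  # two columns per aspect
        return np.column_stack(columns)


X = pd.DataFrame({"ts": pd.to_datetime(["2020-10-23 00:00", "2020-10-23 23:00"])})
out = PeriodicEncoderSketch(["hour", "weekday"]).fit_transform(X)
print(out.shape)  # (2, 4)
```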

I have written unit tests, and the encoder passes scikit-learn's check_estimator (with some estimator tags set).

@MBrouns asked me to create an issue and tag you, @koaning, to see if this could be a useful contribution for scikit-lego. If so, I can submit a pull request.

Issue analytics: created 3 years ago; 17 comments (8 by maintainers).

Top GitHub Comments

1 reaction

tbezemer commented, Oct 23, 2020

Yes, definitely! I chatted with Matthijs about this a week ago. It’s high up on my to-do list. To be continued.

0 reactions

tbezemer commented, Oct 29, 2020

Ad 1. Yes, indeed! You can pass a list as well. FeatureUnion could be an alternative, but because these transformations all pertain to the same datetime column, I think it makes more sense to extract them in one go instead of repeating the transformer manually for each aspect. This way, you could also grid search over different combinations of aspects without changing your preprocessing pipeline. If you disagree, I can rewrite it to only allow a single aspect.
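The grid-search point can be sketched with scikit-learn's FunctionTransformer standing in for the proposed encoder. This is a swap for illustration only: encode and PERIODS are hypothetical names, and the hourly data is synthetic. Because the list of aspects is passed through kw_args, GridSearchCV can compare aspect combinations while the pipeline itself stays the same.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer

# Hypothetical period table: aspect -> (period, offset for 1-based fields).
PERIODS = {"hour": (24, 0), "weekday": (7, 0), "month": (12, 1)}


def encode(X, aspects=("hour",)):
    """Sine/cosine columns for each requested aspect of one datetime column."""
    dt = pd.to_datetime(pd.Series(np.asarray(X).ravel()))
    cols = []
    for aspect in aspects:
        period, offset = PERIODS[aspect]
        values = getattr(dt.dt, aspect).to_numpy(dtype=float) - offset
        angle = 2 * np.pi * values / period
        cols.extend([np.sin(angle), np.cos(angle)])
    return np.column_stack(cols)


pipe = Pipeline([
    ("encode", FunctionTransformer(encode)),
    ("model", Ridge()),
])

# The set of aspects is just another hyperparameter of the pipeline.
param_grid = {
    "encode__kw_args": [
        {"aspects": ["hour"]},
        {"aspects": ["hour", "weekday"]},
        {"aspects": ["hour", "weekday", "month"]},
    ]
}

# Synthetic hourly data with a daily cycle plus noise (illustrative only).
rng = np.random.default_rng(0)
ts = pd.date_range("2020-01-01", periods=24 * 60, freq=pd.Timedelta(hours=1))
X = pd.DataFrame({"ts": ts})
y = np.sin(2 * np.pi * ts.hour.to_numpy() / 24) + 0.1 * rng.standard_normal(len(ts))

search = GridSearchCV(pipe, param_grid, cv=3).fit(X, y)
print(search.best_params_["encode__kw_args"])
```

With a dedicated encoder class in place of FunctionTransformer, the grid key would be the encoder's own parameter (for example encode__aspects) rather than encode__kw_args.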

Ad 2. We can definitely change aspects to features!

Ad 3. Yeah, that is definitely another way of doing it. I thought the sklearn convention was to use only trailing-underscore attributes in the transform step, to ensure that they have not changed since the first fit, but I see how copying the parameters into differently named attributes seems a bit redundant. I can change that as per your suggestion!
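The convention referred to here can be illustrated with a minimal sketch, assuming standard scikit-learn estimator conventions: __init__ stores hyperparameters unchanged, fit writes derived state to attributes ending in an underscore, and transform checks for that state with check_is_fitted. The class name AspectEncoderSketch is hypothetical, and the sketch shows the pattern described above rather than the alternative suggested in the thread.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.exceptions import NotFittedError
from sklearn.utils.validation import check_is_fitted


class AspectEncoderSketch(BaseEstimator, TransformerMixin):
    def __init__(self, aspects="hour"):
        # Hyperparameters are stored exactly as passed (no validation here).
        self.aspects = aspects

    def fit(self, X, y=None):
        # Derived, normalized state goes into a trailing-underscore attribute.
        self.aspects_ = [self.aspects] if isinstance(self.aspects, str) else list(self.aspects)
        return self

    def transform(self, X):
        # Raises NotFittedError if no trailing-underscore attribute exists yet.
        check_is_fitted(self)
        # Placeholder output: one column per fitted aspect (real encoding omitted).
        return np.zeros((len(X), len(self.aspects_)))


enc = AspectEncoderSketch()
try:
    enc.transform([[0]])
except NotFittedError:
    print("transform before fit raises NotFittedError")

enc.fit([[0]])
enc.set_params(aspects=["hour", "weekday"])  # changes the hyperparameter only
print(enc.aspects_)  # ['hour'] until fit is called again
```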

Ad 4. I hope this image makes it clearer. Forgive my poor handwriting 😃. @MBrouns suggested this to me to also allow for higher-frequency effects within the total period of each aspect. That, or maybe I horribly misunderstood what he meant, haha.
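One common way to capture higher-frequency effects within a period is to add harmonics (Fourier terms): for k = 1, ..., K, the columns sin(2πkx/P) and cos(2πkx/P). Whether this matches the suggestion in the image is an assumption; fourier_terms and n_harmonics are hypothetical names.

```python
import numpy as np


def fourier_terms(values, period, n_harmonics=1):
    """Return sin/cos pairs for harmonics k = 1..n_harmonics of a periodic value."""
    values = np.asarray(values, dtype=float)
    cols = []
    for k in range(1, n_harmonics + 1):
        # Harmonic k completes k full cycles within one period.
        angle = 2 * np.pi * k * values / period
        cols.extend([np.sin(angle), np.cos(angle)])
    return np.column_stack(cols)


hours = np.arange(24)
features = fourier_terms(hours, period=24, n_harmonics=3)
print(features.shape)  # (24, 6)
```

With n_harmonics=1 this reduces to the plain sine/cosine pair; each extra harmonic lets a linear model fit sharper patterns within the same period, such as separate morning and evening peaks within a day.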
