
Add DateTimeTransformer?


Describe the workflow you want to enable

Do we want to add a transformer for pandas datetimes? We haven’t really added much based on pandas yet, but this would be a pretty natural thing to add. You could argue that you can do something like FunctionTransformer(lambda X: X.dt.dayofweek), or something similar for the other features (year, hour, minute, month, …), but the problem with that is that you don’t get feature names, which is terrible for interpretation.
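
To make the feature-name problem concrete, here is a dependency-free sketch of the lambda approach. It assumes Python's datetime objects stand in for a pandas datetime column, so that the snippet runs without pandas or scikit-learn; d.weekday() plays the role of .dt.dayofweek (both count Monday as 0).

```python
from datetime import datetime

dates = [datetime(2021, 12, 15, 9, 30), datetime(2021, 12, 17, 18, 5)]


def extract(xs):
    # Analogue of FunctionTransformer(lambda X: X.dt.dayofweek), extended to
    # a few more fields: plain numbers come back, one row per timestamp.
    return [[d.year, d.month, d.weekday(), d.hour] for d in xs]


rows = extract(dates)
print(rows)  # [[2021, 12, 2, 9], [2021, 12, 4, 18]]

# Nothing in `rows` records which column is which. The names must be written
# by hand and kept in the same order as the function above.
names = ["year", "month", "dayofweek", "hour"]
```

The output is just a grid of integers; whether 12 means a month or an hour is only known to whoever wrote the function.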

Featurizing datetimes is super common (the last ~10 datasets I worked on had it) and I think it’s a workflow we should make easier.

Describe your proposed solution

Implement a DateTimeTransformer that takes a list of features to derive, like dayofweek, dayofyear, etc., creates meaningful feature names for them, and maybe takes just a single column as input. A single-column input would work well with ColumnTransformer; it is a bit different from other transformers but quite similar to CountVectorizer, so if it takes a single column, maybe it should be called DateTimeVectorizer.
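
The proposal can be sketched without any dependencies. In the sketch below, DateTimeTransformer is the hypothetical class proposed in this issue, not an existing scikit-learn class. It assumes Python's datetime objects in place of a pandas datetime column and exposes only fit, transform and get_feature_names_out in scikit-learn's naming style, without inheriting from any scikit-learn base class. The supported feature list and the "<column>_<feature>" naming scheme are assumptions made for illustration.

```python
from datetime import datetime

# Hypothetical extractors keyed by pandas-style names. weekday() and tm_yday
# match pandas' dayofweek (Monday=0) and dayofyear (January 1 = 1).
_EXTRACTORS = {
    "year": lambda d: d.year,
    "month": lambda d: d.month,
    "day": lambda d: d.day,
    "hour": lambda d: d.hour,
    "minute": lambda d: d.minute,
    "dayofweek": lambda d: d.weekday(),
    "dayofyear": lambda d: d.timetuple().tm_yday,
}


class DateTimeTransformer:
    """Sketch: expands one column of datetimes into named integer features."""

    def __init__(self, features=("year", "month", "dayofweek")):
        self.features = features

    def fit(self, X, y=None):
        # Validate the requested features; nothing is learned from the data.
        unknown = [f for f in self.features if f not in _EXTRACTORS]
        if unknown:
            raise ValueError(f"unknown datetime features: {unknown}")
        return self

    def transform(self, X):
        # X is a single column: an iterable of datetime objects.
        return [[_EXTRACTORS[f](d) for f in self.features] for d in X]

    def get_feature_names_out(self, input_features=None):
        # Fall back to "x0", scikit-learn's default name for an unnamed column.
        prefix = input_features[0] if input_features else "x0"
        return [f"{prefix}_{f}" for f in self.features]


dates = [datetime(2021, 12, 15, 9, 30), datetime(2021, 12, 17, 18, 5)]
t = DateTimeTransformer(features=["dayofweek", "dayofyear", "hour"]).fit(dates)
print(t.transform(dates))  # [[2, 349, 9], [4, 351, 18]]
print(t.get_feature_names_out(["created"]))
# ['created_dayofweek', 'created_dayofyear', 'created_hour']
```

Taking a single column keeps the class simple, in the spirit of CountVectorizer; in a pipeline, ColumnTransformer would route the datetime column to it.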

Describe alternatives you’ve considered, if relevant

An alternative would be to improve how feature names are attached to FunctionTransformer, but this would still require some non-trivial code for datetimes, and datetimes are just very, very common.
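
A rough picture of this alternative: a generic function transformer that also accepts a callable for naming its outputs. NamedFunctionTransformer, to_parts and part_names are hypothetical names used only for this dependency-free sketch, which assumes Python datetime objects as input. It shows that even with a naming hook, the user still writes the datetime extraction and the matching names separately.

```python
from datetime import datetime


class NamedFunctionTransformer:
    """Hypothetical: a function transformer plus a hook that names outputs."""

    def __init__(self, func, names_out):
        self.func = func            # maps the input column to output rows
        self.names_out = names_out  # maps input column names to output names

    def fit(self, X, y=None):
        return self  # stateless, like a plain function wrapper

    def transform(self, X):
        return self.func(X)

    def get_feature_names_out(self, input_features=None):
        return self.names_out(input_features or ["x0"])


FIELDS = ["year", "month", "dayofweek"]


def to_parts(xs):
    # The datetime logic the user has to write by hand...
    return [[d.year, d.month, d.weekday()] for d in xs]


def part_names(cols):
    # ...and the naming logic, which must stay in the same order as to_parts.
    return [f"{cols[0]}_{f}" for f in FIELDS]


dates = [datetime(2021, 12, 17)]
t = NamedFunctionTransformer(to_parts, part_names).fit(dates)
print(t.transform(dates))  # [[2021, 12, 4]]
print(t.get_feature_names_out(["created"]))
# ['created_year', 'created_month', 'created_dayofweek']
```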

Issue Analytics

  • State: open
  • Created 2 years ago
  • Reactions: 1
  • Comments: 11 (8 by maintainers)

Top GitHub Comments

1 reaction
amueller commented, Dec 17, 2021

I’d be happy to have an example to start with; I think it will be somewhat non-trivial code, even with #21569.

“But arguably this is not something that many people would think of doing by default.”

I agree, I think we should err on the side of making simple things simple here 😃

Even once we have the example, the question remains whether this is in scope to include. If it were a small piece of code, it’d be fine to have people copy and paste it as a pattern, but I don’t think it’ll be like that.

1 reaction
lorentzenchr commented, Dec 15, 2021

While feature names with FunctionTransformer certainly help, the question that is still open is: do we want to add such datetime-related transformers/estimators to scikit-learn? Options:

  1. Include a dedicated DateTimeTransformer
  2. Show how it can be done with FunctionTransformer in an example
  3. No action

Regarding a possible DateTimeTransformer, for me this is also a question of which input data we support.
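
One way to frame the input-data question is to decide which values a transformer would coerce to datetimes before extracting features. The sketch below is an illustration under assumptions: the helper name to_datetimes is hypothetical, and it accepts datetime objects, date objects (treated as midnight) and ISO 8601 strings parsed with datetime.fromisoformat (on Python versions before 3.11, fromisoformat accepts only the formats produced by isoformat()). Other possible inputs, such as pandas datetime64 columns, numeric timestamps or time-zone handling, are deliberately left out.

```python
from datetime import date, datetime


def to_datetimes(column):
    """Hypothetical helper: normalize one column to datetime objects."""
    out = []
    for value in column:
        if isinstance(value, datetime):  # check first: datetime subclasses date
            out.append(value)
        elif isinstance(value, date):
            out.append(datetime(value.year, value.month, value.day))
        elif isinstance(value, str):
            out.append(datetime.fromisoformat(value))  # ValueError if malformed
        else:
            raise TypeError(f"unsupported datetime input: {value!r}")
    return out


print(to_datetimes([date(2021, 12, 15), "2021-12-17T18:05"]))
# [datetime.datetime(2021, 12, 15, 0, 0), datetime.datetime(2021, 12, 17, 18, 5)]
```

Each accepted input type is a commitment the transformer would have to document and test, which is why the choice of supported inputs matters before a dedicated class is added.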
