[ENH] Calendar Feature Extractor

Is your feature request related to a problem? Please describe.

I would like to use sktime with tree-based models for time series. These models generally need some data preparation before they are useful for time series. One preparation step is creating calendar dummy features that represent the current day of the year, day of the month, week of the quarter, etc. Another is generating features that represent Fourier terms of different order and periodicity.

Describe the solution you’d like

I have been thinking about the best way to implement this, and @aiwalter suggested using a SeriesToSeries transformer to generate new exogenous features from the index in ForecastingPipeline. Would that approach work from an architecture point of view, or do we need a separate, not yet defined class “FeatureExtractor” and/or a specific IndexToSeries generator?
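
As a rough illustration of the transformer idea, here is a minimal sketch of a stateless fit/transform object that derives exogenous columns from the index alone. The class name `IndexCalendarFeatures` and its `features` parameter are hypothetical and are not part of sktime; the sketch only mimics the general fit/transform shape using plain pandas.

```python
import pandas as pd


class IndexCalendarFeatures:
    """Hypothetical transformer: derives calendar columns from a DatetimeIndex.

    Not an sktime class; it only imitates the fit/transform interface.
    """

    def __init__(self, features=("month", "day", "dayofweek")):
        self.features = features

    def fit(self, X, y=None):
        # Nothing to learn: calendar features depend only on the index.
        return self

    def transform(self, X):
        out = X.copy()
        for name in self.features:
            # DatetimeIndex exposes these as attributes, e.g. X.index.month.
            out[name] = getattr(X.index, name).to_numpy()
        return out

    def fit_transform(self, X, y=None):
        return self.fit(X, y).transform(X)


# An empty frame that carries only the time index is enough as input.
idx = pd.date_range("2021-01-30", periods=4, freq="D")
X = pd.DataFrame(index=idx)
Xt = IndexCalendarFeatures().fit_transform(X)
print(Xt)
```

Because the transformer is stateless, the same object can be applied to a future index at prediction time without refitting.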

We could either adapt solutions from GluonTS (dummies) or Prophet (Fourier features), or build our own approach. The example code below (not yet adapted to sktime) outlines the general goal.

import pandas as pd
import numpy as np

base_seasons = [
    ["parent","child","period","dummy"],
    ["year","year",None,"year"],
    ["year","quarter",4,"quarter"],
    ["year","month",12,"month"],
    ["year","week",365.25/7,"week_of_year"],
    ["year","day",365.25,"day_of_year"],
    ["quarter","month",12/4,"month_of_quarter"],
    ["quarter","week",365.25/(4*7),"week_of_quarter"],
    ["quarter","day",365.25/4,"day_of_quarter"],
    ["month","week",365.25/(12*7),"week_of_month"],
    ["month","day",30,"day"],
    ["week","day",7,"day_of_week"],
    ["day","hour",24,"hour"],
    ["hour","minute",60,"minute"],
    ["minute","second",60,"second"],
    ["second","millisecond",1000,"millisecond"]
]


base_seasons = pd.DataFrame(base_seasons[1:],columns=base_seasons[0])

base_seasons["fourier"] = base_seasons["child"] + '_in_' + base_seasons["parent"]
base_seasons["child"] = base_seasons["child"].astype("category").cat.reorder_categories(['year','quarter','month', 'week','day',"hour","minute","second","millisecond"])
base_seasons["rank"] = base_seasons["child"].cat.codes



def get_supported_seasons(base_frequency,base_seasons=base_seasons):
    rank = base_seasons.loc[base_seasons["child"]==base_frequency,"rank"].max()
    matches = base_seasons.loc[base_seasons["rank"]<=rank]
    if matches.shape[0] == 0:
        raise ValueError("Seasonality or Frequency not supported")
    return matches

def calendar_fourier(dti_actual,fourier_period,fourier_order,base_frequency):
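    # NOTE: using the first letter as the pandas frequency alias is ambiguous:
    # "month", "minute" and "millisecond" all become "M"
    dti = pd.date_range(dti_actual.min(),dti_actual.max(),freq=base_frequency.upper()[0])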
    
    if dti.min().to_numpy() != dti_actual.min().to_numpy():
        raise ValueError("Actual time Series does not correspond to frequencies provided by pandas Datetimeindex. This can happen when e.g. monthly data does not correspond to month end.")
    
    funcs = [np.sin,np.cos] 

    outlist = list()
    for index,item in enumerate(fourier_order):
        outlist.append((np.arange(item)+1)*1/fourier_period[index])

    inlist = np.zeros(shape=(len(dti),np.concatenate(outlist).shape[0]*len(funcs)))
    colnames = list()
    k = 0
    for item in outlist:
        for index,p in enumerate(item):
            for func in funcs:
                inlist[:,k] =func(2*np.pi*p*np.arange(len(dti)))
#                colnames.append("per"+ str(int(1/p)*int(index+1)) + "_or" + str(int(index+1))+func.__name__)
                colnames.append("per"+ str(int(1/p)) + func.__name__)
                k = k +1

    inlist = pd.DataFrame(inlist)
    inlist.columns = colnames
    inlist.set_index(dti,inplace=True)

    inlist = inlist[inlist.index.isin(dti_actual)]
    inlist = inlist.reset_index(drop=True)
    return inlist

def calendar_dummies(x,funcs):
    # x is a pd.DatetimeIndex; funcs names one entry of base_seasons["dummy"]
    if funcs == "week_of_year":
        return pd.DataFrame({funcs:x.isocalendar()["week"].reset_index(drop=True)})
    elif funcs == "week_of_month":
        return pd.DataFrame({funcs:(x.day - 1) // 7 + 1})
    elif funcs == "month_of_quarter":
        # months 1..12 map to 1,2,3,1,2,3,...
        return pd.DataFrame({funcs:(x.month - 1) % 3 + 1})
    elif funcs == "week_of_quarter":
        year_week = x.isocalendar()["week"].reset_index(drop=True)
        def week_of_quarter(week):
            # approximation: treats every quarter as 13 ISO weeks
            if week <= 13:
                return week
            elif week <= 26:
                return week - 13
            elif week <= 39:
                return week - 26
            else:
                return week - 39
        return pd.DataFrame({funcs:year_week.apply(week_of_quarter)})
    elif funcs == "millisecond":
        # the index only exposes microseconds, so convert down
        return pd.DataFrame({funcs:x.microsecond // 1000})
    elif funcs == "day_of_quarter":
        quarter = x.quarter
        quarter_start = pd.DatetimeIndex(
            x.year.map(str) + "-" + (3 * quarter - 2).map(int).map(str) + "-01")
        values = ((x - quarter_start) / pd.to_timedelta("1D") + 1).astype(int)
        return pd.DataFrame({funcs:values},dtype=np.float64)
    else:
        # any other name is looked up as a DatetimeIndex attribute, e.g. "month"
        return pd.DataFrame({funcs:getattr(x,funcs)})


def calendar_other(x,funcs):
    # x is a pd.DatetimeIndex
    # exponents applied to the position of each timestamp within the observed span
    exponents = {
        "proportion_total": 1,
        "proportion_total_squared": 2,
        "proportion_total_cubic": 3,
        "proportion_total_squared_root": 1/2,
        "proportion_total_cubic_root": 1/3,
    }
    if funcs in exponents:
        t = x.asi8  # timestamps as int64 nanoseconds
        base = 1+(t-t.max())/(t.max()-t.min())  # 0 at the first timestamp, 1 at the last
        # ** is the power operator; ^ would be bitwise XOR
        values = base**exponents[funcs]
        return pd.DataFrame({funcs:values},dtype=np.float64)
    elif funcs == "proportion_month":
        values = x.day/x.days_in_month
        return pd.DataFrame({funcs:values},dtype=np.float64)
    elif funcs == "proportion_quarter":
        quarter = x.quarter
        quarter_start = pd.DatetimeIndex(
            x.year.map(str) + "-" + (3 * quarter - 2).map(int).map(str) + "-01")
        next_quarter_start = x + pd.tseries.offsets.QuarterBegin(startingMonth=1)
        quarter_length = (next_quarter_start - quarter_start).days
        doq = ((x - quarter_start) / pd.to_timedelta("1D") + 1).astype(int)
        values = doq / quarter_length
        return pd.DataFrame({funcs:values},dtype=np.float64)
    else:
        raise ValueError("Feature not supported")
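
To make the Fourier part easier to follow, here is a stripped-down sketch of the same idea: for a given period and order, build one sin/cos pair per harmonic on an integer time index. The function name `fourier_terms` and the column naming scheme are assumptions made for illustration. Unlike `calendar_fourier` above, it handles a single period and does not check the index against a pandas frequency.

```python
import numpy as np
import pandas as pd


def fourier_terms(n_obs, period, order):
    """Return sin/cos pairs for harmonics 1..order of the given period."""
    t = np.arange(n_obs)
    columns = {}
    for k in range(1, order + 1):
        # The k-th harmonic completes k full cycles per period.
        angle = 2 * np.pi * k * t / period
        columns[f"sin_p{period}_k{k}"] = np.sin(angle)
        columns[f"cos_p{period}_k{k}"] = np.cos(angle)
    return pd.DataFrame(columns)


# Monthly data with yearly seasonality: period 12, two harmonics.
F = fourier_terms(n_obs=24, period=12, order=2)
print(F.head())
```

Each row repeats after one full period, so row 12 equals row 0 up to floating-point error.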

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Reactions: 1
  • Comments: 10

Top GitHub Comments

aiwalter commented, Aug 13, 2021 (1 reaction)

I also think it’s a good idea to just ignore this edge case for now. There is still a workaround for advanced users with the solution I mentioned:

forecaster.fit(y, X=CalendarTransformer().fit_transform(pd.DataFrame(index=y.index)))

fkiraly commented, Aug 10, 2021 (1 reaction)

Yes, exactly, why not?

If it’s a transformer applied to X (not y), in predict it would be applied to X before predict is called, and the transformed X (with calendar indicator) would be fed to predict. So it seems all is fine?
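
The flow described here can be sketched without sktime. In the toy example below, `add_month_dummies` stands in for the calendar transformer and `TinyPipeline` is a hypothetical pipeline that applies it to X in both `fit` and `predict` before a least-squares model sees the data. Neither name comes from sktime, and the least-squares model is only a placeholder for a real forecaster.

```python
import numpy as np
import pandas as pd


def add_month_dummies(X):
    """Stand-in for a calendar transformer: one 0/1 column per month."""
    out = X.copy()
    for m in range(1, 13):
        out[f"month_{m}"] = (X.index.month == m).astype(float)
    return out


class TinyPipeline:
    """Hypothetical pipeline: transform X, then fit/predict by least squares."""

    def fit(self, y, X):
        Xt = add_month_dummies(X)
        self.coef_, *_ = np.linalg.lstsq(Xt.to_numpy(), y.to_numpy(), rcond=None)
        return self

    def predict(self, X):
        # The same transformation is applied to X before predicting.
        Xt = add_month_dummies(X)
        return pd.Series(Xt.to_numpy() @ self.coef_, index=X.index)


# Toy series whose value equals the month number, so month dummies fit it exactly.
train_idx = pd.date_range("2019-01-01", periods=24, freq="MS")
y = pd.Series(train_idx.month.to_numpy(dtype=float), index=train_idx)

# X carries only the index; the pipeline derives the features itself.
future_idx = pd.date_range("2021-01-01", periods=3, freq="MS")
model = TinyPipeline().fit(y, pd.DataFrame(index=train_idx))
y_pred = model.predict(pd.DataFrame(index=future_idx))
print(y_pred)
```

The caller only has to supply an X with the right index for the forecast horizon; the calendar columns are rebuilt from that index at prediction time.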
