[ENH] Calendar Feature Extractor

Is your feature request related to a problem? Please describe.

I would like to use sktime with tree-based models for time series. These models generally need some data preparation before they are useful for time series. One data preparation step is creating calendar dummy features that represent the current day of the year, day of the month, week of the quarter, etc. Another is generating features that represent Fourier terms of different order and periodicity.
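
As a rough illustration of the two preparation steps described above, the following sketch derives a few calendar columns and a set of Fourier terms from a daily index using only pandas and NumPy. The helper name fourier_terms, the column names, and the sample dates are assumptions made for this example; they are not part of sktime or of the code attached to the issue.

```python
import numpy as np
import pandas as pd

# Daily index covering the first two weeks of 2021 (illustrative data only).
idx = pd.date_range("2021-01-01", periods=14, freq="D")

# Calendar features read directly from DatetimeIndex attributes.
calendar = pd.DataFrame(
    {
        "day_of_year": idx.dayofyear,
        "day_of_month": idx.day,
        "day_of_week": idx.dayofweek,  # Monday=0 ... Sunday=6
        "month": idx.month,
    },
    index=idx,
)


def fourier_terms(n_steps, period, order):
    """Hypothetical helper: sin/cos pairs for harmonics 1..order of `period`."""
    t = np.arange(n_steps)
    cols = {}
    for k in range(1, order + 1):
        cols[f"sin_{period}_{k}"] = np.sin(2 * np.pi * k * t / period)
        cols[f"cos_{period}_{k}"] = np.cos(2 * np.pi * k * t / period)
    return pd.DataFrame(cols)


# Weekly seasonality (period 7) with two harmonics gives four columns.
fourier = fourier_terms(len(idx), period=7, order=2)
fourier.index = idx

# Both feature sets share the same index, so they can sit side by side.
features = pd.concat([calendar, fourier], axis=1)
print(features.head())
```

A tree-based model can split on the integer calendar columns directly, while the Fourier columns give it a smooth encoding of where each timestamp falls within the period.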

Describe the solution you’d like

I have been thinking about the best way to implement this, and @aiwalter suggested using a SeriesToSeries transformer to generate new exogenous features based on the index in ForecastingPipeline. Would that approach work from an architecture point of view, or do we need a separate, not yet defined class “FeatureExtractor” and/or a specific IndexToSeries generator?
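
To make the idea of an index-based transformer more concrete, here is a minimal sketch of what such an object could look like. The class name CalendarFeatureTransformer and its features parameter are hypothetical, and the class follows a generic fit/transform convention rather than sktime's actual base classes, which may impose additional requirements.

```python
import pandas as pd


class CalendarFeatureTransformer:
    """Hypothetical transformer, not part of sktime.

    It keeps the columns of X and appends calendar columns derived
    from the DatetimeIndex of X.
    """

    def __init__(self, features=("month", "day", "dayofweek")):
        # Each entry must be the name of a DatetimeIndex attribute.
        self.features = tuple(features)

    def fit(self, X, y=None):
        # Nothing is learned: the features depend only on the index.
        if not isinstance(X.index, pd.DatetimeIndex):
            raise TypeError("X must have a DatetimeIndex")
        return self

    def transform(self, X):
        out = X.copy()
        for name in self.features:
            out[name] = getattr(X.index, name).to_numpy()
        return out

    def fit_transform(self, X, y=None):
        return self.fit(X, y).transform(X)


# Illustrative exogenous data crossing a month boundary.
X = pd.DataFrame(
    {"temperature": [3.0, 4.5, 2.0]},
    index=pd.date_range("2021-03-30", periods=3, freq="D"),
)
Xt = CalendarFeatureTransformer().fit_transform(X)
print(Xt)
```

Because the transformer is stateless, calling it on training data and on future data yields consistent columns, which is the property a pipeline needs.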

We could either adapt solutions from GluonTS (dummies) and Prophet (Fourier features) or build our own approach. Attached is some example code (not yet adapted to sktime) to outline the general goal.

import pandas as pd
import numpy as np

base_seasons = [
    ["parent","child","period","dummy"],
    ["year","year",None,"year"],
    ["year","quarter",365.25/4,"quarter"],  
    ["year","month",12,"month"],
    ["year","week",365.25/7,"week_of_year"],
    ["year","day",365.25,"day_of_year"],
    ["quarter","month",12/4,"month_of_quarter"],
    ["quarter","week",365.25/(4*7),"week_of_quarter"],
    ["quarter","day",365.25/4,"day_of_quarter"],
    ["month","week",365.25/(12*7),"week_of_month"],
    ["month","day",30,"day"],
    ["week","day",7,"day_of_week"],
    ["day","hour",24,"hour"],
    ["hour","minute",60,"minute"],
    ["minute","second",60,"second"],
    ["second","millisecond",1000,"millisecond"]
]


base_seasons = pd.DataFrame(base_seasons[1:],columns=base_seasons[0])

base_seasons["fourier"] = base_seasons["child"] + '_in_' + base_seasons["parent"]
base_seasons["child"] = base_seasons["child"].astype("category").cat.reorder_categories(['year','quarter','month', 'week','day',"hour","minute","second","millisecond"])
base_seasons["rank"] = base_seasons["child"].cat.codes



def get_supported_seasons(base_frequency,base_seasons=base_seasons):
    rank = base_seasons.loc[base_seasons["child"]==base_frequency,"rank"].max()
    matches = base_seasons.loc[base_seasons["rank"]<=rank]
    if matches.shape[0] == 0:
        raise ValueError("Seasonality or Frequency not supported")
    return matches

def calendar_fourier(dti_actual,fourier_period,fourier_order,base_frequency):
    dti = pd.date_range(dti_actual.min(),dti_actual.max(),freq=base_frequency.upper()[0]) 
    
    if dti.min().to_numpy() != dti_actual.min().to_numpy():
        raise ValueError("Actual time Series does not correspond to frequencies provided by pandas Datetimeindex. This can happen when e.g. monthly data does not correspond to month end.")
    
    funcs = [np.sin,np.cos] 

    outlist = list()
    for index,item in enumerate(fourier_order):
        outlist.append((np.arange(item)+1)*1/fourier_period[index])

    inlist = np.zeros(shape=(len(dti),np.concatenate(outlist).shape[0]*len(funcs)))
    colnames = list()
    k = 0
    for item in outlist:
        for index,p in enumerate(item):
            for func in funcs:
                inlist[:,k] =func(2*np.pi*p*np.arange(len(dti)))
#                colnames.append("per"+ str(int(1/p)*int(index+1)) + "_or" + str(int(index+1))+func.__name__)
                colnames.append("per"+ str(int(1/p)) + func.__name__)
                k = k +1

    inlist = pd.DataFrame(inlist)
    inlist.columns = colnames
    inlist.set_index(dti,inplace=True)

    inlist = inlist[inlist.index.isin(dti_actual)]
    inlist = inlist.reset_index(drop=True)
    return inlist

def calendar_dummies(x,funcs):
    # x is expected to be a pd.DatetimeIndex; funcs names the feature to extract.
    if funcs == "week_of_year":
        return pd.DataFrame({funcs:x.isocalendar()["week"].reset_index(drop=True)})
    elif funcs == "week_of_month":
        return  pd.DataFrame({funcs:(x.day - 1) // 7 + 1})
    elif funcs == "month_of_quarter":
        # Months 1-3 map to 1-3, months 4-6 map to 1-3, and so on.
        return pd.DataFrame({funcs:(x.month - 1) % 3 + 1})
    elif funcs == "week_of_quarter":
        year_week = x.isocalendar()["week"].reset_index(drop=True)
        # Approximation: every block of 13 ISO weeks is treated as one quarter.
        def week_of_quarter(week):
            if week <= 13:
                return week
            elif week <=26:
                return week-13
            elif week <=39:
                return week-26
            else:
                return week-39
        return pd.DataFrame({funcs:year_week.apply(week_of_quarter)})
    elif funcs == "millisecond":
        # The microsecond field holds 0..999999, so integer division gives milliseconds.
        return  pd.DataFrame({funcs:x.microsecond // 1000})
    elif funcs == "day_of_quarter":
        quarter = x.quarter
        quarter_start = pd.DatetimeIndex(
            x.year.map(str) + "-" + (3 * quarter - 2).map(int).map(str) + "-01")
        values  = ((x - quarter_start) / pd.to_timedelta("1D") + 1).astype(int)
        return pd.DataFrame({funcs:values},dtype=np.float64)
    else:
        # Any other name is read as a DatetimeIndex attribute, e.g. "month" or "day_of_year".
        return pd.DataFrame({funcs:getattr(x,funcs)})


def calendar_other(x,funcs):
    # x is expected to be a pd.DatetimeIndex.
    # Exponent applied to the elapsed share of the total time span
    # (0 at the first timestamp, 1 at the last).
    powers = {
        "proportion_total": 1,
        "proportion_total_squared": 2,
        "proportion_total_cubic": 3,
        "proportion_total_squared_root": 1/2,
        "proportion_total_cubic_root": 1/3,
    }
    if funcs in powers:
        ts = x.asi8  # timestamps as int64
        proportion = 1+(ts-ts.max())/(ts.max()-ts.min())
        # ** is the power operator; ^ is bitwise XOR and fails on floats.
        values = proportion**powers[funcs]
        return pd.DataFrame({funcs:values},dtype=np.float64)
    elif funcs == "proportion_month":
        values = x.day/x.days_in_month
        return pd.DataFrame({funcs:values},dtype=np.float64)
    elif funcs == "proportion_quarter":
        quarter = x.quarter
        quarter_start = pd.DatetimeIndex(
            x.year.map(str) + "-" + (3 * quarter - 2).map(int).map(str) + "-01")
        next_quarter_start = x + pd.tseries.offsets.QuarterBegin(startingMonth=1)
        quarter_length = (next_quarter_start - quarter_start).days
        doq  = ((x - quarter_start) / pd.to_timedelta("1D") + 1).astype(int)
        values = doq / quarter_length
        return pd.DataFrame({funcs:values},dtype=np.float64)
    else:
        raise ValueError("Feature not supported: " + str(funcs))

Issue Analytics

State:
Created 2 years ago
Reactions: 1
Comments: 10

Top GitHub Comments

1 reaction

aiwalter commented, Aug 13, 2021

I also think it’s a good idea to just ignore this edge case for now. There is still a workaround for advanced users with the solution I mentioned:

forecaster.fit(y, X=CalendarTransformer().fit_transform(pd.DataFrame(index=y.index)))

1 reaction

fkiraly commented, Aug 10, 2021

Yes, exactly, why not?

If it’s a transformer applied to X (not y), in predict it would be applied to X before predict is called, and the transformed X (with calendar indicator) would be fed to predict. So it seems all is fine?
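
The flow described in this comment can be sketched with a toy stand-in for a forecasting pipeline. ToyPipeline and day_of_week_dummies are hypothetical names invented for this illustration, and the least-squares fit merely takes the place of a real forecaster; none of this is sktime's implementation. The sketch assumes X carries a DatetimeIndex and may have no columns at all, as in the pd.DataFrame(index=y.index) workaround.

```python
import numpy as np
import pandas as pd


def day_of_week_dummies(X):
    """Hypothetical calendar transformer: one-hot day of week from the index.

    All seven columns are always produced, so training and prediction
    frames have the same layout even if some weekdays are absent.
    """
    dow = X.index.dayofweek.to_numpy()
    onehot = (dow[:, None] == np.arange(7)).astype(float)
    cols = [f"dow_{d}" for d in range(7)]
    return pd.DataFrame(onehot, index=X.index, columns=cols)


class ToyPipeline:
    """Hypothetical stand-in for a forecasting pipeline (not sktime's)."""

    def __init__(self, transform):
        self.transform = transform

    def fit(self, y, X):
        Xt = self.transform(X)
        # With one-hot columns, least squares learns one mean per weekday.
        self.coef_, *_ = np.linalg.lstsq(Xt.to_numpy(), y.to_numpy(), rcond=None)
        return self

    def predict(self, X):
        # The same transformation is applied to X before predicting.
        Xt = self.transform(X)
        return pd.Series(Xt.to_numpy() @ self.coef_, index=X.index)


# Four weeks of training data whose value equals the day of week.
train_idx = pd.date_range("2021-08-02", periods=28, freq="D")  # starts on a Monday
y = pd.Series(train_idx.dayofweek.to_numpy(dtype=float), index=train_idx)

# An empty frame that only carries the index is enough as input X.
pipe = ToyPipeline(day_of_week_dummies).fit(y, X=pd.DataFrame(index=y.index))

# At prediction time, X only needs the future index.
future_idx = pd.date_range("2021-08-30", periods=3, freq="D")
y_pred = pipe.predict(pd.DataFrame(index=future_idx))
print(y_pred)
```

Since the calendar columns depend only on the index, the future X can always be constructed in advance, which is why applying the transformer to X inside predict poses no problem.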

Top Results From Across the Web

[ENH] Calendar Feature Extractor · Issue #1273 - GitHub
