Top-level API for time grouping

This package has a growing number of methods that rely on a DatetimeIndex inside the estimators. I’m beginning to think this design decision is complicating the implementation of individual methods (see @dgergel’s great and persistent attempts in #28 as an example) and that we should consider stepping back from the current approach. The use of DatetimeIndexes inside the estimators is also the primary divergence from full scikit-learn integration. In this issue, I’ll propose a new top-level API that supports using time indexes for grouped training/prediction outside individual regressors/transformers.

I’ll start by outlining a few common time-grouping approaches:

  • Group-by day-of-year or month - These are the simplest to implement using out-of-the-box functionality in both Xarray and Pandas.
  • Padded and/or windowed groups based on day-of-year or month - @jukent’s Zscore method and @dgergel’s NASA-NEX BCSD (#28) method use variations on padded day-of-year groupers. The reference implementation of ARRM (see #42) also uses a +/- 15 day window around month boundaries.
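
For the first case, a minimal sketch with pandas might look like the following. The data and the column name `tas` are made up for illustration; only `groupby`, `DatetimeIndex.month`, and `DatetimeIndex.dayofyear` are existing pandas functionality. The Xarray counterpart would be along the lines of `ds.groupby("time.dayofyear")`.

```python
import numpy as np
import pandas as pd

# Synthetic daily series covering two non-leap years (illustrative data only).
index = pd.date_range("2001-01-01", "2002-12-31", freq="D")
df = pd.DataFrame({"tas": np.arange(len(index), dtype=float)}, index=index)

# Group by calendar month: one group per month number (1..12).
monthly_mean = df.groupby(df.index.month).mean()

# Group by day of year: one group per day number (1..365 for this data).
doy_mean = df.groupby(df.index.dayofyear).mean()
```

The padded and windowed variants in the second bullet have no such one-liner, which is why they motivate dedicated grouper classes.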

My proposal is that we develop (or in some cases, continue to develop) a series of grouper classes that perform these (sometimes) esoteric grouping operations and that we utilize these grouper classes with an API object that supports training and prediction.

The grouper concept may look something like this:


index = ds.indexes['time']
# or
index = df.index

group_iter = PaddedDOYGrouper(index, window=15)

Under the hood, these groupers would support iteration like this:

# pandas
for inds in group_iter:
    df_group = df.iloc[inds]

# xarray
for inds in group_iter:
    ds_group = ds.isel(time=inds)
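
To make the idea concrete, here is one way such a grouper could be sketched. This is not an implementation from the package: the class body, the `n_days` parameter, and the wrap-around rule are assumptions made for illustration, and only the constructor call `PaddedDOYGrouper(index, window=15)` comes from the proposal.

```python
import numpy as np
import pandas as pd


class PaddedDOYGrouper:
    """Hypothetical sketch: yield positional indices for padded day-of-year groups."""

    def __init__(self, index, window=15, n_days=365):
        self.doy = np.asarray(index.dayofyear)
        self.window = window
        # Assumption: a fixed year length; leap days are handled only approximately.
        self.n_days = n_days

    def __iter__(self):
        for center in np.unique(self.doy):
            # Circular distance so that windows wrap around the year boundary.
            diff = np.abs(self.doy - center)
            dist = np.minimum(diff, self.n_days - diff)
            yield np.flatnonzero(dist <= self.window)


# Illustrative data: two non-leap years of daily values.
index = pd.date_range("2001-01-01", "2002-12-31", freq="D")
df = pd.DataFrame({"tas": np.arange(len(index), dtype=float)}, index=index)

groups = list(PaddedDOYGrouper(index, window=15))
first_group = df.iloc[groups[0]]  # rows within 15 days of January 1, both years
```

Because the grouper only yields integer positions, the same object works with `df.iloc[inds]` and `ds.isel(time=inds)`.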

We could then write a simple model API that combines the grouper object with proper sklearn-compatible regressors/pipelines:


arrm_model = GroupedRegressor(estimator=PiecewiseLinearRegression, grouper=PaddedDOYGrouper)

arrm_model.fit(X_df, y_df)  # -> fits one PiecewiseLinearRegression per group produced by the grouper
...
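
A rough sketch of how such a wrapper could work is given below. Everything in it is hypothetical: `MonthGrouper` is a simple non-overlapping grouper invented for the example, `LinearRegression` stands in for `PiecewiseLinearRegression`, and the grouper is assumed to yield `(key, inds)` pairs rather than bare indices so that predictions can be matched to the model fitted for the same group.

```python
import numpy as np
import pandas as pd
from sklearn.base import clone
from sklearn.linear_model import LinearRegression


class MonthGrouper:
    """Hypothetical grouper: yields (month, positional indices) pairs."""

    def __init__(self, index):
        self.month = np.asarray(index.month)

    def __iter__(self):
        for key in np.unique(self.month):
            yield key, np.flatnonzero(self.month == key)


class GroupedRegressor:
    """Hypothetical sketch: fit one clone of `estimator` per group."""

    def __init__(self, estimator, grouper):
        self.estimator = estimator  # an sklearn estimator instance
        self.grouper = grouper      # a grouper class, called with the index

    def fit(self, X, y):
        self.models_ = {}
        for key, inds in self.grouper(X.index):
            model = clone(self.estimator)
            self.models_[key] = model.fit(X.iloc[inds], y.iloc[inds])
        return self

    def predict(self, X):
        out = pd.Series(np.nan, index=X.index)
        for key, inds in self.grouper(X.index):
            out.iloc[inds] = self.models_[key].predict(X.iloc[inds])
        return out


# Illustrative data with an exact linear relationship.
index = pd.date_range("2001-01-01", "2002-12-31", freq="D")
X = pd.DataFrame({"x": np.arange(len(index), dtype=float)}, index=index)
y = 2.0 * X["x"] + 1.0

model = GroupedRegressor(LinearRegression(), MonthGrouper).fit(X, y)
pred = model.predict(X)
```

With overlapping groups such as padded day-of-year windows, training can use the full window, but prediction would need an extra rule for which rows each group is responsible for (for example, only the central day); the sketch sidesteps this by using non-overlapping months.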

If done correctly, I think we can share the Groupers between Pandas and Xarray applications, allowing us to use them either at the PointWiseDownscaler level or for individual points.


@jukent and @dgergel - I’m curious to hear from you on the potential feasibility of this approach for the methods you have developed here. Am I missing anything that would keep us from executing on this sort of API?

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Reactions: 1
  • Comments: 5 (5 by maintainers)

Top GitHub Comments

jukent commented, Oct 27, 2020 (1 reaction)

I also filled out the poll. Thanks for putting that together.

jhamman commented, Oct 27, 2020 (1 reaction)

I’m out of commission this week but would be happy to chat about this next week. I filled out your poll. Thanks for setting it up.
