
BUG: Grouper: Origin param has no effect


Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd

df = pd.DataFrame([{"A": A, "datadate": datadate}
                   for A in range(1, 3)
                   for datadate in pd.date_range(start='1/2/2022', end='2/1/2022', freq='D')])

ddg = df.groupby(["A", pd.Grouper(key="datadate", freq='W', origin='start')])

for i, dfg in ddg:
    print(dfg[["A", "datadate"]])
    print("-----------------------")


ddg = df.groupby(["A", pd.Grouper(key="datadate", freq='W', origin='1/5/2022')])

for i, dfg in ddg:
    print(dfg[["A", "datadate"]])
    print("-----------------------")

Issue Description

Whatever I set for the origin parameter of pd.Grouper, whether it is start or a date, the weekly groups start on a Monday. If I remove the origin parameter, I get the same results.
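The report can be checked with a small sketch. It rebuilds the same df and compares the weekly groups produced with no origin, with origin='start', and with a date origin. The helper group_first_dates is a hypothetical name written for this illustration; it is not part of pandas. If the behavior described above holds for your pandas version, the three lists are identical.

```python
import pandas as pd

# Same data as the reproducible example ('1/2/2022' to '2/1/2022', daily, A = 1 and 2).
df = pd.DataFrame(
    [
        {"A": a, "datadate": d}
        for a in range(1, 3)
        for d in pd.date_range(start="2022-01-02", end="2022-02-01", freq="D")
    ]
)


def group_first_dates(frame, **grouper_kwargs):
    """Hypothetical helper: first date of each (A, week) group, for A == 1 only."""
    grouped = frame.groupby(["A", pd.Grouper(key="datadate", **grouper_kwargs)])
    return [
        group["datadate"].min().strftime("%Y-%m-%d")
        for (a, _), group in grouped
        if a == 1
    ]


no_origin = group_first_dates(df, freq="W")
start_origin = group_first_dates(df, freq="W", origin="start")
date_origin = group_first_dates(df, freq="W", origin="2022-01-05")  # same date as '1/5/2022'

# If origin is ignored for "W" (an alias of "W-SUN"), all three lists match.
print(no_origin == start_origin == date_origin)
print(no_origin)
```

With "W", the first group contains only Sunday 2022-01-02. Every later group runs from Monday to Sunday, which matches the "starts on a Monday" observation.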

Expected Behavior

The data are grouped into weeks starting from the timestamp provided by the origin parameter.
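Here is a possible workaround, sketched under one assumption. In the pandas resampling code, origin appears to be applied only to fixed ("Tick") frequencies such as hours or minutes. Anchored offsets like "W" seem to snap to their weekday instead. If that holds for your version, writing the week as a fixed 168-hour frequency (7 * 24 hours) should make the groups start at the origin. The helper week_starts is a hypothetical name used only for this illustration.

```python
import pandas as pd

# Same data as the reproducible example.
df = pd.DataFrame(
    [
        {"A": a, "datadate": d}
        for a in range(1, 3)
        for d in pd.date_range(start="2022-01-02", end="2022-02-01", freq="D")
    ]
)


def week_starts(frame, origin):
    """Hypothetical helper: first date of each 7-day group for A == 1."""
    # "168h" is a fixed (Tick) frequency equal to 7 days; assumption: origin is
    # honoured for Tick frequencies but ignored for anchored offsets such as "W".
    grouper = pd.Grouper(key="datadate", freq="168h", origin=origin)
    grouped = frame.groupby(["A", grouper])
    return [
        group["datadate"].min().strftime("%Y-%m-%d")
        for (a, _), group in grouped
        if a == 1
    ]


print(week_starts(df, "start"))       # weeks begin on the first date, 2022-01-02
print(week_starts(df, "2022-01-05"))  # weeks aligned on Wednesday 2022-01-05
```

With the date origin, the first group is a partial week (2022-01-02 to 2022-01-04). Full weeks then start every Wednesday.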

Installed Versions

INSTALLED VERSIONS

commit : e8093ba372f9adfe79439d90fe74b0b5b6dea9d6
python : 3.8.13.final.0
python-bits : 64
OS : Darwin
OS-release : 21.5.0
Version : Darwin Kernel Version 21.5.0: Tue Apr 26 21:08:37 PDT 2022; root:xnu-8020.121.3~4/RELEASE_ARM64_T6000
machine : arm64
processor : arm
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.UTF-8

pandas : 1.4.3
numpy : 1.21.6
pytz : 2022.1
dateutil : 2.8.2
setuptools : 61.2.0
pip : 21.2.4
Cython : 0.29.30
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.9.0
html5lib : None
pymysql : 1.0.2
psycopg2 : None
jinja2 : 3.1.1
IPython : 8.2.0
pandas_datareader: None
bs4 : 4.9.1
bottleneck : None
brotli : None
fastparquet : None
fsspec : 2022.5.0
gcsfs : None
markupsafe : 2.1.1
matplotlib : 3.5.2
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 8.0.0
pyreadstat : None
pyxlsb : None
s3fs : 2022.5.0
scipy : 1.8.0
snappy : None
sqlalchemy : 1.4.32
tables : None
tabulate : 0.8.10
xarray : None
xlrd : None
xlwt : None
zstandard : None

Issue Analytics

  • State: open
  • Created a year ago
  • Reactions:1
  • Comments:9 (1 by maintainers)

Top GitHub Comments

1 reaction
houseofai commented, Aug 2, 2022

@hamedgibago Yes, thanks

0 reactions
houseofai commented, Sep 5, 2022

@DBCerigo Thanks for pointing that out; indeed, W-TUE doesn't raise an error.

ddg = df.groupby(["A", pd.Grouper(key="datadate", freq='W-TUE')])

for i, dfg in ddg:
    print(dfg[["A", "datadate"]])
    print("-----------------------")

Gives:

   A   datadate
0  1 2022-01-02
1  1 2022-01-03
2  1 2022-01-04
-----------------------
   A   datadate
3  1 2022-01-05
4  1 2022-01-06
5  1 2022-01-07
6  1 2022-01-08
7  1 2022-01-09
8  1 2022-01-10
9  1 2022-01-11
-----------------------

Note that the first date of the second group, 2022-01-05, is a Wednesday.

The only way to start the week on the first date is to find the weekday of the first date (here 2022-01-02, a Sunday) and anchor the weekly frequency on the day before (W-SAT):

ddg = df.groupby(["A", pd.Grouper(key="datadate", freq='W-SAT')])
# printed with the same loop as above

Gives:

   A   datadate
0  1 2022-01-02
1  1 2022-01-03
2  1 2022-01-04
3  1 2022-01-05
4  1 2022-01-06
5  1 2022-01-07
6  1 2022-01-08
-----------------------
    A   datadate
7   1 2022-01-09
8   1 2022-01-10
9   1 2022-01-11
...

Could you please confirm that this is the expected behavior of the frequency parameter?

I want to stress, though, that the origin parameter still has no effect.
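The W-SAT trick can be generalized. This is a minimal sketch, and it assumes that weekly "W-<DAY>" bins end on the named day by default, as in the outputs above. The names weekly_freq_starting_at and WEEKDAY_CODES are hypothetical; they are introduced here and are not pandas APIs. The helper anchors the weekly frequency on the day before a given date, so the groups begin on that date's weekday.

```python
import pandas as pd

# Day codes used in "W-<DAY>" aliases, indexed by Timestamp.dayofweek (Monday == 0).
WEEKDAY_CODES = ["MON", "TUE", "WED", "THU", "FRI", "SAT", "SUN"]


def weekly_freq_starting_at(first_date):
    """Hypothetical helper: weekly alias whose groups start on first_date's weekday."""
    # Assumption: weekly bins end on the anchor day, so anchor on the day before.
    day_before = pd.Timestamp(first_date) - pd.Timedelta(days=1)
    return "W-" + WEEKDAY_CODES[day_before.dayofweek]


# Same data as the reproducible example.
df = pd.DataFrame(
    [
        {"A": a, "datadate": d}
        for a in range(1, 3)
        for d in pd.date_range(start="2022-01-02", end="2022-02-01", freq="D")
    ]
)

freq = weekly_freq_starting_at(df["datadate"].min())
grouped = df.groupby(["A", pd.Grouper(key="datadate", freq=freq)])
first_dates = [
    group["datadate"].min().strftime("%Y-%m-%d")
    for (a, _), group in grouped
    if a == 1
]
print(freq)
print(first_dates)
```

For this data the helper picks W-SAT, the same anchor found by hand. For a start date of 2022-01-05 it would pick W-TUE, which matches the W-TUE output above where the full groups begin on a Wednesday.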

