BUG: Grouper: Origin param has no effect
Pandas version checks
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd

# Two values of A, each with one row per day from 2022-01-02 to 2022-02-01
df = pd.DataFrame([{"A": A, "datadate": datadate}
                   for A in range(1, 3)
                   for datadate in pd.date_range(start='1/2/2022', end='2/1/2022', freq='D')])

# Weekly grouping with origin='start'
ddg = df.groupby(["A", pd.Grouper(key="datadate", freq='W', origin='start')])
for i, dfg in ddg:
    print(dfg[["A", "datadate"]])
    print("-----------------------")

# Weekly grouping with a fixed date as origin
ddg = df.groupby(["A", pd.Grouper(key="datadate", freq='W', origin='1/5/2022')])
for i, dfg in ddg:
    print(dfg[["A", "datadate"]])
    print("-----------------------")
Issue Description
Whatever I set for the origin parameter of pd.Grouper, whether it is start or a date, the groups always start on a Monday.
If I remove the origin parameter, I get the same results.
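To make the comparison easier to check than by reading the printed groups, here is a minimal sketch that compares the group labels directly. The helper name weekly_labels is only for illustration, and the claim that all three variants match is what I observe with freq='W'; I have not checked every pandas version.

```python
import pandas as pd

dates = pd.date_range(start="2022-01-02", end="2022-02-01", freq="D")
df = pd.DataFrame([{"A": a, "datadate": d} for a in range(1, 3) for d in dates])


def weekly_labels(**grouper_kwargs):
    """Return the (A, week label) index produced by a weekly pd.Grouper.

    Hypothetical helper, only used to compare the variants below.
    """
    grouper = pd.Grouper(key="datadate", freq="W", **grouper_kwargs)
    return df.groupby(["A", grouper]).size().index


labels_default = weekly_labels()                                # no origin
labels_start = weekly_labels(origin="start")                    # origin='start'
labels_fixed = weekly_labels(origin=pd.Timestamp("2022-01-05"))  # fixed date

# If origin had an effect, at least one of these comparisons should be False.
print(labels_default.equals(labels_start))
print(labels_default.equals(labels_fixed))

# Weekday of the week labels (the right edge of each weekly bin).
print(labels_default.get_level_values(1).day_name().unique())
```

With freq='W' the labels are the right edge of each bin, so a label falling on a Sunday corresponds to a group that starts on the preceding Monday.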
Expected Behavior
The data should be grouped into weekly bins that start at the timestamp provided by the origin parameter.
Installed Versions
INSTALLED VERSIONS
commit : e8093ba372f9adfe79439d90fe74b0b5b6dea9d6
python : 3.8.13.final.0
python-bits : 64
OS : Darwin
OS-release : 21.5.0
Version : Darwin Kernel Version 21.5.0: Tue Apr 26 21:08:37 PDT 2022; root:xnu-8020.121.3~4/RELEASE_ARM64_T6000
machine : arm64
processor : arm
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.UTF-8

pandas : 1.4.3
numpy : 1.21.6
pytz : 2022.1
dateutil : 2.8.2
setuptools : 61.2.0
pip : 21.2.4
Cython : 0.29.30
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.9.0
html5lib : None
pymysql : 1.0.2
psycopg2 : None
jinja2 : 3.1.1
IPython : 8.2.0
pandas_datareader: None
bs4 : 4.9.1
bottleneck : None
brotli : None
fastparquet : None
fsspec : 2022.5.0
gcsfs : None
markupsafe : 2.1.1
matplotlib : 3.5.2
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 8.0.0
pyreadstat : None
pyxlsb : None
s3fs : 2022.5.0
scipy : 1.8.0
snappy : None
sqlalchemy : 1.4.32
tables : None
tabulate : 0.8.10
xarray : None
xlrd : None
xlwt : None
zstandard : None
Issue Analytics
- Created a year ago
- Reactions:1
- Comments:9 (1 by maintainers)
Top GitHub Comments
@hamedgibago Yes, thanks
@DBCerigo Thanks for pointing that out. Indeed, W-TUE doesn’t raise an error. It gives:
Note that the first date, 2022-01-05, is a Wednesday.
The only way to start the week from the first date is to find the weekday of the first date (here 2022-01-02, a Sunday) and to anchor the weekly frequency on the day before.
Could you please confirm that this is the expected behavior of the frequency parameter?
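For reference, the workaround described above could be sketched roughly as follows. The WEEKDAY_CODES list and the variable names are my own, and the example assumes the anchored aliases W-MON through W-SUN behave as they do for me (weekly bins ending on the named day).

```python
import pandas as pd

dates = pd.date_range(start="2022-01-02", end="2022-02-01", freq="D")
df = pd.DataFrame([{"A": a, "datadate": d} for a in range(1, 3) for d in dates])

# Anchor codes ordered like Timestamp.weekday() (Monday == 0).
WEEKDAY_CODES = ["MON", "TUE", "WED", "THU", "FRI", "SAT", "SUN"]

first_date = df["datadate"].min()
# A weekly bin ends on its anchor day, so anchoring on the day before the
# first date makes each bin start on the weekday of the first date.
anchor = WEEKDAY_CODES[(first_date.weekday() - 1) % 7]
weekly_freq = f"W-{anchor}"

groups = df.groupby(["A", pd.Grouper(key="datadate", freq=weekly_freq)])

# Earliest date in each group, to check on which weekday the groups start.
group_starts = [dfg["datadate"].min() for _, dfg in groups]

print(weekly_freq)
for start in group_starts:
    print(start.date(), start.day_name())
```

This only moves the weekday on which the bins start. It does not use origin at all, which is why I am asking whether this is really the intended way.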
And, I insist, the origin parameter doesn’t have any effect.
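For comparison, origin does seem to be taken into account when the frequency is a fixed-length one such as 7D instead of W. This is a sketch of a possible alternative, not a confirmed fix, and it assumes fixed 7-day bins are acceptable for the use case; the value column is only there so that the frame has something besides the dates.

```python
import pandas as pd

dates = pd.date_range(start="2022-01-02", end="2022-02-01", freq="D")
df = pd.DataFrame({"datadate": dates, "value": range(len(dates))})

# Fixed 7-day bins aligned on the first timestamp of the data.
by_start = df.groupby(
    pd.Grouper(key="datadate", freq="7D", origin="start")
).size()

# Fixed 7-day bins aligned on an explicit date.
by_fixed = df.groupby(
    pd.Grouper(key="datadate", freq="7D", origin=pd.Timestamp("2022-01-05"))
).size()

print(by_start)
print(by_fixed)
```

Note that the 7D bins are labeled by their left edge, whereas the W bins are labeled by their right edge, so the two outputs are not directly interchangeable.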