Decimal fields dropped in group by with more than one column

Code Sample, a copy-pastable example if possible

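import decimal

import pandas as pd

# list(...) and print(...) keep the sample runnable on both Python 2 and Python 3
df = pd.DataFrame({'a': [decimal.Decimal('4.56')]*6, 'b': list(range(3, 6))*2, 'c': range(6)})
print(df.groupby('b')['a'].sum())                      # one column selected
print(df.groupby('b')[['a', 'c']].sum())               # two columns selected
print(df.groupby('b').agg({'a': 'sum', 'c': 'sum'}))   # explicit agg per column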

Output:

b
3    9.12
4    9.12
5    9.12
Name: a, dtype: object
   c
b
3  3
4  5
5  7
      a  c
b
3  9.12  3
4  9.12  5
5  9.12  7

Problem description

The aggregation over column a is silently dropped when a is selected from the groupby object together with another column, but it works when requested through agg (I’m not sure if ‘sum’ is exactly equivalent to .sum() in the above).
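
As a hedged illustration of the workaround implied here, the sketch below rebuilds the sample frame and names each column's aggregation explicitly through agg, which is the form the report says keeps column a. It only restates the behaviour reported for pandas 0.23.4; other pandas versions may treat object-dtype columns differently.

```python
import decimal

import pandas as pd

# Same data as the code sample in this report.
df = pd.DataFrame({
    "a": [decimal.Decimal("4.56")] * 6,
    "b": list(range(3, 6)) * 2,
    "c": range(6),
})

# Request the sum per column by name instead of selecting the columns
# and calling .sum() on the selection.
totals = df.groupby("b").agg({"a": "sum", "c": "sum"})
print(totals)
```

The values in column a of the result are still decimal.Decimal objects, so no precision is lost by going through agg.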

Expected Output

b
3    9.12
4    9.12
5    9.12
Name: a, dtype: object
      a  c
b
3  9.12  3
4  9.12  5
5  9.12  7
      a  c
b
3  9.12  3
4  9.12  5
5  9.12  7

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 2.7.15.final.0
python-bits: 64
OS: Darwin
OS-release: 17.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: None.None

pandas: 0.23.4
pytest: None
pip: 18.0
setuptools: 39.1.0
Cython: None
numpy: 1.14.3
scipy: 1.1.0
pyarrow: 0.8.0
xarray: None
IPython: 5.7.0
sphinx: None
patsy: 0.5.0
dateutil: 2.7.2
pytz: 2018.4
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: 1.1.0
xlwt: None
xlsxwriter: 1.0.4
lxml: None
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.7
pymysql: None
psycopg2: 2.7.4 (dt dec pq3 ext lo64)
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

Issue Analytics

  • State: open
  • Created: 5 years ago
  • Comments: 10 (9 by maintainers)

Top GitHub Comments

jschendel commented, Aug 12, 2018 (1 reaction)

NumPy has no native support for Decimal types, so they are also only weakly supported in pandas. In theory, a community-contributed extension would help improve that.

As a side note, we actually have this within our test suite, albeit in a fledgling state:

https://github.com/pandas-dev/pandas/blob/0370740034978d3a63d4b8e5e2c96ff54e7e08ba/pandas/tests/extension/decimal/array.py#L37-L118

This doesn’t quite get the job done, though: there is currently no way to dynamically alter ExtensionBlock.is_numeric, which is always False and is what get_numeric_data ultimately checks for DecimalArray.

Will open a separate issue for the above though, as it’s only tangentially related to this issue, and using DecimalArray would really only resolve this issue for the Decimal case (i.e. a more generic solution for object dtype might be nice?).
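
To make the point about NumPy concrete, here is a small sketch showing that Decimal values end up in an object-dtype array, which pandas' numeric selection does not count as numeric. The use of select_dtypes and is_numeric_dtype is an assumption made for illustration: they are public stand-ins, not the internal get_numeric_data call discussed in this comment.

```python
import decimal

import numpy as np
import pandas as pd

values = [decimal.Decimal("4.56")] * 3

# NumPy has no Decimal dtype, so the values are stored as generic Python objects.
arr = np.array(values)
print(arr.dtype)

df = pd.DataFrame({"a": values, "c": range(3)})

# Public checks used as a stand-in for the internal numeric filter:
# the object column 'a' is not considered numeric, the integer column 'c' is.
a_is_numeric = pd.api.types.is_numeric_dtype(df["a"])
numeric_columns = list(df.select_dtypes(include="number").columns)
print(a_is_numeric, numeric_columns)
```

Any code path that pre-filters columns by numeric dtype will therefore leave out a Decimal column before an aggregation is even attempted.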

drudd commented, Aug 10, 2018 (1 reaction)

Another tactic would be to remove the get_numeric_data call and rely on the try/except later in the loop, which should catch the error raised when there is no appropriate aggregation function.

I have tested this and it restores consistency between the two methods. Happy to PR if we can be confident this won’t break other logic.
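
The idea in this comment can be sketched in user code, outside pandas internals: skip the dtype pre-filter, attempt the sum on every column, and drop a column only when the addition actually fails. The helper name sum_all_columns and the extra column d are hypothetical and exist only for this illustration; this is not the patch that was tested against pandas itself.

```python
import decimal
import functools
import operator

import pandas as pd


def sum_all_columns(df, key):
    """Sum every non-key column per group (hypothetical helper).

    No numeric pre-filter is applied: a column is left out only if
    adding its values raises TypeError.
    """
    summed = {}
    for name in df.columns:
        if name == key:
            continue
        try:
            summed[name] = {
                group_key: functools.reduce(operator.add, group_values)
                for group_key, group_values in df.groupby(key)[name]
            }
        except TypeError:
            # The values do not support "+", so skip this column.
            continue
    result = pd.DataFrame(summed)
    result.index.name = key
    return result


df = pd.DataFrame({
    "a": [decimal.Decimal("4.56")] * 6,
    "b": list(range(3, 6)) * 2,
    "c": range(6),
    "d": [{"x": 1}] * 6,  # dicts cannot be added, so 'd' is skipped
})
result = sum_all_columns(df, "b")
print(result)
```

With this approach the Decimal column a survives next to c, and only d, whose values genuinely cannot be summed, is left out.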
