BUG: pd.DataFrame.transform recursively loops in some cases

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.



Code Sample, a copy-pastable example

import pandas as pd
pd.DataFrame({"a": [None]}).transform({"a": int})

Problem description

Executing the above raises a RecursionError (maximum recursion depth exceeded). This is confusing, and it is harder to pinpoint and debug than the expected exception.

Expected Output

Something akin to the result of int(None), which raises a TypeError.
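To make the expected behaviour concrete, here is a minimal check of what int(None) does in plain Python, with no pandas involved. The helper name failure_type is made up for illustration and is not part of the original report.

```python
def failure_type(value):
    """Return the name of the exception int(value) raises, or None if it succeeds."""
    try:
        int(value)
    except Exception as err:  # broad on purpose: we only want to report the type
        return type(err).__name__
    return None


print(failure_type(None))  # TypeError
print(failure_type("3"))   # None, because int("3") succeeds
```

An error of this kind names the offending value's type directly, which is what makes it easier to debug than a recursion failure.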

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit           : None
python           : 3.7.5.final.0
python-bits      : 64
OS               : Darwin
OS-release       : 18.6.0
machine          : x86_64
processor        : i386
byteorder        : little
LC_ALL           : None
LANG             : None
LOCALE           : None.UTF-8

pandas           : 1.0.3
numpy            : 1.18.4
pytz             : 2020.1
dateutil         : 2.8.1
pip              : 19.2.3
setuptools       : 41.2.0
Cython           : None
pytest           : 3.10.1
hypothesis       : None
sphinx           : None
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : None
html5lib         : None
pymysql          : None
psycopg2         : None
jinja2           : None
IPython          : None
pandas_datareader: None
bs4              : None
bottleneck       : None
fastparquet      : None
gcsfs            : None
lxml.etree       : None
matplotlib       : None
numexpr          : None
odfpy            : None
openpyxl         : None
pandas_gbq       : None
pyarrow          : None
pytables         : None
pytest           : 3.10.1
pyxlsb           : None
s3fs             : None
scipy            : None
sqlalchemy       : None
tables           : None
tabulate         : None
xarray           : None
xlrd             : None
xlwt             : None
xlsxwriter       : None
numba            : None

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 5 (5 by maintainers)

Top GitHub Comments

1 reaction
pedrooa commented, May 21, 2020

I’ve looked into the try / except you mentioned (within aggregate in frame.py), and the recursion occurs because the except clause does nothing with the error:

try:
    result, how = self._aggregate(func, axis=axis, *args, **kwargs)
except TypeError:
    pass

I’ve managed to fix it by raising an error that reports what went wrong, as in the following code:

try:
    result, how = self._aggregate(func, axis=axis, *args, **kwargs)
except TypeError as err:
    exc = TypeError(
        "DataFrame constructor called with "
        f"incompatible data and dtype: {err}"
    )
    raise exc from err

I am now just making sure this fix doesn’t break anything else.
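The difference between the two handlers can be reproduced outside pandas. The sketch below is a toy model only: the function names are hypothetical, and the fallback that calls back into the same function is an assumption standing in for whatever path pandas actually takes after the swallowed error. It shows how `except TypeError: pass` can turn a clear TypeError into a RecursionError, and how `raise ... from err` keeps the original cause attached.

```python
def _aggregate(value, func):
    # Hypothetical fast path: apply func directly to the value.
    return func(value)


def aggregate_swallowing(value, func):
    try:
        return _aggregate(value, func)
    except TypeError:
        pass  # the error is dropped, so nothing stops the fallback below
    # Assumed fallback that re-enters the same function with the same input.
    return aggregate_swallowing(value, func)


def aggregate_reraising(value, func):
    try:
        return _aggregate(value, func)
    except TypeError as err:
        # Re-raise with context; "from err" stores the original as __cause__.
        raise TypeError(f"could not apply {func.__name__}: {err}") from err


for fn in (aggregate_swallowing, aggregate_reraising):
    try:
        fn(None, int)
    except (RecursionError, TypeError) as exc:
        print(fn.__name__, "->", type(exc).__name__)
```

In the re-raising variant the traceback ends at the failing call and the chained exception still carries the original int(None) message, which is the debugging improvement the report asks for.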

0 reactions
dsaxton commented, May 19, 2020

@pedrooa It looks like the problem is the try / except within aggregate in frame.py. It would be interesting to look into why that’s needed and whether there’s a better way to handle things.
