Error for Int64Dtype column: "The 'reduce' method is not supported."
Thank you for creating this very useful library!
Describe the bug
I get an error when using Pandas Profiling with a data frame that contains an Int64Dtype() column with at least 5 rows.
To Reproduce
Create a file example.py with this code:
"""
Test for issue 502:
https://github.com/pandas-profiling/pandas-profiling/issues/502
"""
import numpy as np
import pandas as pd
from pandas_profiling import ProfileReport
df = pd.DataFrame({
'a': [1, 2, 3, 4, 5]
}, dtype=pd.Int64Dtype())
profile = ProfileReport(df, title='Pandas Profiling Report', explorative=True)
profile.to_file("your_report.html")
Run it from the command line with python example.py. Output:
Summarize dataset: 0%| | 0/15 [00:00<?, ?it/s]
Traceback (most recent call last):
File "example.py", line 10, in <module>
profile.to_file("your_report.html")
File "/home/ac/projects/python/profiling/env/lib/python3.6/site-packages/pandas_profiling/profile_report.py", line 245, in to_file
data = self.to_html()
File "/home/ac/projects/python/profiling/env/lib/python3.6/site-packages/pandas_profiling/profile_report.py", line 348, in to_html
return self.html
File "/home/ac/projects/python/profiling/env/lib/python3.6/site-packages/pandas_profiling/profile_report.py", line 168, in html
self._html = self._render_html()
File "/home/ac/projects/python/profiling/env/lib/python3.6/site-packages/pandas_profiling/profile_report.py", line 275, in _render_html
report = self.report
File "/home/ac/projects/python/profiling/env/lib/python3.6/site-packages/pandas_profiling/profile_report.py", line 162, in report
self._report = get_report_structure(self.description_set)
File "/home/ac/projects/python/profiling/env/lib/python3.6/site-packages/pandas_profiling/profile_report.py", line 143, in description_set
self._description_set = describe_df(self.title, self.df)
File "/home/ac/projects/python/profiling/env/lib/python3.6/site-packages/pandas_profiling/model/describe.py", line 63, in describe
series_description = get_series_descriptions(df, pbar)
File "/home/ac/projects/python/profiling/env/lib/python3.6/site-packages/pandas_profiling/model/summary.py", line 473, in get_series_descriptions
executor.imap_unordered(multiprocess_1d, args)
File "/usr/lib/python3.6/multiprocessing/pool.py", line 735, in next
raise value
File "/usr/lib/python3.6/multiprocessing/pool.py", line 119, in worker
result = (True, func(*args, **kwds))
File "/home/ac/projects/python/profiling/env/lib/python3.6/site-packages/pandas_profiling/model/summary.py", line 450, in multiprocess_1d
return column, describe_1d(series)
File "/home/ac/projects/python/profiling/env/lib/python3.6/site-packages/pandas_profiling/model/summary.py", line 419, in describe_1d
type_to_func[series_description["type"]](series, series_description)
File "/home/ac/projects/python/profiling/env/lib/python3.6/site-packages/pandas_profiling/model/summary.py", line 151, in describe_numeric_1d
"min": np.min(present_values),
File "<__array_function__ internals>", line 6, in amin
File "/home/ac/projects/python/profiling/env/lib/python3.6/site-packages/numpy/core/fromnumeric.py", line 2831, in amin
keepdims=keepdims, initial=initial, where=where)
File "/home/ac/projects/python/profiling/env/lib/python3.6/site-packages/numpy/core/fromnumeric.py", line 87, in _wrapreduction
return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
File "/home/ac/projects/python/profiling/env/lib/python3.6/site-packages/pandas/core/arrays/integer.py", line 372, in __array_ufunc__
raise NotImplementedError("The 'reduce' method is not supported.")
NotImplementedError: The 'reduce' method is not supported.
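The traceback shows the failure path: describe_numeric_1d calls np.min(present_values), NumPy forwards this to ufunc.reduce, and in pandas 1.0.5 IntegerArray.__array_ufunc__ rejects the 'reduce' method with NotImplementedError. Until a fixed release is available, one possible user-side workaround (an assumption, not something suggested in the thread) is to cast nullable integer columns to float64 before profiling. The helper name cast_nullable_ints is hypothetical, and the cast trades exact integers for NaN-based missing values.

```python
import pandas as pd
from pandas.api.types import is_integer_dtype


def cast_nullable_ints(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with every nullable integer column cast to float64.

    Hypothetical helper (not part of pandas or pandas-profiling). Missing
    values (<NA>) become NaN; integers above 2**53 may lose precision.
    """
    out = df.copy()
    for col in out.columns:
        dtype = out[col].dtype
        # Int8..Int64 and UInt8..UInt64 are pandas extension dtypes that are
        # also integer dtypes; plain NumPy integer columns are left unchanged.
        if isinstance(dtype, pd.api.extensions.ExtensionDtype) and is_integer_dtype(dtype):
            out[col] = out[col].astype("float64")
    return out
```

Usage would then be ProfileReport(cast_nullable_ints(df), title='Pandas Profiling Report', explorative=True), leaving the original df untouched.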
Version information:
- Python version: Python 3.6.9 (default, Apr 18 2020, 01:56:04)
- Environment: command line
- Packages (output of pip freeze):
astropy==4.0.1.post1
attrs==19.3.0
backcall==0.2.0
bleach==3.1.5
certifi==2020.6.20
chardet==3.0.4
confuse==1.1.0
cycler==0.10.0
decorator==4.4.2
defusedxml==0.6.0
entrypoints==0.3
htmlmin==0.1.12
idna==2.9
ImageHash==4.1.0
importlib-metadata==1.6.1
ipykernel==5.3.0
ipython==7.15.0
ipython-genutils==0.2.0
ipywidgets==7.5.1
jedi==0.17.1
Jinja2==2.11.2
joblib==0.15.1
jsonschema==3.2.0
jupyter-client==6.1.3
jupyter-core==4.6.3
kiwisolver==1.2.0
llvmlite==0.33.0
MarkupSafe==1.1.1
matplotlib==3.2.2
missingno==0.4.2
mistune==0.8.4
nbconvert==5.6.1
nbformat==5.0.7
networkx==2.4
notebook==6.0.3
numba==0.50.0
numpy==1.19.0
packaging==20.4
pandas==1.0.5
pandas-profiling==2.8.0
pandocfilters==1.4.2
parso==0.7.0
pexpect==4.8.0
phik==0.10.0
pickleshare==0.7.5
Pillow==7.1.2
pkg-resources==0.0.0
prometheus-client==0.8.0
prompt-toolkit==3.0.5
ptyprocess==0.6.0
Pygments==2.6.1
pyparsing==2.4.7
pyrsistent==0.16.0
python-dateutil==2.8.1
pytz==2020.1
PyWavelets==1.1.1
PyYAML==5.3.1
pyzmq==19.0.1
requests==2.24.0
scipy==1.5.0
seaborn==0.10.1
Send2Trash==1.5.0
six==1.15.0
tangled-up-in-unicode==0.0.6
terminado==0.8.3
testpath==0.4.4
tornado==6.0.4
tqdm==4.46.1
traitlets==4.3.3
urllib3==1.25.9
visions==0.4.4
wcwidth==0.2.5
webencodings==0.5.1
widgetsnbextension==3.5.1
zipp==3.1.0
Additional context
The error seems to require at least 5 rows. The following code, with only 4 rows, runs WITHOUT ERROR:
import numpy as np
import pandas as pd
from pandas_profiling import ProfileReport
df = pd.DataFrame({
'a': [1, 2, 3, 4] # no 5
}, dtype=pd.Int64Dtype())
profile = ProfileReport(df, title='Pandas Profiling Report', explorative=True)
profile.to_file("your_report.html")
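One possible explanation for the 4-row versus 5-row difference, which the thread does not confirm: pandas-profiling 2.x appears to have a setting, vars.num.low_categorical_threshold (default 5), under which numeric columns with fewer distinct values than the threshold are reported as categorical. With only 4 distinct values the column would never reach describe_numeric_1d, so np.min would never be called. The function below is a hypothetical simplification of that rule, not the library's actual type inference.

```python
import pandas as pd


def infer_kind(series: pd.Series, low_categorical_threshold: int = 5) -> str:
    """Hypothetical simplification of the assumed inference rule.

    Numeric columns with fewer distinct (non-missing) values than the
    threshold are treated as categorical; only the rest get the numeric
    summary, which is where np.min is called.
    """
    if pd.api.types.is_numeric_dtype(series.dtype):
        if series.nunique(dropna=True) < low_categorical_threshold:
            return "categorical"
        return "numeric"
    return "other"
```

Under this assumption, [1, 2, 3, 4] is classified as categorical and [1, 2, 3, 4, 5] as numeric, matching the behavior reported above.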
Thank you!
Issue Analytics
- Created 3 years ago
- Reactions: 5
- Comments: 5
I installed it via pip using that command and ran the code from the bug report. It ran without error and generated the HTML output as expected. I believe the issue is solved now.
Thank you very much!
It seems this error was introduced by the performance improvements to the summarization of numeric series. I've pushed a workaround that reverts this change for pandas' nullable integer types. It will be in the next release.
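The thread does not show the actual patch. As a hypothetical sketch of the general idea only, a numeric summary can avoid running NumPy reductions directly on a pandas extension array by first materializing the present values as a plain float64 ndarray. The name numeric_summary and the returned keys are illustrative assumptions, not pandas-profiling's code.

```python
import numpy as np
import pandas as pd


def numeric_summary(series: pd.Series) -> dict:
    """Hypothetical sketch: summarize a numeric series without calling
    NumPy reductions on a pandas extension array (e.g. Int64)."""
    present = series.dropna()
    if isinstance(present.dtype, pd.api.extensions.ExtensionDtype):
        # Nullable integers become a plain float64 ndarray; NA values were
        # already dropped, so no NA-to-NaN conversion is needed here.
        values = present.to_numpy(dtype="float64")
    else:
        values = present.to_numpy()
    # These reductions now run on an ordinary ndarray, so NumPy never
    # calls the extension array's __array_ufunc__ with method 'reduce'.
    return {
        "min": np.min(values),
        "max": np.max(values),
        "mean": np.mean(values),
    }
```

Converting only the non-missing values keeps the summary independent of how a given pandas version represents <NA>.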