BUG: 1.2.5 -> 1.3.x breaks yaml.dump(DataFrameObject)
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- (optional) I have confirmed this bug exists on the master branch of pandas.
Code Sample, a copy-pastable example
import pandas as pd
import yaml

tmp = pd.DataFrame({'col_name': [1, 2, 3, 4]})
dumped = yaml.dump(tmp)  # Fails starting with pandas==1.3.0
print(dumped)
Problem description
Another package, pytorch-lightning, has an option to save all given hyperparameters. It serializes every parameter to YAML, which for DataFrames produced the rather ugly output shown under Expected Output. Starting with pandas==1.3.0, this breaks, as the minimal example above shows.
If this is expected behaviour from pandas, I apologize; the issue should then probably go to pytorch-lightning instead.
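As a possible stopgap on the caller's side (not a fix for the regression itself), a DataFrame can be converted to plain Python containers before it reaches yaml. The sketch below assumes that losing the index and dtype information is acceptable; `to_yaml_safe` and the `hparams` dict are hypothetical names used only for illustration and are not part of pandas, PyYAML, or pytorch-lightning.

```python
import pandas as pd
import yaml


def to_yaml_safe(value):
    """Hypothetical helper: replace DataFrames with plain dicts of lists."""
    if isinstance(value, pd.DataFrame):
        # Series.tolist() yields built-in Python scalars, which
        # yaml.safe_dump can represent without python/object tags.
        return {str(col): value[col].tolist() for col in value.columns}
    return value


# Hypothetical hyperparameters containing a DataFrame.
hparams = {"lr": 0.01, "data": pd.DataFrame({"col_name": [1, 2, 3, 4]})}

plain = {key: to_yaml_safe(val) for key, val in hparams.items()}
dumped = yaml.safe_dump(plain)
print(dumped)
```

The result is ordinary YAML that `yaml.safe_load` can read back, but it round-trips to a dict rather than a DataFrame, so it only helps when the dump is used for logging rather than for restoring the object.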
Expected Output
!!python/object:pandas.core.frame.DataFrame
_flags:
allows_duplicate_labels: true
_metadata: []
_mgr: !!python/object/new:pandas.core.internals.managers.BlockManager
state: !!python/tuple
- &id004
- !!python/object/apply:pandas.core.indexes.base._new_Index
- &id002 !!python/name:pandas.core.indexes.base.Index ''
- data: !!python/object/apply:numpy.core.multiarray._reconstruct
args:
- &id001 !!python/name:numpy.ndarray ''
- !!python/tuple
- 0
- !!binary |
Yg==
state: !!python/tuple
- 1
- !!python/tuple
- 1
- &id003 !!python/object/apply:numpy.dtype
args:
- O8
- false
- true
state: !!python/tuple
- 3
- '|'
- null
- null
- null
- -1
- -1
- 63
- false
- - col_name
name: null
- !!python/object/apply:pandas.core.indexes.base._new_Index
- !!python/name:pandas.core.indexes.range.RangeIndex ''
- name: null
start: 0
step: 1
stop: 4
- - &id005 !!python/object/apply:numpy.core.multiarray._reconstruct
args:
- *id001
- !!python/tuple
- 0
- !!binary |
Yg==
state: !!python/tuple
- 1
- !!python/tuple
- 1
- 4
- !!python/object/apply:numpy.dtype
args:
- i8
- false
- true
state: !!python/tuple
- 3
- <
- null
- null
- null
- -1
- -1
- 0
- false
- !!binary |
AQAAAAAAAAACAAAAAAAAAAMAAAAAAAAABAAAAAAAAAA=
- - !!python/object/apply:pandas.core.indexes.base._new_Index
- *id002
- data: !!python/object/apply:numpy.core.multiarray._reconstruct
args:
- *id001
- !!python/tuple
- 0
- !!binary |
Yg==
state: !!python/tuple
- 1
- !!python/tuple
- 1
- *id003
- false
- - col_name
name: null
- 0.14.1:
axes: *id004
blocks:
- mgr_locs: !!python/object/apply:builtins.slice
- 0
- 1
- 1
values: *id005
_typ: dataframe
attrs: {}
Output of pd.show_versions()
pandas 1.2.5:
INSTALLED VERSIONS
commit : 7c48ff4409c622c582c56a5702373f726de08e96
python : 3.8.10.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19042
machine : AMD64
processor : Intel64 Family 6 Model 158 Stepping 10, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : de_DE.cp1252
pandas : 1.2.5
numpy : 1.21.1
pytz : 2021.1
dateutil : 2.8.2
pip : 21.1.3
setuptools : 52.0.0.post20210125
Cython : None
pytest : 6.2.4
hypothesis : None
sphinx : 3.5.4
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.6.3
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 3.0.1
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : 2021.07.0
fastparquet : None
gcsfs : None
matplotlib : 3.4.2
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyxlsb : None
s3fs : None
scipy : 1.6.1
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : 0.51.2
pandas 1.3.1:
INSTALLED VERSIONS
commit : c7f7443c1bad8262358114d5e88cd9c8a308e8aa
python : 3.8.10.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19042
machine : AMD64
processor : Intel64 Family 6 Model 158 Stepping 10, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : de_DE.cp1252
pandas : 1.3.1
numpy : 1.21.1
pytz : 2021.1
dateutil : 2.8.2
pip : 21.1.3
setuptools : 52.0.0.post20210125
Cython : None
pytest : 6.2.4
hypothesis : None
sphinx : 3.5.4
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.6.3
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 3.0.1
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : 2021.07.0
fastparquet : None
gcsfs : None
matplotlib : 3.4.2
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyxlsb : None
s3fs : None
scipy : 1.6.1
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : 0.51.2
Thanks @gunthergl for the report. I just left git bisect running, and https://github.com/pandas-dev/pandas/pull/40842 was found to be the first bad commit (cc @jbrockmendel). I haven't looked into it further, though.

Hi @jbrockmendel, pytorch-lightning uses yaml.UnsafeLoader. I read about the reason somewhere at some point but no longer remember it exactly. However, the following should be appropriate: