BUG: 1.2.5 -> 1.3.x breaks yaml.dump(DataFrameObject)

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Code Sample, a copy-pastable example

import pandas as pd
import yaml

tmp = pd.DataFrame({'col_name': [1, 2, 3, 4]})

dumped = yaml.dump(tmp)  # Fails starting with pandas==1.3.0
print(dumped)

Problem description

Another package, pytorch-lightning, has an option to save all given hyperparameters. It serializes every parameter to YAML, which for DataFrames produces the rather ugly output shown under Expected Output. Starting with pandas==1.3.0 this breaks, as the minimal example above demonstrates.

If this is the expected behaviour in pandas, I apologize; in that case the issue should probably be raised with pytorch-lightning instead.
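
Until the regression is resolved, one possible workaround is to avoid serializing the DataFrame object itself and dump a plain-Python representation instead. The following is only a minimal sketch, not something proposed in the issue: it assumes the frame has a default index and holds YAML-native scalars, and it uses `DataFrame.to_dict(orient="list")` together with `yaml.safe_dump` and `yaml.safe_load`.

```python
import pandas as pd
import yaml

frame = pd.DataFrame({"col_name": [1, 2, 3, 4]})

# Convert to builtin containers first: a dict mapping column name -> list of values.
plain = frame.to_dict(orient="list")

# safe_dump only accepts YAML-native types, so no pandas internals are serialized.
dumped = yaml.safe_dump(plain)
print(dumped)

# Rebuild the frame from the plain mapping (assumes a default RangeIndex).
restored = pd.DataFrame(yaml.safe_load(dumped))
assert frame.equals(restored)
```

This trades fidelity for portability: a custom index, dtypes, and `attrs` are not preserved, so it is only suitable where a readable record of the values is enough.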

Expected Output

    !!python/object:pandas.core.frame.DataFrame
    _flags:
      allows_duplicate_labels: true
    _metadata: []
    _mgr: !!python/object/new:pandas.core.internals.managers.BlockManager
      state: !!python/tuple
      - &id004
        - !!python/object/apply:pandas.core.indexes.base._new_Index
          - &id002 !!python/name:pandas.core.indexes.base.Index ''
          - data: !!python/object/apply:numpy.core.multiarray._reconstruct
              args:
              - &id001 !!python/name:numpy.ndarray ''
              - !!python/tuple
                - 0
              - !!binary |
                Yg==
              state: !!python/tuple
              - 1
              - !!python/tuple
                - 1
              - &id003 !!python/object/apply:numpy.dtype
                args:
                - O8
                - false
                - true
                state: !!python/tuple
                - 3
                - '|'
                - null
                - null
                - null
                - -1
                - -1
                - 63
              - false
              - - col_name
            name: null
        - !!python/object/apply:pandas.core.indexes.base._new_Index
          - !!python/name:pandas.core.indexes.range.RangeIndex ''
          - name: null
            start: 0
            step: 1
            stop: 4
      - - &id005 !!python/object/apply:numpy.core.multiarray._reconstruct
          args:
          - *id001
          - !!python/tuple
            - 0
          - !!binary |
            Yg==
          state: !!python/tuple
          - 1
          - !!python/tuple
            - 1
            - 4
          - !!python/object/apply:numpy.dtype
            args:
            - i8
            - false
            - true
            state: !!python/tuple
            - 3
            - <
            - null
            - null
            - null
            - -1
            - -1
            - 0
          - false
          - !!binary |
            AQAAAAAAAAACAAAAAAAAAAMAAAAAAAAABAAAAAAAAAA=
      - - !!python/object/apply:pandas.core.indexes.base._new_Index
          - *id002
          - data: !!python/object/apply:numpy.core.multiarray._reconstruct
              args:
              - *id001
              - !!python/tuple
                - 0
              - !!binary |
                Yg==
              state: !!python/tuple
              - 1
              - !!python/tuple
                - 1
              - *id003
              - false
              - - col_name
            name: null
      - 0.14.1:
          axes: *id004
          blocks:
          - mgr_locs: !!python/object/apply:builtins.slice
            - 0
            - 1
            - 1
            values: *id005
    _typ: dataframe
    attrs: {}

Output of pd.show_versions()

pandas 1.2.5:

INSTALLED VERSIONS

commit : 7c48ff4409c622c582c56a5702373f726de08e96
python : 3.8.10.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19042
machine : AMD64
processor : Intel64 Family 6 Model 158 Stepping 10, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : de_DE.cp1252

pandas : 1.2.5
numpy : 1.21.1
pytz : 2021.1
dateutil : 2.8.2
pip : 21.1.3
setuptools : 52.0.0.post20210125
Cython : None
pytest : 6.2.4
hypothesis : None
sphinx : 3.5.4
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.6.3
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 3.0.1
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : 2021.07.0
fastparquet : None
gcsfs : None
matplotlib : 3.4.2
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyxlsb : None
s3fs : None
scipy : 1.6.1
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : 0.51.2

pandas 1.3.1:

INSTALLED VERSIONS

commit : c7f7443c1bad8262358114d5e88cd9c8a308e8aa
python : 3.8.10.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19042
machine : AMD64
processor : Intel64 Family 6 Model 158 Stepping 10, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : de_DE.cp1252

pandas : 1.3.1
numpy : 1.21.1
pytz : 2021.1
dateutil : 2.8.2
pip : 21.1.3
setuptools : 52.0.0.post20210125
Cython : None
pytest : 6.2.4
hypothesis : None
sphinx : 3.5.4
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.6.3
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 3.0.1
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : 2021.07.0
fastparquet : None
gcsfs : None
matplotlib : 3.4.2
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyxlsb : None
s3fs : None
scipy : 1.6.1
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : 0.51.2

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 13 (10 by maintainers)

Top GitHub Comments

1 reaction
MarcoGorelli commented, Jul 27, 2021

Thanks @gunthergl for the report. I ran git bisect, and https://github.com/pandas-dev/pandas/pull/40842 was identified as the first bad commit (cc @jbrockmendel). I haven't looked into it further, though.

0 reactions
gunthergl commented, Oct 21, 2021

Hi @jbrockmendel, pytorch-lightning uses yaml.UnsafeLoader. I read the reason for that somewhere but no longer remember the details. However, the following should be an appropriate test:


import pandas as pd
import yaml

tmp = pd.DataFrame({'col_name': [1, 2, 3, 4]})

dumped = yaml.dump(tmp)  # Fails starting with pandas==1.3.0
print(dumped)

loaded_UnsafeLoader = yaml.load(dumped, Loader=yaml.UnsafeLoader)
print(loaded_UnsafeLoader)
loaded_Loader = yaml.load(dumped, Loader=yaml.Loader)
print(loaded_Loader)

assert tmp.equals(loaded_UnsafeLoader)
assert tmp.equals(loaded_Loader)
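
When comparing pandas versions, it can help to know which stage of this round trip fails rather than stopping at the first exception. The sketch below is an illustrative helper, not part of the original report: `check_round_trip` is a hypothetical name, and it assumes a PyYAML version that provides `yaml.UnsafeLoader`. It makes no claim about which error a given pandas version raises; it only reports what it observes.

```python
import pandas as pd
import yaml


def check_round_trip(frame):
    """Return (stage, detail) naming the first failing stage, or ("ok", "")."""
    try:
        dumped = yaml.dump(frame)
    except Exception as exc:  # broad on purpose: we only want to report the failure
        return "dump", f"{type(exc).__name__}: {exc}"

    try:
        loaded = yaml.load(dumped, Loader=yaml.UnsafeLoader)
    except Exception as exc:
        return "load", f"{type(exc).__name__}: {exc}"

    try:
        same = frame.equals(loaded)
    except Exception as exc:
        return "compare", f"{type(exc).__name__}: {exc}"
    if not same:
        return "compare", "loaded object differs from the original"
    return "ok", ""


stage, detail = check_round_trip(pd.DataFrame({"col_name": [1, 2, 3, 4]}))
print(f"pandas {pd.__version__}: stage={stage} {detail}")
```

Running the same script in environments with different pandas versions gives a one-line summary per version that can be pasted into a report.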