
BUG: read_parquet no longer supports file-like objects

See original GitHub issue

Code Sample, a copy-pastable example

from io import BytesIO
import pandas as pd

buffer = BytesIO()

df = pd.DataFrame([1,2,3], columns=["a"])
df.to_parquet(buffer)

df2 = pd.read_parquet(buffer)

Problem description

Currently, read_parquet(buffer) raises the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "./working_dir/tvenv/lib/python3.7/site-packages/pandas/io/parquet.py", line 315, in read_parquet
    return impl.read(path, columns=columns, **kwargs)
  File "./working_dir/tvenv/lib/python3.7/site-packages/pandas/io/parquet.py", line 131, in read
    path, filesystem=get_fs_for_path(path), **kwargs
  File "./working_dir/tvenv/lib/python3.7/site-packages/pyarrow/parquet.py", line 1162, in __init__
    self.paths = _parse_uri(path_or_paths)
  File "./working_dir/tvenv/lib/python3.7/site-packages/pyarrow/parquet.py", line 47, in _parse_uri
    path = _stringify_path(path)
  File "./working_dir/tvenv/lib/python3.7/site-packages/pyarrow/util.py", line 67, in _stringify_path
    raise TypeError("not a path-like object")
TypeError: not a path-like object

Expected Output

Instead, read_parquet(buffer) should return a new DataFrame with the same contents as the DataFrame serialized into buffer.

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit : None
python : 3.7.5.final.0
python-bits : 64
OS : Linux
OS-release : 4.15.0-99-generic
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.0.4
numpy : 1.18.4
pytz : 2020.1
dateutil : 2.8.1
pip : 9.0.1
setuptools : 39.0.1
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : 0.999999999
pymysql : None
psycopg2 : None
jinja2 : None
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 0.17.1
pytables : None
pytest : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None
numba : None

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Reactions: 8
  • Comments: 26 (14 by maintainers)

Top GitHub Comments

3 reactions
austospumanto commented, May 30, 2020

@claytonlemons I am encountering the same issue.

If I downgrade from 1.0.4 --> 1.0.3 (while keeping the pyarrow version the same), I can again read from BytesIO buffers without issue. Since upgrading the pandas version from 1.0.3 --> 1.0.4 seems both necessary and sufficient to cause the file-like object reading issues, it seems like it may indeed be correct to consider this as an issue with pandas, not pyarrow.

@jreback Would you consider reopening this issue?

2 reactions
alimcmaster1 commented, Jun 8, 2020

@kepler I wonder if explicitly separating the kwargs into two parameters might be a solution.

@austospumanto

The fix for master pandas 1.1 is https://github.com/pandas-dev/pandas/pull/34500/files#diff-cbd427661c53f1dcde6ec5fb9ab0effaR134

We can potentially add tests that cover a few more of the kwargs, since we clearly don’t have coverage here at the moment.


Top Results From Across the Web

  • reading parquet to pandas FileNotFoundError - Stack Overflow
  • Cannot read parquet files with hive when created with a ...
  • Reading and Writing the Apache Parquet Format
  • Error writing parquet files - Databricks Community
  • Optimizing Access to Parquet Data with fsspec
