
pandas 1.0.1 read_csv() is broken for some file-like objects

See original GitHub issue

Code Sample

import os
import pandas
import tempfile
import traceback

# pandas.show_versions()

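fname = ''
# delete=False so the file can be removed explicitly after the with block
with tempfile.NamedTemporaryFile(delete=False) as f:
    # Write Shift-JIS encoded CSV data, then rewind so it can be read back
    f.write('てすと\nこむ'.encode('shift-jis'))
    f.seek(0)
    fname = f.name

    try:
        # f is the tempfile wrapper object, not the underlying buffered file
        result = pandas.read_csv(f, encoding='shift-jis')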
        print('read shift-jis')
        print(result)

    except Exception as e:
        print(e)
        print(traceback.format_exc())

os.unlink(fname)

Problem description

This sample fails with pandas 1.0.1 but works with pandas 0.25.3. As stated in issue #31575, the encoding of a file-like object is ignored when its class is neither io.BufferedIOBase nor io.RawIOBase. However, some file-like objects do not inherit from either of them, even though the “actual” inner object does. In this code sample, according to the CPython implementation, the wrapper stores the file as an attribute (self.file = file), and its __getattr__() returns the file’s attributes as its own. So the code does not work. The same problem affects other file-like objects, for example the tempfile.*File classes, werkzeug’s FileStorage class, and so on.
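The type mismatch described above can be observed with the standard library alone. The following is a minimal sketch that does not call pandas; the variable names are chosen for illustration only.

```python
import io
import tempfile

# The base classes named in the problem description
base_classes = (io.BufferedIOBase, io.RawIOBase)

with tempfile.NamedTemporaryFile() as f:
    # The object returned by NamedTemporaryFile is a wrapper around the real file
    wrapper_matches = isinstance(f, base_classes)
    # The wrapped object is stored on the wrapper as the `file` attribute
    inner_matches = isinstance(f.file, base_classes)
    # read() is still reachable on the wrapper through attribute delegation
    has_read = hasattr(f, "read")

print("wrapper is a buffered/raw IO object:", wrapper_matches)
print("inner file is a buffered/raw IO object:", inner_matches)
print("wrapper exposes read():", has_read)
```

An isinstance check against these base classes therefore treats the wrapper and its inner file differently, even though both can be read from.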

Note: I first noticed this problem when using pandas with a file posted via Flask. The file-like object is an instance of werkzeug’s FileStorage. I worked around the problem with the following code:

pandas.read_csv(request.files['file'].stream._file, encoding='shift-jis')
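The workaround above reaches into a private attribute (_file). An alternative sketch, assuming the upload is small enough to hold in memory, is to copy the bytes into an io.BytesIO, which is a subclass of io.BufferedIOBase. The helper name to_bytes_buffer is hypothetical, and whether pandas 1.0.1 then honors the encoding has not been verified here.

```python
import io
import tempfile


def to_bytes_buffer(file_like):
    """Hypothetical helper: copy the remaining bytes of a binary
    file-like object into an in-memory io.BytesIO buffer."""
    return io.BytesIO(file_like.read())


# Stand-in for an uploaded file: a tempfile wrapper holding Shift-JIS data
with tempfile.NamedTemporaryFile() as f:
    f.write('てすと\nこむ'.encode('shift-jis'))
    f.seek(0)
    buf = to_bytes_buffer(f)

# io.BytesIO inherits from io.BufferedIOBase, unlike the tempfile wrapper
is_buffered = isinstance(buf, io.BufferedIOBase)
decoded = buf.getvalue().decode('shift-jis')
print(is_buffered)
print(decoded)

# The buffer could then be passed on, for example:
# pandas.read_csv(buf, encoding='shift-jis')
```

The trade-off is that the whole file is read into memory, which may not be acceptable for large uploads.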

Expected Output

read shift-jis
  てすと
0  こむ

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit : None
python : 3.6.8.final.0
python-bits : 64
OS : Linux
OS-release : 4.14.138-89.102.amzn1.x86_64
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : ja_JP.UTF-8
LOCALE : ja_JP.UTF-8

pandas : 1.0.1
numpy : 1.18.1
pytz : 2019.3
dateutil : 2.8.1
pip : 9.0.3
setuptools : 36.2.7
Cython : None
pytest : 3.6.2
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : 1.0.5
lxml.etree : 4.2.1
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.10
IPython : None
pandas_datareader: None
bs4 : 4.6.0
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : 4.2.1
matplotlib : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
pytest : 3.6.2
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : 1.3.4
tables : None
tabulate : None
xarray : None
xlrd : 1.1.0
xlwt : None
xlsxwriter : 1.0.5
numba : None

Issue Analytics

Created 4 years ago
Comments: 22 (15 by maintainers)

Top GitHub Comments

davidism commented, Jan 15, 2021 (1 reaction)

I’m not sure why @Colin-b didn’t follow up here, but I think this indicates an issue with Pandas as well. Pandas should accept w+ as readable. I don’t know enough about Pandas to say though, so I’m not opening a new issue.
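The comment does not show which check is meant. As a small illustration of the general point, the sketch below shows that a file opened in w+ mode reports itself as readable even though its mode string differs from a plain read mode. The assumption that a mode-string check is involved is not confirmed by this thread.

```python
import os
import tempfile

# Create a scratch file path, then reopen it in w+ (write and read) mode
fd, path = tempfile.mkstemp()
os.close(fd)
try:
    with open(path, "w+") as f:
        f.write("a,b\n1,2\n")
        f.seek(0)
        # readable() asks the file object itself whether it can be read
        readable = f.readable()
        content = f.read()
        # A check based only on the mode string would look at this value
        mode = f.mode
finally:
    os.unlink(path)

print("mode:", mode)
print("readable():", readable)
print(repr(content))
```

Asking the object via readable() avoids guessing readability from the mode string.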

gfyoung commented, Feb 11, 2020 (1 reaction)

@sasanquaneuf : You are more than welcome to give your suggestion a try!
