pandas 1.0.1 read_csv() is broken for some file-like objects

Code Sample

import os
import pandas
import tempfile
import traceback

# pandas.show_versions()

fname = ''
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write('てすと\nこむ'.encode('shift-jis'))
    f.seek(0)
    fname = f.name

    try:
        result = pandas.read_csv(f, encoding='shift-jis')
        print('read shift-jis')
        print(result)

    except Exception as e:
        print(e)
        print(traceback.format_exc())

os.unlink(fname)

Problem description

With pandas 1.0.1, this sample does not work, but with pandas 0.25.3 it works fine. As stated in issue #31575, the encoding argument is ignored for a file-like object whose class is neither io.BufferedIOBase nor io.RawIOBase. However, some file-like objects do NOT inherit from either class, even though the “actual” inner object does. In this code sample, according to the CPython implementation, NamedTemporaryFile returns a wrapper that stores the real file as an attribute (self.file = file), and its __getattr__() returns the inner file’s attributes as its own. The wrapper therefore behaves like a binary file but fails the isinstance check, so the code does not work. The same problem occurs with other file-like objects, for example the tempfile.*File classes, werkzeug’s FileStorage class, and so on.
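The mismatch between duck typing and the isinstance check can be reproduced with the standard library alone. The sketch below assumes CPython 3, where NamedTemporaryFile returns a wrapper that keeps the real file in its .file attribute; is_binary_io is a hypothetical helper that only mirrors the kind of check described in #31575, not pandas’ actual implementation.

```python
import io
import tempfile


def is_binary_io(obj) -> bool:
    # Hypothetical check: mirrors the isinstance test described in #31575,
    # not pandas' actual code.
    return isinstance(obj, (io.BufferedIOBase, io.RawIOBase))


with tempfile.NamedTemporaryFile() as f:
    # The wrapper forwards read/seek/... to the real file through
    # __getattr__, so duck typing sees a readable binary file...
    has_read = callable(getattr(f, "read", None))
    # ...but the wrapper class itself does not inherit from the io base
    # classes, so an isinstance-based check rejects it.
    wrapper_passes = is_binary_io(f)
    # The real file object lives in f.file and does pass the check.
    inner_passes = is_binary_io(f.file)

print(has_read, wrapper_passes, inner_passes)  # True False True
```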

Note: I first noticed this problem when using pandas with a file posted to Flask. The file-like object is an instance of werkzeug’s FileStorage. I worked around the problem with the following code:

pandas.read_csv(request.files['file'].stream._file, encoding='shift-jis')
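The workaround above can be generalized into a small helper that follows inner-stream attributes until it reaches a real io.IOBase instance, so that object is passed to pandas instead of the wrapper. This is a minimal sketch under assumptions: unwrap_file_like and the attribute names it tries (file, _file, stream) are hypothetical choices based on the objects mentioned in this issue, not an API of pandas, tempfile, or werkzeug, and private attributes such as _file may change between versions.

```python
import io
import tempfile

# Assumed attribute names under which wrappers keep the real stream:
# tempfile's wrapper uses "file", the workaround above used "_file",
# and werkzeug's FileStorage exposes "stream". Not an exhaustive list.
_INNER_ATTRS = ("file", "_file", "stream")


def unwrap_file_like(obj, max_depth=5):
    """Hypothetical helper: return the first io.IOBase object reachable
    through _INNER_ATTRS, or obj itself if none is found."""
    current = obj
    for _ in range(max_depth + 1):
        if isinstance(current, io.IOBase):
            return current
        inner = None
        for attr in _INNER_ATTRS:
            # getattr with a default also absorbs the AttributeError raised
            # by delegating __getattr__ implementations.
            candidate = getattr(current, attr, None)
            if candidate is not None and candidate is not current:
                inner = candidate
                break
        if inner is None:
            break  # nothing left to unwrap at this level
        current = inner
    return obj


with tempfile.NamedTemporaryFile() as f:
    inner = unwrap_file_like(f)
    print(type(inner).__name__, isinstance(inner, io.BufferedIOBase))
```

Under those assumptions, the reproduction at the top of this issue would pass unwrap_file_like(f) to pandas.read_csv(..., encoding='shift-jis') instead of f itself.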

Expected Output

read shift-jis
  てすと
0  こむ

Output of pd.show_versions()

INSTALLED VERSIONS

commit : None
python : 3.6.8.final.0
python-bits : 64
OS : Linux
OS-release : 4.14.138-89.102.amzn1.x86_64
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : ja_JP.UTF-8
LOCALE : ja_JP.UTF-8

pandas : 1.0.1
numpy : 1.18.1
pytz : 2019.3
dateutil : 2.8.1
pip : 9.0.3
setuptools : 36.2.7
Cython : None
pytest : 3.6.2
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : 1.0.5
lxml.etree : 4.2.1
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.10
IPython : None
pandas_datareader: None
bs4 : 4.6.0
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : 4.2.1
matplotlib : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
pytest : 3.6.2
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : 1.3.4
tables : None
tabulate : None
xarray : None
xlrd : 1.1.0
xlwt : None
xlsxwriter : 1.0.5
numba : None

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 22 (15 by maintainers)

Top GitHub Comments

1 reaction
davidism commented, Jan 15, 2021

I’m not sure why @Colin-b didn’t follow up here, but I think this indicates an issue with Pandas as well. Pandas should accept w+ as readable. I don’t know enough about Pandas to say though, so I’m not opening a new issue.
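A file opened in w+ mode can be read after seeking back, even though its mode string contains no "r". The sketch below contrasts a hypothetical naive check ("r" in mode), which would reject such a file, with one that also accepts "+"; both helper names are illustrative assumptions, and the sketch does not claim to reproduce how pandas actually inspects file modes.

```python
import os
import tempfile


def naive_is_readable(mode: str) -> bool:
    # Hypothetical check that only looks for "r": rejects "w+".
    return "r" in mode


def mode_is_readable(mode: str) -> bool:
    # In Python's open() modes, "r" or "+" means the file can be read.
    return "r" in mode or "+" in mode


fd, path = tempfile.mkstemp()
os.close(fd)
try:
    with open(path, "w+") as fh:
        fh.write("a,b\n1,2\n")
        fh.seek(0)
        print(fh.mode, fh.readable())      # w+ True
        print(naive_is_readable(fh.mode))  # False
        print(mode_is_readable(fh.mode))   # True
finally:
    os.unlink(path)
```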

1 reaction
gfyoung commented, Feb 11, 2020

@sasanquaneuf : You are more than welcome to give your suggestion a try!
