pandas 1.0.1 read_csv() is broken for some file-like objects
Code Sample
import os
import pandas
import tempfile
import traceback

# pandas.show_versions()
fname = ''
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write('てすと\nこむ'.encode('shift-jis'))
    f.seek(0)
    fname = f.name
    try:
        result = pandas.read_csv(f, encoding='shift-jis')
        print('read shift-jis')
        print(result)
    except Exception as e:
        print(e)
        print(traceback.format_exc())
os.unlink(fname)
Problem description
With pandas 1.0.1, this sample does not work, but with pandas 0.25.3 it works fine.
As stated in issue #31575, the encoding of a file-like object is ignored when its class is neither io.BufferedIOBase nor RawIOBase.
However, some file-like objects do NOT inherit from either of them, although the "actual" inner object does.
In this code sample, according to the CPython implementation, the wrapper object stores the file as an attribute (self.file = file), and its __getattr__() returns the file's attributes as its own.
So the code does not work. The identical problem exists in other file-like objects, for example the tempfile.*File classes, werkzeug's FileStorage class, and so on.
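The wrapper behaviour described above can be checked with the standard library alone. This is a minimal sketch, assuming that the relevant type check is an isinstance test against io.BufferedIOBase and io.RawIOBase as described in this report; the variable names are only for illustration and nothing here is taken from the pandas source.

```python
import io
import tempfile

with tempfile.NamedTemporaryFile() as f:
    # The object returned by NamedTemporaryFile is a thin wrapper,
    # so it does not pass an isinstance check against the io base classes.
    wrapper_is_io = isinstance(f, (io.BufferedIOBase, io.RawIOBase))

    # The real buffered file is stored on the wrapper's .file attribute.
    inner_is_io = isinstance(f.file, (io.BufferedIOBase, io.RawIOBase))

    # write/seek/read are not defined on the wrapper itself;
    # __getattr__ forwards them to the inner file.
    f.write(b"abc")
    f.seek(0)
    forwarded_read = f.read()

print(wrapper_is_io, inner_is_io, forwarded_read)
```

The wrapper behaves like a file for reading and writing, but any code that decides how to handle it by class alone will treat it differently from the file it wraps.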
Note: I first noticed this problem when using pandas with a file posted via flask. The file-like object is an instance of werkzeug's FileStorage. I worked around the problem with the following code:
pandas.read_csv(request.files['file'].stream._file, encoding='shift-jis')
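The same workaround can be written more generally. The sketch below uses a hypothetical helper, unwrap_file_like (not part of pandas, werkzeug, or the standard library), which follows the attribute names mentioned in this report (file, _file, stream) until it reaches an object that is an io.BufferedIOBase or io.RawIOBase. It is only exercised here against the tempfile classes; its behaviour with werkzeug's FileStorage is an assumption and is untested.

```python
import io
import tempfile

_IO_BASES = (io.BufferedIOBase, io.RawIOBase)


def unwrap_file_like(obj, max_depth=5):
    """Hypothetical helper: follow common wrapper attributes to a real io object."""
    for _ in range(max_depth):
        if isinstance(obj, _IO_BASES):
            return obj
        # Attribute names are assumptions based on the wrappers named in this report.
        for name in ("file", "_file", "stream"):
            inner = getattr(obj, name, None)
            if inner is not None and inner is not obj:
                obj = inner
                break
        else:
            # No known wrapper attribute found: return the object unchanged.
            return obj
    return obj


with tempfile.NamedTemporaryFile() as named:
    named_ok = isinstance(unwrap_file_like(named), _IO_BASES)

with tempfile.SpooledTemporaryFile() as spooled:
    spooled_ok = isinstance(unwrap_file_like(spooled), _IO_BASES)

print(named_ok, spooled_ok)
```

The unwrapped object could then be passed in place of the wrapper, for example pandas.read_csv(unwrap_file_like(f), encoding='shift-jis'). Like the one-line workaround, this relies on private attributes such as _file and may break if those classes change.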
Expected Output
read shift-jis
てすと
0 こむ
Output of pd.show_versions()
INSTALLED VERSIONS
commit : None
python : 3.6.8.final.0
python-bits : 64
OS : Linux
OS-release : 4.14.138-89.102.amzn1.x86_64
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : ja_JP.UTF-8
LOCALE : ja_JP.UTF-8

pandas : 1.0.1
numpy : 1.18.1
pytz : 2019.3
dateutil : 2.8.1
pip : 9.0.3
setuptools : 36.2.7
Cython : None
pytest : 3.6.2
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : 1.0.5
lxml.etree : 4.2.1
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.10
IPython : None
pandas_datareader: None
bs4 : 4.6.0
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : 4.2.1
matplotlib : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
pytest : 3.6.2
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : 1.3.4
tables : None
tabulate : None
xarray : None
xlrd : 1.1.0
xlwt : None
xlsxwriter : 1.0.5
numba : None
Issue Analytics
- Created 4 years ago
- Comments:22 (15 by maintainers)
Top GitHub Comments
I’m not sure why @Colin-b didn’t follow up here, but I think this indicates an issue with Pandas as well. Pandas should accept w+ as readable. I don’t know enough about Pandas to say though, so I’m not opening a new issue.

@sasanquaneuf : You are more than welcome to give your suggestion a try!