pandas/io/feather_format.py should call use_threads instead of nthreads to prevent breakage in pyarrow 0.11.0
Code Sample
import pandas

d = {'one': [1., 2., 3., 4.],
     'two': [4., 3., 2., 1.]}
df = pandas.DataFrame(d)
df.to_feather('example.feather')
# with pyarrow 0.10.0 this succeeds with a deprecation warning
# with pyarrow 0.11.0 this errors with a TypeError: unexpected argument 'nthreads'
df = pandas.read_feather('example.feather')
# attempt to manually set nthreads results in TypeError: unexpected argument 'nthreads'
df = pandas.read_feather('example.feather', nthreads=4)
# attempt to pass 'use_threads' results in TypeError: unexpected argument 'nthreads'
df = pandas.read_feather('example.feather', use_threads=True)
Problem description
Pandas introduced nthreads for reading feather files in issue 16359.
With PyArrow 0.10.0, a deprecation warning is shown: “nthreads argument is deprecated, pass use_threads instead”.
With PyArrow 0.11.0, the call fails with: TypeError: read_feather() got an unexpected keyword argument ‘nthreads’.
I’ve searched with ‘pyarrow’ and ‘nthreads’ keywords and didn’t see this issue posted.
Specifically, pandas/io/feather_format.py line 112 should be changed to
return feather.read_dataframe(path, use_threads=True)
or the method signature could be changed to allow overriding use_threads:
return feather.read_dataframe(path, use_threads=use_threads)
I will submit a PR if the only barrier to fix is code effort.
Expected Output
I expect no error output upon running pandas.read_feather() with PyArrow 0.11.0
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.6.6.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 79 Stepping 1, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
pandas: 0.23.4
pytest: None
pip: 18.1
setuptools: 40.3.0
Cython: None
numpy: 1.15.1
scipy: 1.1.0
pyarrow: 0.10.0
xarray: None
IPython: 6.5.0
sphinx: None
patsy: 0.5.0
dateutil: 2.7.3
pytz: 2018.5
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: 0.4.0
matplotlib: 2.2.2
openpyxl: None
xlrd: 1.1.0
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: 1.2.11
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
Issue Analytics
- Created 5 years ago
- Reactions:2
- Comments:11 (6 by maintainers)
Top GitHub Comments
Work-around might be useful to some people:
Next version of pandas. Aiming to have it out by the end of the year.
On Tue, Dec 4, 2018 at 12:21 PM Richard Anderson notifications@github.com wrote: