read_hdf throws UnicodeDecodeError with Python 3.5 and 3.6 but not with Python 2.7
Code Sample, a copy-pastable example if possible
import pandas as pd
df = pd.read_hdf('data.h5')
Problem description
The HDF5 dataset was created with pandas' to_hdf in Python 2.7 and can be read by Python 2.7. When I try to read it with Python 3.5 or Python 3.6, I get the following:
---------------------------------------------------------------------------
UnicodeDecodeError Traceback (most recent call last)
<ipython-input-2-53006689fd2c> in <module>()
----> 1 df = pd.read_hdf('data.h5')
/home/tct/anaconda2/envs/py36/lib/python3.6/site-packages/pandas/io/pytables.py in read_hdf(path_or_buf, key, **kwargs)
356 'contains multiple datasets.')
357 key = candidate_only_group._v_pathname
--> 358 return store.select(key, auto_close=auto_close, **kwargs)
359 except:
360 # if there is an error, close the store
/home/tct/anaconda2/envs/py36/lib/python3.6/site-packages/pandas/io/pytables.py in select(self, key, where, start, stop, columns, iterator, chunksize, auto_close, **kwargs)
720 chunksize=chunksize, auto_close=auto_close)
721
--> 722 return it.get_result()
723
724 def select_as_coordinates(
/home/tct/anaconda2/envs/py36/lib/python3.6/site-packages/pandas/io/pytables.py in get_result(self, coordinates)
1426
1427 # directly return the result
-> 1428 results = self.func(self.start, self.stop, where)
1429 self.close()
1430 return results
/home/tct/anaconda2/envs/py36/lib/python3.6/site-packages/pandas/io/pytables.py in func(_start, _stop, _where)
713 return s.read(start=_start, stop=_stop,
714 where=_where,
--> 715 columns=columns, **kwargs)
716
717 # create the iterator
/home/tct/anaconda2/envs/py36/lib/python3.6/site-packages/pandas/io/pytables.py in read(self, start, stop, **kwargs)
2864 blk_items = self.read_index('block%d_items' % i)
2865 values = self.read_array('block%d_values' % i,
-> 2866 start=_start, stop=_stop)
2867 blk = make_block(values,
2868 placement=items.get_indexer(blk_items))
/home/tct/anaconda2/envs/py36/lib/python3.6/site-packages/pandas/io/pytables.py in read_array(self, key, start, stop)
2413 import tables
2414 node = getattr(self.group, key)
-> 2415 data = node[start:stop]
2416 attrs = node._v_attrs
2417
/home/tct/anaconda2/envs/py36/lib/python3.6/site-packages/tables/vlarray.py in __getitem__(self, key)
673 start, stop, step = self._process_range(
674 key.start, key.stop, key.step)
--> 675 return self.read(start, stop, step)
676 # Try with a boolean or point selection
677 elif type(key) in (list, tuple) or isinstance(key, numpy.ndarray):
/home/tct/anaconda2/envs/py36/lib/python3.6/site-packages/tables/vlarray.py in read(self, start, stop, step)
813 atom = self.atom
814 if not hasattr(atom, 'size'): # it is a pseudo-atom
--> 815 outlistarr = [atom.fromarray(arr) for arr in listarr]
816 else:
817 # Convert the list to the right flavor
/home/tct/anaconda2/envs/py36/lib/python3.6/site-packages/tables/vlarray.py in <listcomp>(.0)
813 atom = self.atom
814 if not hasattr(atom, 'size'): # it is a pseudo-atom
--> 815 outlistarr = [atom.fromarray(arr) for arr in listarr]
816 else:
817 # Convert the list to the right flavor
/home/tct/anaconda2/envs/py36/lib/python3.6/site-packages/tables/atom.py in fromarray(self, array)
1226 if array.size == 0:
1227 return None
-> 1228 return six.moves.cPickle.loads(array.tostring())
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 23: ordinal not in range(128)
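The last frame of the traceback is a pickle loads call, which suggests the failure happens while unpickling object data written by Python 2 rather than in HDF5 itself. The sketch below reproduces that behaviour with the standard library only. The payload is a hand-written protocol-2 pickle of a Python 2 str containing the byte 0xe2; it and the helper names load_py2_pickle and try_default are illustrative assumptions, not data or code from this report.

```python
import pickle

# Hand-built protocol-2 pickle of the Python 2 str 'a\xe2\x80\x99'
# (the letter "a" followed by a UTF-8 encoded right single quote).
# Illustrative stand-in, not taken from data.h5.
PY2_PAYLOAD = b"\x80\x02U\x04a\xe2\x80\x99q\x00."


def load_py2_pickle(payload, encoding="ASCII"):
    """Unpickle bytes written by Python 2, decoding 8-bit strings with `encoding`."""
    return pickle.loads(payload, encoding=encoding)


def try_default():
    """Return the exception raised by the default (ASCII) decoding, or None."""
    try:
        load_py2_pickle(PY2_PAYLOAD)
    except UnicodeDecodeError as exc:
        return exc
    return None


if __name__ == "__main__":
    print(repr(try_default()))                          # default decoding fails
    print(repr(load_py2_pickle(PY2_PAYLOAD, "utf-8")))  # 8-bit string decoded as text
    print(repr(load_py2_pickle(PY2_PAYLOAD, "bytes")))  # 8-bit string kept as raw bytes
```

In the traceback above the loads call is made inside PyTables without an encoding argument, so this illustrates the likely cause rather than a fix for read_hdf.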
Expected Output
In [1]: import pandas as pd
In [2]: df = pd.read_hdf('data.h5')
Output of pd.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.1.final.0
python-bits: 64
OS: Linux
OS-release: 4.9.0-3-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.20.1
pytest: 3.0.7
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.12.1
scipy: 0.19.0
xarray: None
IPython: 5.3.0
sphinx: 1.5.6
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.3.0
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: 2.4.7
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.7.3
bs4: 4.6.0
html5lib: 0.999
sqlalchemy: 1.1.9
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None
INSTALLED VERSIONS
------------------
commit: None
python: 2.7.13.final.0
python-bits: 64
OS: Linux
OS-release: 4.9.0-3-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: None.None
pandas: 0.20.1
pytest: 3.0.7
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.11.2
scipy: 0.18.1
xarray: None
IPython: 5.3.0
sphinx: 1.5.6
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.3.0
numexpr: 2.6.1
feather: None
matplotlib: 2.0.2
openpyxl: 2.4.7
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.7.3
bs4: 4.6.0
html5lib: 0.999
sqlalchemy: 1.1.9
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None
Top GitHub Comments
Just a postscript: format='table' only works for a single column of data. When I save the entire dataset in Python 2.7 using encoding='utf-8', the file is saved but again cannot be read in 3.x:
TypeError: lookup() argument must be str, not numpy.bytes_
Hi, I ran into a similar issue. The dataframe was saved in Python 2.7 with format='table', encoding='utf-8'. However, when I read it in Python 3.7 with pd.read_hdf('xxx.hdf', key='xx', encoding='utf-8'), I get the error:
lookup() argument must be str, not numpy.bytes_
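The TypeError quoted in these comments is the message codecs.lookup raises when it is given a bytes-like codec name. One plausible reading, which is an assumption and not confirmed in this thread, is that the encoding name stored by Python 2 comes back as bytes under Python 3. The sketch below shows the error with plain bytes standing in for numpy.bytes_, plus a hypothetical helper normalize_encoding that decodes the name first.

```python
import codecs


def normalize_encoding(name):
    """Hypothetical helper: return a codec name as str, decoding bytes if needed."""
    if isinstance(name, bytes):  # numpy.bytes_ is a bytes subclass
        return name.decode("ascii")
    return name


def lookup_fails(name):
    """Return True if codecs.lookup rejects `name` with a TypeError."""
    try:
        codecs.lookup(name)
    except TypeError:
        return True
    return False


if __name__ == "__main__":
    stored = b"utf-8"  # stand-in for a bytes-typed encoding attribute
    print(lookup_fails(stored))                            # bytes name is rejected
    print(codecs.lookup(normalize_encoding(stored)).name)  # str name is accepted
```

Whether pandas or PyTables should perform this normalization is not settled by the comments here; the helper only shows that a str codec name avoids this particular TypeError.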