HDF5 file reading error: "Attribute 'block2_items_variety' does not exist in node:/data"

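#### Code Sample, a copy-pastable example if possible

import pandas as pd

# `data` is the preprocessed DataFrame; format='f' is shorthand for the fixed format.
# to_hdf() needs a key; 'data' matches the node named in the issue title.
data.to_hdf('data.h5', key='data', mode='w', format='f')

data = pd.read_hdf('data.h5')

#### Traceback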

AttributeError                            Traceback (most recent call last)
<ipython-input-5-13fe6d5eccfa> in <module>()
----> 1 ndata = pd.read_hdf('cleaned_data_v1.h5')

/opt/conda/lib/python3.6/site-packages/pandas/io/pytables.py in read_hdf(path_or_buf, key, mode, **kwargs)
    368                                      'contains multiple datasets.')
    369             key = candidate_only_group._v_pathname
--> 370         return store.select(key, auto_close=auto_close, **kwargs)
    371     except:
    372         # if there is an error, close the store

/opt/conda/lib/python3.6/site-packages/pandas/io/pytables.py in select(self, key, where, start, stop, columns, iterator, chunksize, auto_close, **kwargs)
    715                            chunksize=chunksize, auto_close=auto_close)
    716 
--> 717         return it.get_result()
    718 
    719     def select_as_coordinates(

/opt/conda/lib/python3.6/site-packages/pandas/io/pytables.py in get_result(self, coordinates)
   1455 
   1456         # directly return the result
-> 1457         results = self.func(self.start, self.stop, where)
   1458         self.close()
   1459         return results

/opt/conda/lib/python3.6/site-packages/pandas/io/pytables.py in func(_start, _stop, _where)
    708             return s.read(start=_start, stop=_stop,
    709                           where=_where,
--> 710                           columns=columns, **kwargs)
    711 
    712         # create the iterator

/opt/conda/lib/python3.6/site-packages/pandas/io/pytables.py in read(self, start, stop, **kwargs)
   2893         for i in range(self.nblocks):
   2894 
-> 2895             blk_items = self.read_index('block%d_items' % i)
   2896             values = self.read_array('block%d_values' % i,
   2897                                      start=_start, stop=_stop)

/opt/conda/lib/python3.6/site-packages/pandas/io/pytables.py in read_index(self, key, **kwargs)
   2475 
   2476     def read_index(self, key, **kwargs):
-> 2477         variety = _ensure_decoded(getattr(self.attrs, '%s_variety' % key))
   2478 
   2479         if variety == u('multi'):

/opt/conda/lib/python3.6/site-packages/tables/attributeset.py in __getattr__(self, name)
    290         if not name in self._v_attrnames:
    291             raise AttributeError("Attribute '%s' does not exist in node: "
--> 292                                  "'%s'" % (name, self._v__nodepath))
    293 
    294         # Read the attribute from disk. This is an optimization to read

AttributeError: Attribute 'block2_items_variety' does not exist in node: '/cleaned_data'
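
Reading the traceback bottom-up, the fixed-format reader loops over `range(self.nblocks)` and, for each block, looks up an attribute named `block%d_items_variety` on the node; the read fails at the first name that is absent. Below is a minimal sketch of that lookup in plain Python, not pandas' actual implementation. The helper `missing_block_attrs` and the attribute listing are hypothetical and only illustrate which names would be reported missing.

```python
def missing_block_attrs(attr_names, nblocks):
    """Return the 'blockN_items_variety' names that would be looked up but are absent."""
    present = set(attr_names)
    expected = ["block%d_items_variety" % i for i in range(nblocks)]
    return [name for name in expected if name not in present]


# Hypothetical attribute listing for a node that claims three blocks
# but only carries the index metadata for the first two.
attrs = ["nblocks", "block0_items_variety", "block1_items_variety"]
print(missing_block_attrs(attrs, nblocks=3))  # ['block2_items_variety']
```

For a real file, the attribute names would have to come from the node itself, for example from the `_v_attrnames` list that appears in the PyTables frame of the traceback; that step is left out here.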

#### Problem description
We are working remotely in a Jupyter notebook on a server, doing some
data preprocessing. We loaded an unprocessed .h5 file that had been created
with DataFrame.to_hdf(), applied a number of preprocessing operations,
and added some new columns to the DataFrame.

Next we saved the data with data.to_hdf(), which generated the file data.h5.
When we then tried to read the file back with read_hdf(), we got the error:
 "Attribute 'block2_items_variety' does not exist in node: /cleaned_data".

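The sequence described above (load, add columns, write with `to_hdf`, read back with `read_hdf`) can be sketched as a self-contained round trip. This is only an illustration under assumptions: the column names, the key `data`, and the helper `roundtrip_fixed` are made up for the example, it needs PyTables installed, and it is not known to reproduce the reported error.

```python
import os
import tempfile

import pandas as pd


def roundtrip_fixed(df, key="data"):
    """Write df in fixed format to a temporary HDF5 file and read it back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "data.h5")
        df.to_hdf(path, key=key, mode="w", format="fixed")
        return pd.read_hdf(path, key=key)


# Three dtypes, so the frame is stored as several blocks.
df = pd.DataFrame({"a": [1, 2, 3], "b": [0.5, 1.5, 2.5], "c": [True, False, True]})
df["d"] = df["a"] * 2  # a column added during preprocessing, as in the report

result = roundtrip_fixed(df)
print(result)
```

If this round trip succeeds on the affected machine while the real file still fails, the difference presumably lies in the data or in how the original file was produced rather than in the calls themselves.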

#### Expected Output
The file is read successfully.

#### Output of ``pd.show_versions()``


INSTALLED VERSIONS
------------------
commit: None
python: 3.6.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.5.7-docker-4
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.21.0
pytest: 3.3.1
pip: 9.0.1
setuptools: 36.6.0
Cython: 0.25.2
numpy: 1.12.1
scipy: 0.19.1
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: None
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: None
tables: 3.4.2
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: None
xlrd: 1.0.0
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.5.3
html5lib: 0.999999999
sqlalchemy: 1.1.13
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

Issue Analytics

State:
Created 6 years ago
Comments: 12 (3 by maintainers)

Top GitHub Comments

AiAlex commented, Nov 13, 2019

Got this error on pandas 0.25.3, Python 3.6.3 :: Anaconda, Inc.:

AttributeError: Attribute 'block1_items_variety' does not exist in node: '/dataset'

JanKalin commented, Jul 4, 2018

The problem apparently only occurs on 32-bit Python. With 64-bit Python running inside a 64-bit virtual machine, the conversion goes well:

Y:\>c:\Users\jank\test\dws2hdf5.py Ravbarkomanda_2018_04_04_132938.dxd --fields 1 @rename.opt
2018-07-04 09:53:22.973000 1/1 Ravbarkomanda_2018_04_04_132938.dxd, a_2018_04_04_13_29_40_110000, T_2018_04_04_13_29_40_110000, done

Y:\>python
Python 2.7.15 (v2.7.15:ca079a3ea3, Apr 30 2018, 16:30:26) [MSC v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas as pd
>>> pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.15.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 26 Stepping 5, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.23.1
pytest: None
pip: 10.0.1
setuptools: 39.0.1
Cython: None
numpy: 1.14.5
scipy: None
pyarrow: None
xarray: None
IPython: 5.7.0
sphinx: 1.7.5
patsy: None
dateutil: 2.7.3
pytz: 2018.5
blosc: None
bottleneck: None
tables: None
numexpr: 2.6.5
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 1.0.1
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
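
Since this comment ties the failure to 32-bit Python, it may help to confirm which interpreter is actually running the conversion. Here is a small standard-library sketch; the helper name `python_bits` is made up for the example, and the value should correspond to the `python-bits` line in the `pd.show_versions()` output.

```python
import struct
import sys


def python_bits():
    """Return the pointer size of the running interpreter in bits (32 or 64)."""
    return struct.calcsize("P") * 8


print("python-bits:", python_bits())
# A second, independent check: sys.maxsize exceeds 2**32 only on 64-bit builds.
print("64-bit build:", sys.maxsize > 2**32)
```

Running this in the same environment that writes the file and in the one that reads it shows whether the two sides differ in bitness.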