HDF5 file reading error: "Attribute 'block2_items_variety' does not exist in node:/data"


Code Sample, a copy-pastable example if possible

import pandas as pd

# `data` is the preprocessed DataFrame; write it in fixed format, then read it back
data.to_hdf('data.h5', mode='w', format='f')

data = pd.read_hdf('data.h5')

#### Traceback

AttributeError                            Traceback (most recent call last)
<ipython-input-5-13fe6d5eccfa> in <module>()
----> 1 ndata = pd.read_hdf('cleaned_data_v1.h5')

/opt/conda/lib/python3.6/site-packages/pandas/io/pytables.py in read_hdf(path_or_buf, key, mode, **kwargs)
    368                                      'contains multiple datasets.')
    369             key = candidate_only_group._v_pathname
--> 370         return store.select(key, auto_close=auto_close, **kwargs)
    371     except:
    372         # if there is an error, close the store

/opt/conda/lib/python3.6/site-packages/pandas/io/pytables.py in select(self, key, where, start, stop, columns, iterator, chunksize, auto_close, **kwargs)
    715                            chunksize=chunksize, auto_close=auto_close)
    716 
--> 717         return it.get_result()
    718 
    719     def select_as_coordinates(

/opt/conda/lib/python3.6/site-packages/pandas/io/pytables.py in get_result(self, coordinates)
   1455 
   1456         # directly return the result
-> 1457         results = self.func(self.start, self.stop, where)
   1458         self.close()
   1459         return results

/opt/conda/lib/python3.6/site-packages/pandas/io/pytables.py in func(_start, _stop, _where)
    708             return s.read(start=_start, stop=_stop,
    709                           where=_where,
--> 710                           columns=columns, **kwargs)
    711 
    712         # create the iterator

/opt/conda/lib/python3.6/site-packages/pandas/io/pytables.py in read(self, start, stop, **kwargs)
   2893         for i in range(self.nblocks):
   2894 
-> 2895             blk_items = self.read_index('block%d_items' % i)
   2896             values = self.read_array('block%d_values' % i,
   2897                                      start=_start, stop=_stop)

/opt/conda/lib/python3.6/site-packages/pandas/io/pytables.py in read_index(self, key, **kwargs)
   2475 
   2476     def read_index(self, key, **kwargs):
-> 2477         variety = _ensure_decoded(getattr(self.attrs, '%s_variety' % key))
   2478 
   2479         if variety == u('multi'):

/opt/conda/lib/python3.6/site-packages/tables/attributeset.py in __getattr__(self, name)
    290         if not name in self._v_attrnames:
    291             raise AttributeError("Attribute '%s' does not exist in node: "
--> 292                                  "'%s'" % (name, self._v__nodepath))
    293 
    294         # Read the attribute from disk. This is an optimization to read

AttributeError: Attribute 'block2_items_variety' does not exist in node: '/cleaned_data'
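
The last three frames show how the missing name is assembled: the fixed-format reader loops over `range(self.nblocks)`, builds the key `'block%d_items' % i`, and then asks the node for `'%s_variety' % key`. The sketch below only mirrors that string construction to show which attributes a node with a given block count is expected to carry. The helper names and the sample attribute list are hypothetical, made up for illustration; this is not pandas code.

```python
def expected_block_attrs(nblocks):
    """Attribute names requested by the reader, following the traceback frames."""
    names = []
    for i in range(nblocks):
        key = 'block%d_items' % i         # as in read(), pytables.py line 2895
        names.append('%s_variety' % key)  # as in read_index(), pytables.py line 2477
    return names


def missing_block_attrs(present, nblocks):
    """Return the expected attribute names that are absent from `present`."""
    present = set(present)
    return [name for name in expected_block_attrs(nblocks) if name not in present]


# Hypothetical attribute listing: the node claims three blocks but only
# carries index metadata for the first two.
present = ['nblocks', 'block0_items_variety', 'block1_items_variety']
print(missing_block_attrs(present, nblocks=3))  # ['block2_items_variety']
```

On a real file, the attribute names of a node are what PyTables checks in `self._v_attrnames` (the `attributeset.py` frame above), so comparing that list against the expected names would show whether the block count and the stored metadata disagree.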

#### Problem description
We are working remotely in a Jupyter notebook on a server, doing some
data preprocessing. We loaded an unprocessed .h5 file that had been created
with DataFrame.to_hdf(), applied a number of preprocessing operations,
and added some new columns to the DataFrame.

Next we saved the data with data.to_hdf, which generated the file data.h5.
When we then tried to read the file back with read_hdf, we got the error:
 "Attribute 'block2_items_variety' does not exist in node: /cleaned_data".


#### Expected Output
The file is read successfully.

#### Output of ``pd.show_versions()``


INSTALLED VERSIONS
------------------
commit: None
python: 3.6.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.5.7-docker-4
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.21.0
pytest: 3.3.1
pip: 9.0.1
setuptools: 36.6.0
Cython: 0.25.2
numpy: 1.12.1
scipy: 0.19.1
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: None
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: None
tables: 3.4.2
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: None
xlrd: 1.0.0
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.5.3
html5lib: 0.999999999
sqlalchemy: 1.1.13
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

Issue Analytics

  • State: closed
  • Created 6 years ago
  • Comments: 12 (3 by maintainers)

Top GitHub Comments

2 reactions
AiAlex commented, Nov 13, 2019

I got the same kind of error on pandas 0.25.3, Python 3.6.3 :: Anaconda, Inc.:

AttributeError: Attribute 'block1_items_variety' does not exist in node: '/dataset'

2 reactions
JanKalin commented, Jul 4, 2018

The problem apparently occurs only on 32-bit Python. With 64-bit Python running inside a 64-bit virtual machine, the conversion goes well:

Y:\>c:\Users\jank\test\dws2hdf5.py Ravbarkomanda_2018_04_04_132938.dxd --fields 1 @rename.opt
2018-07-04 09:53:22.973000 1/1 Ravbarkomanda_2018_04_04_132938.dxd, a_2018_04_04_13_29_40_110000, T_2018_04_04_13_29_40_110000, done

Y:\>python
Python 2.7.15 (v2.7.15:ca079a3ea3, Apr 30 2018, 16:30:26) [MSC v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas as pd
>>> pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.15.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 26 Stepping 5, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.23.1
pytest: None
pip: 10.0.1
setuptools: 39.0.1
Cython: None
numpy: 1.14.5
scipy: None
pyarrow: None
xarray: None
IPython: 5.7.0
sphinx: 1.7.5
patsy: None
dateutil: 2.7.3
pytz: 2018.5
blosc: None
bottleneck: None
tables: None
numexpr: 2.6.5
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 1.0.1
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
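
Since the comment above ties the failure to 32-bit interpreters, it can help to confirm which kind of Python a given environment is running. The following is a small standard-library sketch; `python_bits` is a hypothetical helper name. It reports the same information that `pd.show_versions()` prints as `python-bits`.

```python
import platform
import struct
import sys


def python_bits():
    """Pointer size of the running interpreter, in bits (32 or 64)."""
    return struct.calcsize('P') * 8


# Three independent views of the same fact; they should agree with each other.
print(python_bits())                # pointer size in bits
print(platform.architecture()[0])   # e.g. '64bit'
print(sys.maxsize > 2**32)          # True on a 64-bit build
```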