BUG: small error in pandas/io/pytables.py

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
(optional) I have confirmed this bug exists on the master branch of pandas.

Code Sample, a copy-pastable example

df.to_hdf('file.h5','name', format='table')

~/miniconda3/lib/python3.6/site-packages/pandas/io/pytables.py in _maybe_convert_for_string_atom(name, block, existing_col, min_itemsize, nan_rep, encoding, errors)
   4798         # we cannot serialize this data, so report an exception on a column
   4799         # by column basis
-> 4800         for i in range(len(block.shape[0])):
   4801 
   4802             col = block.iget(i)

TypeError: object of type 'int' has no len()

Problem description

I don’t know exactly why pandas cannot serialize my data frame with a multi-index, but the error above prevents the real error message from being printed.
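To illustrate the masking described above, here is a minimal sketch in plain Python. The tuple `shape` is a hypothetical stand-in for `block.shape`; the corrected loop shows what the line presumably intended, not necessarily the change the pandas maintainers made.

```python
# Hypothetical stand-in for block.shape: 2 columns in the block, 3 rows.
shape = (2, 3)

# The line from the traceback: shape[0] is already an int, so len() fails
# before the per-column check can raise its own, more helpful error.
try:
    for i in range(len(shape[0])):
        pass
except TypeError as exc:
    masked_error = str(exc)

# Presumably intended: iterate directly over the number of columns.
column_indices = list(range(shape[0]))

print(masked_error)
print(column_indices)
```

With the loop corrected, the code that follows it in `_maybe_convert_for_string_atom` could go on to report which column cannot be serialized.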

Expected Output

A meaningful error message explaining why the data frame cannot be serialized.

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit : None
python : 3.6.7.final.0
python-bits : 64
OS : Darwin
OS-release : 16.7.0
machine : x86_64
processor : i386
byteorder : little
LC_ALL : en_US.UTF-8
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.0.3
numpy : 1.18.1
pytz : 2020.1
dateutil : 2.8.1
pip : 20.1.1
setuptools : 46.1.3.post20200325
Cython : 0.29.18
pytest : 5.4.1
hypothesis : 5.15.0
sphinx : 3.0.3
blosc : None
feather : None
xlsxwriter : 1.2.8
lxml.etree : 3.8.0
html5lib : 1.0.1
pymysql : 0.9.3
psycopg2 : None
jinja2 : 2.11.2
IPython : 7.14.0
pandas_datareader: None
bs4 : 4.9.1
bottleneck : 1.3.2
fastparquet : None
gcsfs : None
lxml.etree : 3.8.0
matplotlib : 3.2.1
numexpr : 2.7.1
odfpy : None
openpyxl : 3.0.3
pandas_gbq : None
pyarrow : None
pytables : None
pytest : 5.4.1
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : 1.3.17
tables : 3.6.1
tabulate : None
xarray : None
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : 1.2.8
numba : 0.48.0

Issue Analytics

Created 3 years ago
Comments: 6 (1 by maintainers)

Top GitHub Comments

2 reactions

Rmsharks4 commented, Jul 26, 2020

I'm facing the same issue with this piece of code:

import os
import wave

import pandas as pd
from tqdm import tqdm

# `datasets` is defined elsewhere and is not shown in this comment.
for folder in tqdm(os.listdir(datasets[2])):
    for file in os.listdir(datasets[2]+folder+'/audio/'):
        wav_r = wave.open(datasets[2]+folder+'/audio/'+file, 'rb')
        data = wav_r.readframes(wav_r.getnframes())
        audios_df = pd.DataFrame([{**{'Filename': file, 'Data': data}, **dict(wav_r.getparams()._asdict())}])
        audios_df.to_hdf('amicorpus_audios.h5', key='audios_df', append=True)

Exception:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-5-e660c3ea770b> in <module>
      7         audios_df = pd.DataFrame([{**{'Filename': file, 'Data': data}, **dict(wav_r.getparams()._asdict())}])
      8         print(audios_df)
----> 9         audios_df.to_hdf('amicorpus_audios.h5', key='audios_df', append=True)

~\anaconda3\lib\site-packages\pandas\core\generic.py in to_hdf(self, path_or_buf, key, mode, complevel, complib, append, format, index, min_itemsize, nan_rep, dropna, data_columns, errors, encoding)
   2503             data_columns=data_columns,
   2504             errors=errors,
-> 2505             encoding=encoding,
   2506         )
   2507 

~\anaconda3\lib\site-packages\pandas\io\pytables.py in to_hdf(path_or_buf, key, value, mode, complevel, complib, append, format, index, min_itemsize, nan_rep, dropna, data_columns, errors, encoding)
    280             path_or_buf, mode=mode, complevel=complevel, complib=complib
    281         ) as store:
--> 282             f(store)
    283     else:
    284         f(path_or_buf)

~\anaconda3\lib\site-packages\pandas\io\pytables.py in <lambda>(store)
    259             data_columns=data_columns,
    260             errors=errors,
--> 261             encoding=encoding,
    262         )
    263     else:

~\anaconda3\lib\site-packages\pandas\io\pytables.py in append(self, key, value, format, axes, index, append, complib, complevel, columns, min_itemsize, nan_rep, chunksize, expectedrows, dropna, data_columns, encoding, errors)
   1180             data_columns=data_columns,
   1181             encoding=encoding,
-> 1182             errors=errors,
   1183         )
   1184 

~\anaconda3\lib\site-packages\pandas\io\pytables.py in _write_to_group(self, key, value, format, axes, index, append, complib, complevel, fletcher32, min_itemsize, chunksize, expectedrows, dropna, nan_rep, data_columns, encoding, errors)
   1707             dropna=dropna,
   1708             nan_rep=nan_rep,
-> 1709             data_columns=data_columns,
   1710         )
   1711 

~\anaconda3\lib\site-packages\pandas\io\pytables.py in write(self, obj, axes, append, complib, complevel, fletcher32, min_itemsize, chunksize, expectedrows, dropna, nan_rep, data_columns)
   4141             min_itemsize=min_itemsize,
   4142             nan_rep=nan_rep,
-> 4143             data_columns=data_columns,
   4144         )
   4145 

~\anaconda3\lib\site-packages\pandas\io\pytables.py in _create_axes(self, axes, obj, validate, nan_rep, data_columns, min_itemsize)
   3811                 nan_rep=nan_rep,
   3812                 encoding=self.encoding,
-> 3813                 errors=self.errors,
   3814             )
   3815             adj_name = _maybe_adjust_name(new_name, self.version)

~\anaconda3\lib\site-packages\pandas\io\pytables.py in _maybe_convert_for_string_atom(name, block, existing_col, min_itemsize, nan_rep, encoding, errors)
   4798         # we cannot serialize this data, so report an exception on a column
   4799         # by column basis
-> 4800         for i in range(len(block.shape[0])):
   4801 
   4802             col = block.iget(i)

TypeError: object of type 'int' has no len()

Any known workarounds?

0 reactions

SilasK commented, Dec 2, 2020

Sorry, I didn’t have the time to create all the tests. But check whether all the values in each column have the same type. An object-dtype column can sometimes hold a mix of strings and numbers.
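
One way to run the check suggested above is sketched below. It uses the public `pandas.api.types.infer_dtype` function to list object-dtype columns whose contents are not purely strings. The helper name `non_string_object_columns` and the sample frame are made up for illustration, and the assumption that such columns are what triggers the failing code path is based only on the tracebacks in this thread.

```python
import pandas as pd
from pandas.api.types import infer_dtype


def non_string_object_columns(df):
    """Return {column name: inferred type} for object columns that are not pure strings."""
    report = {}
    for name in df.columns:
        col = df[name]
        if col.dtype == object:
            kind = infer_dtype(col, skipna=False)
            if kind != "string":
                report[name] = kind
    return report


# Illustrative data only: a bytes column (like the raw audio frames above)
# and a column mixing strings and numbers.
df = pd.DataFrame({
    "Filename": ["a.wav", "b.wav"],
    "Data": [b"\x00\x01", b"\x02\x03"],
    "mixed": ["x", 1],
    "nchannels": [1, 2],
})

report = non_string_object_columns(df)
print(report)
```

Any column listed in the report is a candidate to convert to a single type (or drop) before calling `to_hdf` with the table format.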