
TypeError: bad argument type for built-in operation


I am trying to write my first Parquet file. I have a very large text file (roughly 200,000 rows, ~200 MB) that I've read into a pandas DataFrame. Its shape is (202363, 52).

When calling fastparquet.write('./parquet-logs/out.parq', df, compression='GZIP') I get the following error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-37-e28f2844674b> in <module>()
      1 # write data
----> 2 fastparquet.write('./parquet-logs/out.parq', df, compression='GZIP')

/Users/william/anaconda/envs/parquet-python/lib/python3.6/site-packages/fastparquet/writer.py in write(filename, data, row_group_offsets, compression, file_scheme, open_with, mkdirs, has_nulls, write_index, partition_on, fixed_text, append, object_encoding, times)
    801     if file_scheme == 'simple':
    802         write_simple(filename, data, fmd, row_group_offsets,
--> 803                      compression, open_with, has_nulls, append)
    804     elif file_scheme in ['hive', 'drill']:
    805         if append:

/Users/william/anaconda/envs/parquet-python/lib/python3.6/site-packages/fastparquet/writer.py in write_simple(fn, data, fmd, row_group_offsets, compression, open_with, has_nulls, append)
    701                    else None)
    702             rg = make_row_group(f, data[start:end], fmd.schema,
--> 703                                 compression=compression)
    704             if rg is not None:
    705                 fmd.row_groups.append(rg)

/Users/william/anaconda/envs/parquet-python/lib/python3.6/site-packages/fastparquet/writer.py in make_row_group(f, data, schema, compression)
    599                 comp = compression
    600             chunk = write_column(f, data[column.name], column,
--> 601                                  compression=comp)
    602             rg.columns.append(chunk)
    603     rg.total_byte_size = sum([c.meta_data.total_uncompressed_size for c in

/Users/william/anaconda/envs/parquet-python/lib/python3.6/site-packages/fastparquet/writer.py in write_column(f, data, selement, compression)
    514     start = f.tell()
    515     bdata = definition_data + repetition_data + encode[encoding](
--> 516             data, selement)
    517     bdata += 8 * b'\x00'
    518     try:

/Users/william/anaconda/envs/parquet-python/lib/python3.6/site-packages/fastparquet/writer.py in encode_plain(data, se)
    272 def encode_plain(data, se):
    273     """PLAIN encoding; returns byte representation"""
--> 274     out = convert(data, se)
    275     if se.type == parquet_thrift.Type.BYTE_ARRAY:
    276         return pack_byte_array(list(out))

/Users/william/anaconda/envs/parquet-python/lib/python3.6/site-packages/fastparquet/writer.py in convert(data, se)
    165     elif dtype == "O":
    166         if converted_type == parquet_thrift.ConvertedType.UTF8:
--> 167             out = array_encode_utf8(data)
    168         elif converted_type is None:
    169             if type in revmap:

/Users/william/anaconda/envs/parquet-python/lib/python3.6/site-packages/fastparquet/speedups.pyx in fastparquet.speedups.array_encode_utf8 (fastparquet/speedups.c:2094)()

TypeError: bad argument type for built-in operation

Before I start dividing up the file to find the exact line that’s causing the problem, does anyone recognize this as a known error or what could be causing it?

I am using version 0.0.6 of fastparquet.
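This error typically means that an object-dtype column contains values that are not Python strings (for example NaN, numbers, or bytes), since fastparquet's UTF-8 encoder expects every element to be a `str`. Rather than bisecting the file by rows, a quicker check is to scan the object columns directly. The sketch below is not part of the original issue, and the helper name `find_non_string_object_columns` is my own:

```python
import pandas as pd

def find_non_string_object_columns(df):
    """Return names of object-dtype columns holding any non-str value,
    which fastparquet's UTF-8 encoding cannot handle."""
    bad = []
    for col in df.select_dtypes(include=["object"]).columns:
        if not df[col].map(lambda v: isinstance(v, str)).all():
            bad.append(col)
    return bad

# Example: column "b" mixes a string with a float, so it is flagged.
df = pd.DataFrame({"a": ["x", "y"], "b": ["x", 3.5]})
print(find_non_string_object_columns(df))  # ['b']
```

Once the offending columns are known, coercing them with something like `df[col] = df[col].astype(str)` (or filling nulls first) should let the write succeed, at the cost of stringifying any mixed-type data.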

Issue Analytics

  • State: closed
  • Created: 6 years ago
  • Reactions: 4
  • Comments: 10 (5 by maintainers)

Top GitHub Comments

1 reaction
martindurant commented, Sep 5, 2017

Done

0 reactions
fsck-mount commented, Sep 5, 2017

yep, thanks 😃

