Thrift parameter validation failure i32

See the original GitHub issue: https://github.com/dask/fastparquet/issues/124

I’m trying to convert a dataframe to parquet and keep receiving the following error. I’m not sure whether it’s a bug, but I can’t find more information about this problem.

ParquetException                          Traceback (most recent call last)
<ipython-input-7-d79cbcc89268> in <module>()
      1 from fastparquet import write
      2 
----> 3 write('parq-bags.parq', df, file_scheme='hive')

/home/user/miniconda2/envs/voxel_env/lib/python2.7/site-packages/fastparquet/writer.pyc in write(filename, data, row_group_offsets, compression, file_scheme, open_with, mkdirs, has_nulls, write_index, partition_on, fixed_text, append, object_encoding, times)
    779                 with open_with(partname, 'wb') as f2:
    780                     rg = make_part_file(f2, data[start:end], fmd.schema,
--> 781                                         compression=compression)
    782                 for chunk in rg.columns:
    783                     chunk.file_path = part

/home/user/miniconda2/envs/voxel_env/lib/python2.7/site-packages/fastparquet/writer.pyc in make_part_file(f, data, schema, compression)
    578     with f as f:
    579         f.write(MARKER)
--> 580         rg = make_row_group(f, data, schema, compression=compression)
    581         fmd = parquet_thrift.FileMetaData(num_rows=len(data),
    582                                           schema=schema,

/home/user/miniconda2/envs/voxel_env/lib/python2.7/site-packages/fastparquet/writer.pyc in make_row_group(f, data, schema, compression)
    566                 comp = compression
    567             chunk = write_column(f, data[column.name], column,
--> 568                                  compression=comp)
    569             rg.columns.append(chunk)
    570     rg.total_byte_size = sum([c.meta_data.total_uncompressed_size for c in

/home/user/miniconda2/envs/voxel_env/lib/python2.7/site-packages/fastparquet/writer.pyc in write_column(f, data, selement, compression, object_encoding)
    510                                    data_page_header=dph, crc=None)
    511 
--> 512     write_thrift(f, ph)
    513     f.write(bdata)
    514 

/home/user/miniconda2/envs/voxel_env/lib/python2.7/site-packages/fastparquet/writer.pyc in write_thrift(fobj, thrift)
    237         raise ParquetException('Thrift parameter validation failure %s'
    238                                ' when writing: %s-> Field: %s' % (
--> 239             val.args[0], obj, name
    240         ))
    241     return fobj.tell() - t0

ParquetException: Thrift parameter validation failure i32 requires -2147483648 <= number <= 2147483647 when writing: <class 'parquet_thrift.PageHeader'>
compressed_page_size: 7750823503
crc: None
data_page_header: <class 'parquet_thrift.DataPageHeader'>
  definition_level_encoding: 3
  encoding: 0
  num_values: 13862
  repetition_level_encoding: 4
  statistics: None

data_page_header_v2: None
dictionary_page_header: None
index_page_header: None
type: 0
uncompressed_page_size: 7750823503
-> Field: uncompressed_page_size
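
For context, Thrift encodes the PageHeader size fields as i32, so a single data page cannot exceed 2147483647 bytes (about 2 GiB); here one page is 7750823503 bytes. A minimal workaround sketch, assuming the oversized page comes from too many large rows landing in one row group, is to shrink the row groups with the row_group_offsets parameter visible in the write() signature above (the 1000-row figure is an illustrative guess, not a value from the thread):

from fastparquet import write

# Workaround sketch: cap the rows per row group so no single page exceeds
# the i32 limit of 2147483647 bytes. 1000 is an arbitrary example value;
# choose it based on the byte size of one row of df.
write('parq-bags.parq', df, file_scheme='hive', row_group_offsets=1000)

Smaller row groups produce more, smaller pages, each of which then stays within the i32 bound.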

Issue Analytics

  • State: closed
  • Created: 6 years ago
  • Comments: 8 (5 by maintainers)

Top GitHub Comments

martindurant commented on Apr 5, 2017 (1 reaction)

Efficient storage of arrays is an ongoing conversation for us. HDF5 and netCDF (which are very similar) are common formats for massive ndarrays, where you might store your images as slices of a larger cube, or as separate entities within the dataset. HDF5 is the standard recommendation in most scientific Python literature.

Other options include bcolz’s ctable (potentially much better compression and filters) and zarr, which is similarly efficient and also offers chunking, but is far less well-known.
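
As a hedged illustration of the zarr option (not code from this thread; the array shape, chunk layout, and store path are assumptions), a chunked image store might look like:

import numpy as np
import zarr

# Sketch: store an image stack as one chunked, compressed n-dimensional
# array instead of as binary blobs in a table. 100 images of 512x512 and
# one-image-per-chunk are illustrative choices.
images = np.random.randint(0, 256, size=(100, 512, 512), dtype='uint8')
z = zarr.open('images.zarr', mode='w', shape=images.shape,
              chunks=(1, 512, 512), dtype='uint8')
z[:] = images

Chunking one image per chunk lets each image be read or written independently while still being compressed.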

mrocklin commented on Apr 5, 2017 (1 reaction)

HDF5 might be a nice fit for data with ndarray elements.

On Tue, Apr 4, 2017 at 10:09 PM, William Markito notifications@github.com wrote:

Out of curiosity, anything in particular that you had in mind for binary blobs? I was considering storing images in numpy arrays, since that’s the format that we would be processing most of the time anyway, and thought that parquet would be a good fit for it.

Thanks!

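A minimal sketch of that HDF5 suggestion with h5py, assuming the images form a uniform stack (file name, dataset name, shape, and the gzip setting are illustrative, not from the thread):

import h5py
import numpy as np

# Sketch: keep each image as a slice of one chunked, compressed HDF5
# dataset, along the lines martindurant describes above.
images = np.random.randint(0, 256, size=(100, 512, 512), dtype='uint8')
with h5py.File('images.h5', 'w') as f:
    f.create_dataset('images', data=images,
                     chunks=(1, 512, 512), compression='gzip')

# Reading a single image back later touches only the chunks it needs:
with h5py.File('images.h5', 'r') as f:
    first_image = f['images'][0]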
