Thrift parameter validation failure i32

See the original GitHub issue: https://github.com/dask/fastparquet/issues/124

I’m trying to convert a dataframe to parquet and keep receiving the following error. I’m not sure whether it’s a bug, but I can’t find more information about this problem.

ParquetException                          Traceback (most recent call last)
<ipython-input-7-d79cbcc89268> in <module>()
      1 from fastparquet import write
      2 
----> 3 write('parq-bags.parq', df, file_scheme='hive')

/home/user/miniconda2/envs/voxel_env/lib/python2.7/site-packages/fastparquet/writer.pyc in write(filename, data, row_group_offsets, compression, file_scheme, open_with, mkdirs, has_nulls, write_index, partition_on, fixed_text, append, object_encoding, times)
    779                 with open_with(partname, 'wb') as f2:
    780                     rg = make_part_file(f2, data[start:end], fmd.schema,
--> 781                                         compression=compression)
    782                 for chunk in rg.columns:
    783                     chunk.file_path = part

/home/user/miniconda2/envs/voxel_env/lib/python2.7/site-packages/fastparquet/writer.pyc in make_part_file(f, data, schema, compression)
    578     with f as f:
    579         f.write(MARKER)
--> 580         rg = make_row_group(f, data, schema, compression=compression)
    581         fmd = parquet_thrift.FileMetaData(num_rows=len(data),
    582                                           schema=schema,

/home/user/miniconda2/envs/voxel_env/lib/python2.7/site-packages/fastparquet/writer.pyc in make_row_group(f, data, schema, compression)
    566                 comp = compression
    567             chunk = write_column(f, data[column.name], column,
--> 568                                  compression=comp)
    569             rg.columns.append(chunk)
    570     rg.total_byte_size = sum([c.meta_data.total_uncompressed_size for c in

/home/user/miniconda2/envs/voxel_env/lib/python2.7/site-packages/fastparquet/writer.pyc in write_column(f, data, selement, compression, object_encoding)
    510                                    data_page_header=dph, crc=None)
    511 
--> 512     write_thrift(f, ph)
    513     f.write(bdata)
    514 

/home/user/miniconda2/envs/voxel_env/lib/python2.7/site-packages/fastparquet/writer.pyc in write_thrift(fobj, thrift)
    237         raise ParquetException('Thrift parameter validation failure %s'
    238                                ' when writing: %s-> Field: %s' % (
--> 239             val.args[0], obj, name
    240         ))
    241     return fobj.tell() - t0

ParquetException: Thrift parameter validation failure i32 requires -2147483648 <= number <= 2147483647 when writing: <class 'parquet_thrift.PageHeader'>
compressed_page_size: 7750823503
crc: None
data_page_header: <class 'parquet_thrift.DataPageHeader'>
  definition_level_encoding: 3
  encoding: 0
  num_values: 13862
  repetition_level_encoding: 4
  statistics: None

data_page_header_v2: None
dictionary_page_header: None
index_page_header: None
type: 0
uncompressed_page_size: 7750823503
-> Field: uncompressed_page_size
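
For context, Thrift encodes the PageHeader size fields as i32, so a single data page cannot exceed 2147483647 bytes (about 2 GiB); here one page is 7750823503 bytes. A minimal workaround sketch, assuming the oversized page comes from too many large rows landing in one row group, is to shrink the row groups with the row_group_offsets parameter visible in the write() signature above (the 1000-row figure is an illustrative guess, not a value from the thread):

from fastparquet import write

# Workaround sketch: cap the rows per row group so no single page exceeds
# the i32 limit of 2147483647 bytes. 1000 is an arbitrary example value;
# choose it based on the byte size of one row of df.
write('parq-bags.parq', df, file_scheme='hive', row_group_offsets=1000)

Smaller row groups produce more, smaller pages, each of which then stays within the i32 bound.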

Issue Analytics

  • State: closed
  • Created: 6 years ago
  • Comments: 8 (5 by maintainers)

Top GitHub Comments

martindurant commented on Apr 5, 2017 (1 reaction)

Efficient storage of arrays is an ongoing conversation for us. HDF5 and netCDF (which are very similar) are common formats for massive ndarrays, where you might store your images as slices of a larger cube, or as separate entities within the dataset. HDF5 is the standard recommendation in most scientific Python literature.

Other options include bcolz’s ctable (potentially much better compression and filters) and zarr, which is similarly efficient and also offers chunking, but is far less well-known.
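
As a hedged illustration of the zarr option (not code from this thread; the array shape, chunk layout, and store path are assumptions), a chunked image store might look like:

import numpy as np
import zarr

# Sketch: store an image stack as one chunked, compressed n-dimensional
# array instead of as binary blobs in a table. 100 images of 512x512 and
# one-image-per-chunk are illustrative choices.
images = np.random.randint(0, 256, size=(100, 512, 512), dtype='uint8')
z = zarr.open('images.zarr', mode='w', shape=images.shape,
              chunks=(1, 512, 512), dtype='uint8')
z[:] = images

Chunking one image per chunk lets each image be read or written independently while still being compressed.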

mrocklin commented on Apr 5, 2017 (1 reaction)

HDF5 might be a nice fit for data with ndarray elements.

On Tue, Apr 4, 2017 at 10:09 PM, William Markito notifications@github.com wrote:

Out of curiosity, anything in particular that you had in mind for binary blobs? I was considering storing images in numpy arrays, since that’s the format that we would be processing most of the time anyway, and thought that parquet would be a good fit for it.

Thanks!

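A minimal sketch of that HDF5 suggestion with h5py, assuming the images form a uniform stack (file name, dataset name, shape, and the gzip setting are illustrative, not from the thread):

import h5py
import numpy as np

# Sketch: keep each image as a slice of one chunked, compressed HDF5
# dataset, along the lines martindurant describes above.
images = np.random.randint(0, 256, size=(100, 512, 512), dtype='uint8')
with h5py.File('images.h5', 'w') as f:
    f.create_dataset('images', data=images,
                     chunks=(1, 512, 512), compression='gzip')

# Reading a single image back later touches only the chunks it needs:
with h5py.File('images.h5', 'r') as f:
    first_image = f['images'][0]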
