Thrift parameter validation failure i32
Trying to convert a dataframe to parquet and I keep receiving the following error. Not sure if it's a bug, but I can't find more info about this problem.
ParquetException Traceback (most recent call last)
<ipython-input-7-d79cbcc89268> in <module>()
1 from fastparquet import write
2
----> 3 write('parq-bags.parq', df, file_scheme='hive')
/home/user/miniconda2/envs/voxel_env/lib/python2.7/site-packages/fastparquet/writer.pyc in write(filename, data, row_group_offsets, compression, file_scheme, open_with, mkdirs, has_nulls, write_index, partition_on, fixed_text, append, object_encoding, times)
779 with open_with(partname, 'wb') as f2:
780 rg = make_part_file(f2, data[start:end], fmd.schema,
--> 781 compression=compression)
782 for chunk in rg.columns:
783 chunk.file_path = part
/home/user/miniconda2/envs/voxel_env/lib/python2.7/site-packages/fastparquet/writer.pyc in make_part_file(f, data, schema, compression)
578 with f as f:
579 f.write(MARKER)
--> 580 rg = make_row_group(f, data, schema, compression=compression)
581 fmd = parquet_thrift.FileMetaData(num_rows=len(data),
582 schema=schema,
/home/user/miniconda2/envs/voxel_env/lib/python2.7/site-packages/fastparquet/writer.pyc in make_row_group(f, data, schema, compression)
566 comp = compression
567 chunk = write_column(f, data[column.name], column,
--> 568 compression=comp)
569 rg.columns.append(chunk)
570 rg.total_byte_size = sum([c.meta_data.total_uncompressed_size for c in
/home/user/miniconda2/envs/voxel_env/lib/python2.7/site-packages/fastparquet/writer.pyc in write_column(f, data, selement, compression, object_encoding)
510 data_page_header=dph, crc=None)
511
--> 512 write_thrift(f, ph)
513 f.write(bdata)
514
/home/user/miniconda2/envs/voxel_env/lib/python2.7/site-packages/fastparquet/writer.pyc in write_thrift(fobj, thrift)
237 raise ParquetException('Thrift parameter validation failure %s'
238 ' when writing: %s-> Field: %s' % (
--> 239 val.args[0], obj, name
240 ))
241 return fobj.tell() - t0
ParquetException: Thrift parameter validation failure i32 requires -2147483648 <= number <= 2147483647 when writing: <class 'parquet_thrift.PageHeader'>
compressed_page_size: 7750823503
crc: None
data_page_header: <class 'parquet_thrift.DataPageHeader'>
definition_level_encoding: 3
encoding: 0
num_values: 13862
repetition_level_encoding: 4
statistics: None
data_page_header_v2: None
dictionary_page_header: None
index_page_header: None
type: 0
uncompressed_page_size: 7750823503
-> Field: uncompressed_page_size
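The numbers in the exception explain the failure: compressed_page_size and uncompressed_page_size are Thrift i32 fields, and 7750823503 bytes (about 7.2 GiB) is well past the 2147483647 ceiling, so the data page fastparquet wrote for this column chunk cannot be described in its page header. Since that page holds only 13862 values, each row is carrying roughly half a megabyte in this column, so keeping row groups to a few thousand rows should hold every page under the limit. Below is a minimal, hypothetical sketch of that workaround using fastparquet's row_group_offsets argument; the 2000-row figure and the stand-in frame are guesses, not from the issue.

import pandas as pd
from fastparquet import write

# Small stand-in frame; the real df from the traceback carries ~0.5 MB per row.
df = pd.DataFrame({'payload': [b'x' * 1000] * 10000})

# Hypothetical value: small enough that ~0.5 MB/row keeps each page well under 2 GiB.
rows_per_group = 2000

# Smaller row groups mean smaller column chunks, so each page's
# compressed_page_size / uncompressed_page_size fits in an i32.
write('parq-bags.parq', df, file_scheme='hive',
      row_group_offsets=rows_per_group)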
Efficient storage of arrays is an ongoing conversation for us. HDF5 and netCDF (which are very similar) are common formats for massive ndarrays, where you might store your images as slices of a larger cube, or as separate entities within the dataset. HDF5 is the standard recommendation in most scientific Python literature.
Other options include bcolz's ctable (potentially much better compression and processing filters) and zarr, which is similarly efficient and also offers chunking, but is far less well-known.
HDF5 might be a nice fit for data with ndarray elements; see the sketch below.
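For concreteness, here is a rough sketch of the chunked-array route (not from the thread; the array shape, chunk sizes, and file names are made up) using h5py and zarr, which expose very similar dataset APIs:

import numpy as np
import h5py
import zarr

# Fake image stack standing in for the real data; shape and chunks are arbitrary.
images = np.random.randint(0, 255, size=(100, 512, 512), dtype=np.uint8)

# HDF5: store the stack as one chunked, compressed dataset inside a single file.
with h5py.File('images.h5', 'w') as f:
    f.create_dataset('stack', data=images,
                     chunks=(1, 512, 512), compression='gzip')

# zarr: the same idea, but each chunk is a separate object in a directory store,
# which makes parallel reads and writes straightforward.
z = zarr.open('images.zarr', mode='w', shape=images.shape,
              chunks=(1, 512, 512), dtype=images.dtype)
z[:] = images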