Cannot write simple dataframe to disk in thrift 0.11.0
See original GitHub issue

Is there something simple I'm missing here? I'm just trying to do the most basic thing in the example:
import numpy as np
import pandas as pd
from fastparquet import write

df = pd.DataFrame(np.zeros((1000, 1000)), columns=[str(i) for i in range(1000)])
write('outfile2.parq', df)
write('outfile2.parq', df, row_group_offsets=[0, 10000, 20000],
      compression='GZIP', file_scheme='hive')
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-4-5b2fbc3e1a9e> in <module>()
1 write('outfile2.parq', df, row_group_offsets=[0, 10000, 20000],
----> 2 compression='GZIP', file_scheme='hive')
3
~/miniconda3/envs/py3default/lib/python3.6/site-packages/fastparquet/writer.py in write(filename, data, row_group_offsets, compression, file_scheme, open_with, mkdirs, has_nulls, write_index, partition_on, fixed_text, append, object_encoding, times)
831 with open_with(partname, 'wb') as f2:
832 rg = make_part_file(f2, data[start:end], fmd.schema,
--> 833 compression=compression, fmd=fmd)
834 for chunk in rg.columns:
835 chunk.file_path = part
~/miniconda3/envs/py3default/lib/python3.6/site-packages/fastparquet/writer.py in make_part_file(f, data, schema, compression, fmd)
604 with f as f:
605 f.write(MARKER)
--> 606 rg = make_row_group(f, data, schema, compression=compression)
607 if fmd is None:
608 fmd = parquet_thrift.FileMetaData(num_rows=len(data),
~/miniconda3/envs/py3default/lib/python3.6/site-packages/fastparquet/writer.py in make_row_group(f, data, schema, compression)
592 comp = compression
593 chunk = write_column(f, data[column.name], column,
--> 594 compression=comp)
595 rg.columns.append(chunk)
596 rg.total_byte_size = sum([c.meta_data.total_uncompressed_size for c in
~/miniconda3/envs/py3default/lib/python3.6/site-packages/fastparquet/writer.py in write_column(f, data, selement, compression)
532 data_page_header=dph, crc=None)
533
--> 534 write_thrift(f, ph)
535 f.write(bdata)
536
~/miniconda3/envs/py3default/lib/python3.6/site-packages/fastparquet/thrift_structures.py in write_thrift(fobj, thrift)
47 pout = TCompactProtocol(fobj)
48 try:
---> 49 thrift.write(pout)
50 fail = False
51 except TProtocolException as e:
~/miniconda3/envs/py3default/lib/python3.6/site-packages/fastparquet/parquet_thrift/parquet/ttypes.py in write(self, oprot)
1027 def write(self, oprot):
1028 if oprot._fast_encode is not None and self.thrift_spec is not None:
-> 1029 oprot.trans.write(oprot._fast_encode(self, (self.__class__, self.thrift_spec)))
1030 return
1031 oprot.writeStructBegin('PageHeader')
TypeError: expecting list of size 2 for struct args
I get the same error on my local Mac and on a remote EC2 Ubuntu 16.04 instance.
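The TypeError is raised inside thrift's fast-encode path, and the comments below point at thrift 0.11.0 as the culprit. As a hedged sketch (the version cutoff is an assumption based on this report, and `thrift_is_compatible` is a hypothetical helper, not part of fastparquet), a small guard can fail fast with a readable message instead of the opaque traceback above:

```python
def parse_version(v):
    """Turn a dotted version string like '0.10.0' into a comparable tuple."""
    return tuple(int(part) for part in v.split('.')[:3])

def thrift_is_compatible(version_string):
    """True for thrift releases below 0.11.0, which this report suggests
    are the ones this fastparquet version can write page headers with."""
    return parse_version(version_string) < (0, 11, 0)

# Possible use at the import site, assuming the installed distribution's
# version can be looked up (e.g. via pkg_resources):
# import pkg_resources
# installed = pkg_resources.get_distribution('thrift').version
# if not thrift_is_compatible(installed):
#     raise RuntimeError('fastparquet needs thrift < 0.11.0; found %s' % installed)
```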
Issue Analytics
- State:
- Created 6 years ago
- Comments: 12 (7 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
0.11.0
That does seem to be the issue. Installing 0.10.0 fixes it. Maybe update your requirements to force 0.10.0 exactly?
https://github.com/conda-forge/thrift-cpp-feedstock/pull/15
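One way to act on that suggestion, assuming a pip-managed environment (the exact fastparquet version that requires the pin is not stated in this thread), is to pin the Python thrift package explicitly:

```shell
# Hedged workaround from this thread: install thrift 0.10.0 exactly
# until fastparquet works with 0.11.0.
pip install 'thrift==0.10.0'

# or carry the pin in requirements.txt:
# thrift==0.10.0
```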