FR: Accept a file-like object in addition to a path in `fastparquet.write`


pyarrow accepts either a filename or a file-like object when writing a DataFrame. It would be very helpful if fastparquet did the same. I found https://github.com/dask/fastparquet/issues/215, which adds a special in-memory path name, but that seems specific to the Dask ecosystem.

Currently google-cloud-bigquery requires pyarrow rather than fastparquet to upload a DataFrame to BigQuery due to this issue.

Issue Analytics

  • State: closed
  • Created: 5 years ago
  • Reactions: 1
  • Comments: 8 (5 by maintainers)

Top GitHub Comments

10 reactions
martindurant commented, Apr 25, 2019

Basically, I think it's better to deal with file systems than with file-like objects or paths.

0 reactions
ospikovets commented, Apr 25, 2022

@martindurant, the code snippet in this Stack Overflow comment is a good example of the use case for supporting a file-like object when writing a DataFrame.

from io import BytesIO

def dataframe_to_s3(s3_client, input_dataframe, bucket_name, filepath):
    # Serialize the DataFrame to Parquet in memory, then upload the raw bytes.
    out_buffer = BytesIO()
    input_dataframe.to_parquet(out_buffer, index=False)
    s3_client.put_object(Bucket=bucket_name, Key=filepath, Body=out_buffer.getvalue())

It is definitely possible to do what you suggest here; it just doesn’t look as “elegant” as simply passing a file-like object.

Any chance fastparquet will support that in the future?
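
For readers looking for a stopgap, one possible workaround is sketched below. It relies on the `open_with` argument of `fastparquet.write`, a callable taking `(path, mode)` and returning an open file-like object. The helper names `CapturingBytesIO`, `make_memory_opener` and `dataframe_to_parquet_bytes`, and the key `"buffer.parquet"`, are made up for illustration. The sketch assumes the writer opens its target through `open_with` and closes it when done, which may differ between fastparquet versions, so treat it as a starting point rather than a confirmed recipe.

```python
from io import BytesIO


class CapturingBytesIO(BytesIO):
    """BytesIO that copies its contents into a dict when closed (hypothetical helper)."""

    def __init__(self, store, key):
        super().__init__()
        self._store = store
        self._key = key

    def close(self):
        # Save the bytes first: a plain BytesIO discards them on close.
        if not self.closed:
            self._store[self._key] = self.getvalue()
        super().close()


def make_memory_opener():
    """Return (open_with, store); open_with(path, mode) gives an in-memory file."""
    store = {}

    def open_with(path, mode="wb"):
        # The mode is ignored: this sketch only supports writing fresh buffers.
        return CapturingBytesIO(store, path)

    return open_with, store


def dataframe_to_parquet_bytes(df):
    """Serialize df to Parquet bytes via fastparquet without touching disk."""
    import fastparquet  # imported lazily so the helpers above work on their own

    open_with, store = make_memory_opener()
    # "buffer.parquet" is only a dictionary key here, not a real path.
    fastparquet.write("buffer.parquet", df, open_with=open_with)
    return store["buffer.parquet"]
```

The returned bytes could then be handed to something like the `Body=` argument of `put_object` in the snippet above, in place of `out_buffer.getvalue()`.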


