Appending parquet files from Python to S3
Here is my snippet in spark-shell:
jdbcDF.write.mode("append").partitionBy("date").parquet("s3://bucket/Data/")
Problem description
Now I am trying to do the same thing in Python with fastparquet.
import s3fs
from fastparquet import write
s3 = s3fs.S3FileSystem()
myopen = s3.open
write('****/20180101.parq', data, compression='GZIP', open_with=myopen)
First, I tried to save with snappy compression,
write('****/20180101.snappy.parquet', data, compression='SNAPPY', open_with=myopen)
but got this error:
Compression 'SNAPPY' not available. Options: ['GZIP', 'UNCOMPRESSED']
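(For reference, fastparquet only lists SNAPPY among the available compressions when the snappy bindings can be imported, so installing python-snappy is the likely fix. A minimal sketch, reusing data and myopen from the snippet above:)

# Requires: pip install python-snappy
import snappy  # only imported here to confirm the bindings are available
from fastparquet import write

# With the bindings present, 'SNAPPY' appears in fastparquet's compression options.
write('****/20180101.snappy.parquet', data, compression='SNAPPY', open_with=myopen)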
Then I tried GZIP and it worked, but I am not sure how I can append or create partitions here. Here is an issue I created in pandas: https://github.com/pandas-dev/pandas/issues/20638
Thanks.
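(For what it's worth, fastparquet's write exposes both of these through its file_scheme, partition_on and append arguments. A minimal local sketch, assuming the DataFrame has a 'date' column and using a placeholder output directory:)

from fastparquet import write

# file_scheme='hive' writes a directory of files with key=value partition folders,
# partition_on picks the partition columns, and append=True adds new row groups
# to an existing dataset (the dataset must already exist when append=True).
write('Data', data,
      file_scheme='hive',
      partition_on=['date'],
      compression='GZIP',
      append=True)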
OK, let me explain. I have this folder structure in s3 - s3://bucketname/user/data/ - and this is my code to write my partition into it. I am running this in a Jupyter notebook; when I run it, everything works fine and the s3 path looks like this: bucketname/user/data/date=2018-01-01/part-0.parquet. However, on my local machine this folder structure is created automatically - bucketname/user/data/date=2018-01-01/ - but with no parquet file in it. I am wondering if it is creating a local copy before moving the file to s3.

OK, understood. No, the files are not first created locally and then copied.
As documented, you should supply not only the function to open files, but also the function to make directories. In the case of s3 there is no such concept as directories, so the function you provide does not need to do anything, but you must still provide it to avoid the default, which creates local directories.
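(A minimal sketch of what that can look like, assuming the bucket layout above and a DataFrame with a 'date' column; the mkdirs argument is just a no-op:)

import s3fs
from fastparquet import write

s3 = s3fs.S3FileSystem()

# open_with tells fastparquet how to open each output file on s3;
# mkdirs is a no-op because s3 has no real directories - without it the
# default would create empty local folders like the ones seen above.
write('bucketname/user/data', data,
      file_scheme='hive',
      partition_on=['date'],
      compression='GZIP',
      open_with=s3.open,
      mkdirs=lambda path: None)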