s3.open() doesn't seem to understand user-defined encoding
I have a UTF-8-encoded .csv file with user-defined data that I’m trying to upload to my S3 bucket.
My script is as follows. It works perfectly with a dummy dataframe, and it also works if I save the data locally with the built-in open() function (see the reference snippet after the script), but with my .csv it breaks at the last line:
import s3fs  # import shown explicitly; `session` and `data` are created in the elided setup

# [..] other stuff
s3 = s3fs.S3FileSystem(session=session)
with s3.open('my_bucket/my_file.csv', 'w', encoding='utf-8') as output:
    data.to_csv(output, index=False, encoding='utf-8')
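For reference, the local write that succeeds looks roughly like this (the local filename is illustrative):

# works: the built-in open() honors encoding='utf-8' on the text stream
with open('my_file.csv', 'w', encoding='utf-8') as output:
    data.to_csv(output, index=False)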
and here’s the traceback:
Traceback (most recent call last):
  File "s3_csv_gz.py", line 35, in <module>
    upload(file_path)
  File "s3_csv_gz.py", line 30, in upload
    data.to_csv(output, index=False, encoding='utf-8')
  File "C:\Anaconda\lib\site-packages\pandas\core\generic.py", line 3020, in to_csv
    formatter.save()
  File "C:\Anaconda\lib\site-packages\pandas\io\formats\csvs.py", line 172, in save
    self._save()
  File "C:\Anaconda\lib\site-packages\pandas\io\formats\csvs.py", line 288, in _save
    self._save_chunk(start_i, end_i)
  File "C:\Anaconda\lib\site-packages\pandas\io\formats\csvs.py", line 315, in _save_chunk
    self.cols, self.writer)
  File "pandas/_libs/writers.pyx", line 72, in pandas._libs.writers.write_csv_rows
  File "C:\Anaconda\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 14-22: character maps to <undefined>
As you can see from the last call, for some reason the codec module used is cp1252.py instead of utf-8.py. Now, I’m not 100% sure that’s how it’s supposed to work, but I’m quite certain that cp1252 has nothing to do with UTF-8.
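Where cp1252 comes from: it is Python’s platform-preferred encoding on Western-locale Windows, which the text stream falls back to when the requested encoding isn’t applied. A quick way to confirm on the affected machine (a minimal sketch):

import locale
print(locale.getpreferredencoding())  # typically 'cp1252' on Western-locale Windows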
Is there a way to circumvent this? I’d really love to use this package instead of boto3 to upload my files, but I can’t seem to make it work.
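One possible workaround (a sketch, not taken from the issue itself): render the CSV in memory, encode it explicitly, and upload the bytes, so no platform-default codec is ever involved:

# data.to_csv() with no target returns the CSV as a str;
# encoding it ourselves sidesteps the default-codec problem
csv_bytes = data.to_csv(index=False).encode('utf-8')
with s3.open('my_bucket/my_file.csv', 'wb') as output:
    output.write(csv_bytes)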
Top GitHub Comments
Ah, so there was indeed a bug, fixed in the linked PR. In the way you are doing it now with ‘wb’, the file is opened in binary mode, and I suppose pandas is doing the right thing in dealing with that.
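A minimal sketch of that ‘wb’ route, assuming a pandas version that accepts binary file handles in to_csv (documented from pandas 1.2 onward):

# in binary mode, pandas encodes the text itself using the
# encoding argument, so no platform default is consulted
with s3.open('my_bucket/my_file.csv', 'wb') as output:
    data.to_csv(output, index=False, encoding='utf-8')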
I was surprised as well; apparently the compression only works when passing a file path as the argument, not a file object, for some magical reason. A quick Google search shows that this has already been noticed by the community but never addressed.
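One way to sidestep that quirk (a sketch, not taken from the thread): compress manually by wrapping the binary S3 file object in gzip.GzipFile, rather than relying on to_csv’s compression= argument, which needs a path:

import gzip

# gzip the encoded CSV ourselves so compression works with a file object
with s3.open('my_bucket/my_file.csv.gz', 'wb') as raw:
    with gzip.GzipFile(fileobj=raw, mode='wb') as gz:
        gz.write(data.to_csv(index=False).encode('utf-8'))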