s3.open() doesn't seem to understand user defined encoding

I have a UTF-8 encoded .csv file with user-defined data that I’m trying to upload to my s3 bucket.

My script looks like this. It works perfectly with a dummy dataframe, and it also works if I save my data locally with the built-in open() function, but with my .csv file it breaks at the last line:

# [..] other stuff
s3 = s3fs.S3FileSystem(session=session)
with s3.open('my_bucket/my_file.csv', 'w', encoding='utf-8') as output:
    data.to_csv(output, index=False, encoding='utf-8')

and here’s the traceback:

Traceback (most recent call last):
  File "s3_csv_gz.py", line 35, in <module>
    upload(file_path)
  File "s3_csv_gz.py", line 30, in upload
    data.to_csv(output, index=False, encoding='utf-8')
  File "C:\Anaconda\lib\site-packages\pandas\core\generic.py", line 3020, in to_csv
    formatter.save()
  File "C:\Anaconda\lib\site-packages\pandas\io\formats\csvs.py", line 172, in save
    self._save()
  File "C:\Anaconda\lib\site-packages\pandas\io\formats\csvs.py", line 288, in _save
    self._save_chunk(start_i, end_i)
  File "C:\Anaconda\lib\site-packages\pandas\io\formats\csvs.py", line 315, in _save_chunk
    self.cols, self.writer)
  File "pandas/_libs/writers.pyx", line 72, in pandas._libs.writers.write_csv_rows
  File "C:\Anaconda\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 14-22: character maps to <undefined>

As you can see from the last call, for some reason the codec module used is cp1252.py rather than utf_8.py. I’m not 100% sure that’s how it’s supposed to work, but I’m quite certain that cp1252 has nothing to do with UTF-8.
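
To illustrate the failure in the traceback, here is a minimal sketch of why a cp1252 text stream rejects data that UTF-8 handles fine. The sample string is hypothetical; the assumption is that the file object ended up using the platform's default encoding (cp1252 on many Windows setups) rather than the requested one.

```python
import locale

# Hypothetical sample text containing characters that cp1252 cannot represent.
text = "日本語 Ω"

# The encoding Python falls back to when none is applied to a text stream.
# On many Windows installations this reports cp1252.
default_encoding = locale.getpreferredencoding(False)
print("default text encoding:", default_encoding)

# UTF-8 can encode every Unicode character, so this always succeeds.
utf8_bytes = text.encode("utf-8")

# cp1252 is a single-byte "charmap" codec, so unmapped characters raise
# the same kind of UnicodeEncodeError seen in the traceback.
try:
    text.encode("cp1252")
    cp1252_failed = False
except UnicodeEncodeError as exc:
    cp1252_failed = True
    print("cp1252 failed:", exc)
```

If the requested encoding were honored, the cp1252 codec would never be reached, which is consistent with the report being a bug rather than a usage error.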

Is there a way to circumvent this? I’d really love to use this package instead of boto3 to upload my files, but I can’t seem to make it work.

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments:12 (6 by maintainers)

Top GitHub Comments

1 reaction
martindurant commented, Sep 30, 2019

Ah, so there was indeed a bug, fixed in the linked PR. The way you are doing it now, with ‘wb’, the file is opened in binary mode, and I suppose pandas is doing the right thing in dealing with that.
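
The binary-mode workaround mentioned here can be sketched with the standard library alone. This is an illustration, not the code from the thread: write_csv_utf8 is a hypothetical helper, and io.BytesIO stands in for the handle that s3.open('my_bucket/my_file.csv', 'wb') would return (assuming that handle behaves like an ordinary binary file object). The point is that the text encoding is applied explicitly, so the platform default never comes into play.

```python
import csv
import io


def write_csv_utf8(rows, binary_file):
    """Write rows as UTF-8 CSV into a file object opened in binary mode.

    Hypothetical helper for illustration; not part of s3fs or pandas.
    """
    # Wrap the binary handle so text is encoded as UTF-8 explicitly,
    # regardless of the platform's default encoding.
    text = io.TextIOWrapper(binary_file, encoding="utf-8", newline="")
    try:
        csv.writer(text, lineterminator="\n").writerows(rows)
        text.flush()
    finally:
        # Detach so the caller keeps ownership of the binary handle.
        text.detach()


# io.BytesIO stands in for a file opened with mode 'wb'.
buffer = io.BytesIO()
write_csv_utf8([["name", "city"], ["Zoë", "Kraków"]], buffer)
payload = buffer.getvalue()
```

With pandas, a comparable approach would be to build the CSV text with data.to_csv(index=False), encode it with .encode('utf-8'), and write the resulting bytes to the binary handle.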

0 reactions
wtfzambo commented, Oct 1, 2019

I was surprised as well. Apparently the compression only works when passing a file path as the argument and not a file object, for some magical reason. A quick Google search shows that the community has already noticed this, but it was never addressed.
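
One way around the compression limitation described above is to compress explicitly before the bytes reach the file object. The following is a sketch under assumptions: write_gzipped_csv is a hypothetical helper, and io.BytesIO again stands in for a handle opened with mode 'wb'. It uses the standard gzip module rather than any pandas compression option.

```python
import csv
import gzip
import io


def write_gzipped_csv(rows, binary_file):
    """Write rows as gzip-compressed UTF-8 CSV into a binary file object.

    Hypothetical helper for illustration; not part of s3fs or pandas.
    """
    # gzip.open accepts an existing file object; "wt" adds a text layer
    # with an explicit encoding on top of the compressor.
    with gzip.open(binary_file, "wt", encoding="utf-8", newline="") as gz_text:
        csv.writer(gz_text, lineterminator="\n").writerows(rows)


# io.BytesIO stands in for a file opened with mode 'wb'.
buffer = io.BytesIO()
write_gzipped_csv([["name", "city"], ["Zoë", "Kraków"]], buffer)

# Round-trip check: decompress and decode what was written.
restored = gzip.decompress(buffer.getvalue()).decode("utf-8")
```

Closing the gzip layer finishes the compressed stream but leaves the underlying handle open, so the surrounding with block that opened the target file still controls when it is closed.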
