Can't open S3 files with ?-like characters in name

import smart_open
obj = smart_open.open('s3://bucket-name/folder/picture #1.jpg', 'rb')

results in the error: the s3 key 'folder/picture ' does not exist, or is forbidden for access.

At the same time, calling boto3 directly with the same key

import boto3
s3_session = boto3.Session()
s3_session.resource('s3').Bucket('bucket-name').download_file("folder/picture #1.jpg", "/tmp/test.jpg")

downloads the file successfully.

Some investigation

Internally, smart_open takes the unquoted value (folder/picture #1.jpg) and, after parsing it, concludes that the key is 'folder/picture ', cutting off the rest of the key.
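The truncation can be reproduced with the standard library's urllib.parse.urlsplit alone. This is a sketch that assumes smart_open's parsing behaves like a plain urlsplit call with default arguments; it does not call smart_open itself, and deriving the key by stripping the leading '/' from the path is also an assumption.

```python
from urllib.parse import urlsplit

url = "s3://bucket-name/folder/picture #1.jpg"

# Default arguments: allow_fragments=True, so '#' starts a URL fragment.
parts = urlsplit(url)

print(parts.netloc)       # bucket-name
print(repr(parts.path))   # '/folder/picture '
print(parts.fragment)     # 1.jpg

# Building the S3 key from the path alone loses everything after '#'.
key = parts.path.lstrip("/")
print(repr(key))          # 'folder/picture '
```

The resulting key matches the one in the error message, trailing space included.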

I can try to feed it the quoted value (folder/picture%20%231.jpg). In that case it parses correctly and treats the whole thing as the key, but then AWS doesn't recognize it, because its client (boto3?) expects key values ‘as is’.
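A sketch of the quoting approach: percent-encode the key so the URL parser keeps it intact, then decode it again before it reaches the S3 client. Where the decode step would live is an assumption; the report suggests the quoted key was passed to AWS unchanged, which is why it was not found.

```python
from urllib.parse import quote, unquote, urlsplit

raw_key = "folder/picture #1.jpg"

# Percent-encode the key; quote() leaves '/' unescaped by default (safe='/').
quoted_key = quote(raw_key)
print(quoted_key)  # folder/picture%20%231.jpg

# With no bare '#' or '?' left, the parser keeps the whole key in the path.
parts = urlsplit("s3://bucket-name/" + quoted_key)
print(parts.path)  # /folder/picture%20%231.jpg

# S3 would treat '%20' and '%23' literally, so decode before calling the client.
key = unquote(parts.path.lstrip("/"))
print(key)  # folder/picture #1.jpg
```

One drawback of this round trip: a key that legitimately contains '%' becomes ambiguous, since unquote() would decode it too.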

The same problem occurs with “?” in the file name. I didn’t check other characters.
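Checking other characters is easy to do against urlsplit directly. This sketch embeds each candidate character in a made-up file name (the names and the candidate list are hypothetical) and reports which ones make the parsed path differ from the intended key, again assuming smart_open's parsing behaves like a default urlsplit call.

```python
from urllib.parse import urlsplit

# Hypothetical characters to probe; each is embedded in a made-up file name.
candidates = ["#", "?", " ", "+", "%", ";", "&", "="]

truncated = []
for ch in candidates:
    key = f"folder/a{ch}b.jpg"
    parts = urlsplit(f"s3://bucket-name/{key}")
    # If the parsed path no longer matches the key, the character broke parsing.
    if parts.path.lstrip("/") != key:
        truncated.append(ch)

print(truncated)  # ['#', '?']
```

Only '#' (fragment) and '?' (query) change the path here; urlsplit does not percent-decode or split on ';', so the other characters pass through untouched.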

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 6

Top GitHub Comments

1 reaction
hexvolt commented, Apr 15, 2019

Hm, yes, you’re right. I just debugged this example again and # appeared to work OK. Maybe I mixed it up with urlparse in requests when I was trying to find the culprit and experimenting with both requests and smart_open. So the issue is indeed with the question mark only. Thanks for addressing this.
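One plausible explanation for '#' working while '?' does not, assumed here rather than confirmed from smart_open's source, is that the URL is split with allow_fragments=False. That flag turns off fragment splitting, but urlsplit has no equivalent switch for the query string. The '?' file name below is hypothetical.

```python
from urllib.parse import urlsplit

# With fragments disabled, '#' stays part of the path.
hash_parts = urlsplit("s3://bucket-name/folder/picture #1.jpg", allow_fragments=False)
print(repr(hash_parts.path))    # '/folder/picture #1.jpg'

# '?' is still treated as the start of a query string.
qmark_parts = urlsplit("s3://bucket-name/folder/what?.jpg", allow_fragments=False)
print(repr(qmark_parts.path))   # '/folder/what'
print(repr(qmark_parts.query))  # '.jpg'
```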

0 reactions
kn000x commented, Aug 4, 2021

Could you please check: temporarily replacing ? with \n in safe_urlsplit conflicts with the handling of _UNSAFE_URL_BYTES_TO_REMOVE in newer versions of urlsplit:

https://github.com/python/cpython/commit/8a595744e696a0fb92dccc5d4e45da41571270a1#diff-b3712475a413ec972134c0260c8f1eb1deefb66184f740ef00c37b4487ef873e
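The conflict can be sketched as follows. placeholder_split is a hypothetical re-creation of the trick the comment describes, not smart_open's actual safe_urlsplit. On Python releases that include the linked change, urlsplit removes '\n' before it can be swapped back, so the '?' silently disappears from the key. split_s3_uri is a likewise hypothetical alternative that avoids urlsplit for s3:// URIs by splitting on the first '/' after the bucket name.

```python
from urllib.parse import urlsplit


def placeholder_split(url):
    """Hypothetical re-creation of the '?' -> '\\n' placeholder trick."""
    parts = urlsplit(url.replace("?", "\n"), allow_fragments=False)
    # Restore the placeholder; this only works if urlsplit kept the '\n'.
    return parts.path.replace("\n", "?")


def split_s3_uri(uri):
    """Hypothetical helper: split 's3://bucket/key' without URL parsing."""
    prefix = "s3://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an s3 URI: {uri!r}")
    bucket, _, key = uri[len(prefix):].partition("/")
    return bucket, key


uri = "s3://bucket-name/folder/what?.jpg"

# Older interpreters: '/folder/what?.jpg'; patched ones: '/folder/what.jpg'.
print(placeholder_split(uri))

print(split_s3_uri(uri))  # ('bucket-name', 'folder/what?.jpg')
```

Splitting by hand treats '#', '?' and '\n' as ordinary key characters, at the cost of ignoring any other URL syntax that might appear in the string.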
