Can't open S3 files with ?-like characters in name
obj = smart_open.open('s3://bucket-name/folder/picture #1.jpg', 'rb')
results in the error: the s3 key 'folder/picture ' does not exist, or is forbidden for access.
At the same time
s3_session.resource('s3').Bucket('bucket-name').download_file("folder/picture #1.jpg", "/tmp/test.jpg")
downloads the file successfully.
Some investigation
Internally, smart_open takes the unquoted value (folder/picture #1.jpg) and, after parsing, thinks that the path is 'folder/picture ', cutting off the rest of the key.
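The truncation described above matches how the standard library's urlsplit treats reserved characters. The following is a minimal sketch of that stdlib behavior only; it does not reproduce smart_open's own parsing code, and the URI containing "?" is an assumed variant of the one in the report.

```python
from urllib.parse import urlsplit

# Example URIs modelled on the one in the report; the '?' variant is an assumption.
hash_uri = "s3://bucket-name/folder/picture #1.jpg"
qmark_uri = "s3://bucket-name/folder/picture?1.jpg"

hash_parts = urlsplit(hash_uri)
qmark_parts = urlsplit(qmark_uri)

# '#' starts the fragment, so the path stops right before it.
print(repr(hash_parts.path), repr(hash_parts.fragment))
# '?' starts the query string, so the path stops right before it.
print(repr(qmark_parts.path), repr(qmark_parts.query))
```

In both cases the text after the reserved character ends up in a different component, so anything that reads only the path sees a shortened key.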
I can try to feed it the quoted value (folder/picture%20%231.jpg); in that case it parses correctly and treats the whole thing as the key, but AWS then doesn't understand it, as its client (boto3?) requires values ‘as is’.
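A small sketch of the quoting round trip, using only urllib.parse. It shows why the quoted form survives parsing and why it would have to be unquoted again before being handed to a client that expects the literal key. The unquoting step is an illustration of what a caller could do, not something smart_open is confirmed to do.

```python
from urllib.parse import quote, unquote, urlsplit

key = "folder/picture #1.jpg"

# Percent-encode the key; '/' is left alone by default.
quoted_key = quote(key)

# The quoted form has no reserved characters left, so the whole key stays in the path.
parts = urlsplit(f"s3://bucket-name/{quoted_key}")

# Undo the quoting to get back the literal key the storage client expects.
recovered_key = unquote(parts.path.lstrip("/"))

print(quoted_key)
print(repr(recovered_key))
```

One caveat with this approach: a key that literally contains a percent sequence such as %20 would be altered by the unquote step, so the caller has to know whether the input was quoted.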
Same problem with having “?” in the file name. I didn’t check other characters.
Issue Analytics
- State:
- Created 4 years ago
- Comments:6
Hm, yes, you’re right, I just debugged this example again and # appeared to work OK. Maybe I mixed it up with urlparse in requests when I was trying to find a culprit and experimented with both requests and smart_open. So the issue is with the question mark only indeed. Thanks for addressing this.

Could you please check: temporarily replacing ? with \n in safe_urlsplit is conflicting with the handling of _UNSAFE_URL_BYTES_TO_REMOVE within newer urlsplit versions: https://github.com/python/cpython/commit/8a595744e696a0fb92dccc5d4e45da41571270a1#diff-b3712475a413ec972134c0260c8f1eb1deefb66184f740ef00c37b4487ef873e
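The conflict can be illustrated with a hypothetical re-creation of the placeholder trick. The function below is written only from the description in the comment above and is not smart_open's actual safe_urlsplit; the names split_key_with_placeholder and urlsplit_strips_newlines are made up for this sketch.

```python
from urllib.parse import urlsplit

PLACEHOLDER = "\n"  # assumption: the stand-in character mentioned in the comment


def split_key_with_placeholder(url):
    """Hypothetical re-creation of the placeholder trick, not smart_open's code."""
    # Hide '?' so urlsplit does not treat it as the start of a query string.
    masked = url.replace("?", PLACEHOLDER)
    path = urlsplit(masked, allow_fragments=False).path
    # Restore '?' in the path; this only works if the placeholder survived.
    return path.replace(PLACEHOLDER, "?")


def urlsplit_strips_newlines():
    """Report whether this interpreter's urlsplit removes newline characters."""
    return "\n" not in urlsplit("s3://bucket-name/a\nb").path


key_path = split_key_with_placeholder("s3://bucket-name/folder/picture?1.jpg")
print(urlsplit_strips_newlines(), repr(key_path))
```

On interpreters where urlsplit removes newline characters, the placeholder is gone before it can be swapped back, so the question mark is silently dropped from the key. On older interpreters the placeholder survives and the key comes back intact.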