Specifying a GCS directory creates an empty file with no error
Problem description
Behavior with directory paths in GCS is surprising and not documented.
Steps/code to reproduce the problem
Suppose that this path references a directory in GCS:
gs://foo/bar
Opening that URL fails with a 404. However, this URL succeeds:
gs://foo/bar/
It will result in an empty file instead of downloading the contents of the directory.
Since smart_open is designed to work on files, I understand that downloading directories doesn’t make sense, but raising an error seems more appropriate than creating an empty file.
I checked the docs but didn’t see any mention of downloading directories. For comparison, a tool like gsutil has cp -r, which works as you would expect.
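The error suggested above could be approximated on the caller's side. The following is only a sketch and not part of smart_open: `ensure_gcs_object_url` and `open_gcs_file` are hypothetical helper names, and the check assumes that any object name ending in `/` should be treated as a folder rather than a file.

```python
from urllib.parse import urlsplit


def ensure_gcs_object_url(url: str) -> str:
    """Return url unchanged, or raise if it looks like a bucket or "folder".

    Hypothetical helper, not part of smart_open.
    """
    parts = urlsplit(url)
    if parts.scheme != "gs":
        raise ValueError(f"not a gs:// URL: {url!r}")
    # The object name is the path without its single leading slash.
    object_name = parts.path[1:]
    # Assumption: an empty name or a trailing slash means "not a file".
    if not object_name or object_name.endswith("/"):
        raise IsADirectoryError(
            f"{url!r} names a bucket or folder-like prefix, not a file"
        )
    return url


def open_gcs_file(url: str, mode: str = "rb"):
    """Hypothetical wrapper: validate the URL, then delegate to smart_open."""
    from smart_open import open as smart_open_open  # imported lazily

    return smart_open_open(ensure_gcs_object_url(url), mode)
```

With this guard, `open_gcs_file("gs://foo/bar/")` raises `IsADirectoryError` before any request is made, while `gs://foo/bar` is passed through to smart_open as before.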
Versions
Linux-5.11.9-arch1-1-x86_64-with-arch
Python 3.7.9 (default, Mar 6 2021, 22:28:38)
[GCC 10.2.0]
smart_open 3.0.0
Checklist
Before you create the issue, please make sure you have:
- Described the problem clearly
- Provided a minimal reproducible example, including any required data
- Provided the version numbers of the relevant software
Issue Analytics
- State:
- Created 2 years ago
- Comments:9 (5 by maintainers)
@mpenkov I actually think @petedannemann got this right - my understanding of object storage was lacking, and I guess this behavior is consistent. I don’t think reading created a new object. When I said it created an empty file, I meant on local disk.
My private bucket in this example was created through the Google Cloud Storage UI. I created a “folder” (what the UI calls it) and put files in it. So I guess it exists as an empty object.
I am still confused about the behavior of my private bucket vs the public bucket. Should I assume the public bucket is using method 2 described in the Stack Overflow post, and therefore no empty object with the folder name exists, which is why it gives a 404? Is there some way to confirm that for buckets I don’t own?
Also, thanks for the list_blobs example, that’s very helpful. It might be a good idea to have that in the README.
In any case it looks like this is not a bug in smart_open, and just a consequence of lack of understanding of object storage. Thanks for clarifying it for me!
Thanks for the PR @polm. @mpenkov shall we close this issue then?
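On the earlier question of how to tell whether a "folder" is backed by an empty placeholder object, one possible approach is to list the object names under the prefix and look for a name equal to the prefix plus a trailing slash. This is a sketch under assumptions: `classify_prefix` and `list_object_names` are hypothetical helper names, the second one needs the google-cloud-storage package, and anonymous access only works for buckets that allow public listing.

```python
from typing import Iterable, List


def classify_prefix(object_names: Iterable[str], prefix: str) -> str:
    """Classify a folder-like prefix from a listing of object names.

    Returns "placeholder" if an object named exactly "<prefix>/" exists,
    "implicit" if only objects below the prefix exist, else "missing".
    """
    marker = prefix if prefix.endswith("/") else prefix + "/"
    has_placeholder = False
    has_children = False
    for name in object_names:
        if name == marker:
            has_placeholder = True
        elif name.startswith(marker):
            has_children = True
    if has_placeholder:
        return "placeholder"
    return "implicit" if has_children else "missing"


def list_object_names(bucket_name: str, prefix: str,
                      anonymous: bool = False) -> List[str]:
    """Fetch object names under a prefix (hypothetical helper)."""
    from google.cloud import storage  # imported lazily

    if anonymous:
        # For public buckets you do not own; requires public list access.
        client = storage.Client.create_anonymous_client()
    else:
        client = storage.Client()
    return [blob.name for blob in client.list_blobs(bucket_name, prefix=prefix)]
```

For example, `classify_prefix(list_object_names("foo", "bar/"), "bar")` returning "placeholder" would be consistent with `gs://foo/bar/` opening as an empty object, whereas "implicit" would be consistent with a 404.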