Specifying a GCS directory creates an empty file with no error

Problem description

Behavior with directory paths in GCS is surprising and not documented.

Steps/code to reproduce the problem

Suppose that this path references a directory in GCS:

gs://foo/bar

Opening that URL fails with a 404. However, this URL succeeds:

gs://foo/bar/

It will result in an empty file instead of downloading the contents of the directory.

Since smart_open is designed to work on files, I understand that downloading directories doesn’t make sense, but raising an error seems more appropriate than creating an empty file.
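Until such an error exists, a caller can reject directory-like URLs before handing them to smart_open. The following is a minimal sketch of that idea, not smart_open behavior: `require_object_url` is a hypothetical helper, and treating a trailing slash as "directory" is a convention assumed here (GCS itself allows object names that end in `/`). A URL such as gs://foo/bar has no trailing slash, so it passes this check and would still produce the 404 described above.

```python
from urllib.parse import urlparse


def require_object_url(url):
    """Return url unchanged if it looks like a single GCS object.

    Hypothetical caller-side guard: raises ValueError for URLs that are
    not gs:// URLs, that name only a bucket, or that end in a slash.
    """
    parsed = urlparse(url)
    if parsed.scheme != "gs" or not parsed.netloc:
        raise ValueError(f"not a gs:// URL: {url!r}")
    if not parsed.path.strip("/"):
        raise ValueError(f"URL names a bucket, not an object: {url!r}")
    if parsed.path.endswith("/"):
        raise ValueError(f"URL looks like a directory, not an object: {url!r}")
    return url


# Possible usage (requires smart_open and GCS credentials):
#   import smart_open
#   with smart_open.open(require_object_url("gs://foo/bar/data.txt"), "rb") as f:
#       data = f.read()
```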

I checked the docs but didn’t see any mention of downloading directories. For comparison, with a tool like gsutil there is a cp -r that works as you would expect.

Versions

Linux-5.11.9-arch1-1-x86_64-with-arch
Python 3.7.9 (default, Mar  6 2021, 22:28:38) 
[GCC 10.2.0]
smart_open 3.0.0

Checklist

Before you create the issue, please make sure you have:

  • Described the problem clearly
  • Provided a minimal reproducible example, including any required data
  • Provided the version numbers of the relevant software

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 9 (5 by maintainers)

Top GitHub Comments

2 reactions
polm commented, May 1, 2021

@mpenkov I actually think @petedannemann got this right - my understanding of object storage was lacking, and I guess this behavior is consistent. I don’t think the read created a new object. When I said it created an empty file, I meant on local disk.

My private bucket in this example was created through the Google Cloud Storage UI. I created a “folder” (what the UI calls it) and put files in it. So I guess it exists as an empty object.

I am still confused about the behavior of my private bucket vs the public bucket. Should I assume the public bucket is using method 2 described in the Stack Overflow post, and therefore no empty object with the folder name exists, which is why it gives a 404? Is there some way to confirm that for buckets I don’t own?
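One way to check this is to ask the bucket directly whether a placeholder object named after the folder exists. The sketch below assumes the google-cloud-storage client library; `inspect_prefix` is a hypothetical helper, and whether it works on a bucket you don't own depends on that bucket granting you (or anonymous users) permission to get and list objects. It only reports what is stored; it does not by itself confirm why a particular read returned a 404.

```python
def inspect_prefix(client, bucket_name, folder):
    """Report whether a folder placeholder object exists under a prefix.

    client is expected to behave like google.cloud.storage.Client.
    """
    prefix = folder.rstrip("/") + "/"
    # get_blob returns None when no object with exactly this name exists.
    placeholder = client.bucket(bucket_name).get_blob(prefix)
    # Collect a few object names under the prefix, excluding the placeholder.
    children = [
        blob.name
        for blob in client.list_blobs(bucket_name, prefix=prefix, max_results=5)
        if blob.name != prefix
    ]
    return {
        "placeholder_exists": placeholder is not None,
        "placeholder_size": None if placeholder is None else placeholder.size,
        "sample_children": children,
    }


# Possible usage (requires google-cloud-storage):
#   from google.cloud import storage
#   client = storage.Client.create_anonymous_client()  # for public buckets
#   print(inspect_prefix(client, "foo", "bar"))
```

If `placeholder_exists` is false while `sample_children` is non-empty, the "folder" exists only as a shared name prefix.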

Also, thanks for the list_blobs example, that’s very helpful. It might be a good idea to have that in the README.

In any case it looks like this is not a bug in smart_open, and just a consequence of lack of understanding of object storage. Thanks for clarifying it for me!
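For readers who do want the contents of a "directory", the list_blobs approach mentioned above can be sketched roughly as follows. This is an illustration, not the example from the thread: it assumes the google-cloud-storage client library, and `download_prefix` is a hypothetical helper that mirrors every object under a prefix to local disk, similar in spirit to gsutil cp -r.

```python
import os


def download_prefix(client, bucket_name, prefix, dest_dir):
    """Download every object under gs://bucket_name/prefix/ into dest_dir.

    client is expected to behave like google.cloud.storage.Client.
    Returns the list of local paths written.
    """
    if not prefix.endswith("/"):
        prefix += "/"
    saved = []
    for blob in client.list_blobs(bucket_name, prefix=prefix):
        # Skip "folder" placeholder objects, whose names end in a slash.
        if blob.name.endswith("/"):
            continue
        relative = blob.name[len(prefix):]
        local_path = os.path.join(dest_dir, *relative.split("/"))
        os.makedirs(os.path.dirname(local_path) or ".", exist_ok=True)
        blob.download_to_filename(local_path)
        saved.append(local_path)
    return saved


# Possible usage (requires google-cloud-storage and credentials):
#   from google.cloud import storage
#   download_prefix(storage.Client(), "foo", "bar", "./bar")
```

Skipping names that end in a slash avoids writing the empty placeholder object to disk, which is the surprise described in the original report.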

1 reaction
petedannemann commented, May 7, 2021

Thanks for the PR @polm. @mpenkov shall we close this issue then?
